eZ Publish DFS - high performance considerations when serving image aliases

This blog post is about how eZ Publish serves content images -- also called image aliases -- in a DFS cluster setup. The eZ DFS setup is the recommended and most common solution for high traffic eZ Publish sites that run on multiple servers. However, it still has some room for improvement when it comes to serving images. In this post, we describe how eZ Publish currently serves image aliases, along with the downsides of that approach and some alternatives. At the end, we describe a possible implementation with some example performance gains.

How image alias serving currently works

A request for an image alias is first routed by an Apache rewrite rule:

RewriteRule ^/var/([^/]+/)?storage/images/.* /index_cluster.php [L]
 

index_cluster.php (together with some other PHP scripts) first checks whether the requested URI exists in the cluster database and whether the image is still available (not removed or expired). (For more information about the eZ DFS cluster database, please see the documentation.)

  • If a valid image alias is found in the cluster DB, the PHP script opens the image file on the shared file system (for example an NFS) and serves the file content (directly from the NFS).
  • If the image alias was not found, the script serves a 404 response.
  • If the image alias was removed, the script serves a 404 response.
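
The three cases above can be summarised in a few lines of PHP. This is only an illustrative sketch, not the actual code of index_cluster.php: the function name and the shape of the $row array (in particular the 'expired' key) are assumptions made for this example.

```php
<?php
/**
 * Decide which HTTP status to send for an image alias request.
 *
 * $row is a hypothetical cluster DB record for the requested URI,
 * or null if the URI is not in the cluster DB at all.
 */
function image_alias_status( $row )
{
        // Unknown URI: no such image alias in the cluster DB
        if( $row === null )
        {
                return 404;
        }

        // Known URI, but the alias was removed or has expired
        if( !empty( $row[ 'expired' ] ) )
        {
                return 404;
        }

        // Valid alias: the file content is read from the shared file system
        return 200;
}
```

Note that even the two 404 cases need a PHP process and a DB lookup, which matters for the downsides discussed next.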

Downsides

  • All images are served by reading the image files from the shared file system. In our tests, file access on a shared file system can be 10 to 50 times slower than on a local file system. The solution does not scale well either: more than 3 to 4 web servers will probably saturate the shared file system, so additional web servers will not add any performance when serving image aliases.
  • Each image request starts PHP and opens a MySQL connection. Therefore, if you have a landing page with 50 article thumbnails, a single page request results in 50 DB connections. On high traffic sites, an eZ Publish setup can quickly run out of available MySQL connections. Another limitation is the number of available TCP ports: during stress tests, we noticed that even moderate ApacheBench (ab) tests use up all available TCP ports and then start to show failed requests.
  • Starting PHP and opening a DB connection is costly, but it can be required in some cases, for example if you want to check whether a visitor is allowed to access an image. As mentioned before, the current code base does not allow access to images from removed object versions. However, it does not prevent access to images of hidden objects or of objects that a visitor is not allowed to see. Usually, this is not a major issue unless you have sensitive information contained in images, but it is something to consider.

Solutions

  1. A reverse proxy such as Akamai or Varnish helps you to scale image alias requests significantly. As a side note, you can specify long caching times for image aliases because each object update creates a unique image alias URL. However, you still have a problem if you need to account for user permissions or removed object versions.
  2. Let each web server create a local cache of image aliases. This way, the image requests will scale and perform a lot better on multiple web servers. This makes it harder to clear the image alias cache, though: the script "bin/php/ezcache.php" will not remove the local cache of image aliases. You would need to manually rename the folder "var/<site>/storage/images" (as a rename/move is faster than deleting a folder).
  3. Implement visibility and permission checks as an optional feature that can be enabled or disabled.
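
For solution #2, the manual cache clearing could be scripted along the following lines. This is a minimal sketch under our own assumptions: the function name and the naming scheme for the retired folder are made up for this example and are not part of eZ Publish.

```php
<?php
/**
 * Move the local image alias cache out of the way.
 * Returns the new path of the retired folder, or false if nothing was moved.
 */
function retire_local_image_cache( $images_dir )
{
        if( !is_dir( $images_dir ) )
        {
                return false;
        }

        // Suffix with timestamp and process ID so that repeated clears do not collide
        $retired = rtrim( $images_dir, '/' ) . '.retired_' . date( 'YmdHis' ) . '_' . getmypid();

        // A rename on the same file system is quick, no matter how many files the folder holds
        return rename( $images_dir, $retired ) ? $retired : false;
}
```

You would call it on each web server with the local "var/<site>/storage/images" path. The retired folder can then be deleted later, for example by a scheduled job, while new requests rebuild the local cache.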

A real implementation

We've implemented the solution described below for one of our clients, following solution #2. For this particular client, we are already using a reverse proxy (solution #1) and we are not interested in an implementation of solution #3: the client is not worried about visibility and permission checks, but is very much interested in improving system performance.

First, an Apache rewrite rule checks for the existence of the local image file. (Note that we also list a second rewrite rule for serving images that do exist, since the very last eZ Publish rewrite rule sends all remaining requests to "index.php".)

# Check for a local image file
# If it doesn't exist, use a PHP script that serves the image from the shared file system and also places a copy locally.
 
RewriteCond %{REQUEST_URI} ^/var/([^/]+/)?storage/images/.*
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/var/([^/]+/)?storage/images/.* /index_cluster.php [L]
 
# Serve local image file if it exists
RewriteRule ^/var/([^/]+/)?storage/images(-versioned)?/.* - [L]
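
To make the order of the two rules easier to follow, here is the same decision written as plain PHP. It is an illustration only and not part of the setup: the function name and the return labels are invented for this example, and the sketch assumes that %{REQUEST_FILENAME} resolves to the document root plus the request URI.

```php
<?php
/**
 * Mirror the two rewrite rules in plain PHP (illustration only).
 * Returns where a request for $uri would end up.
 */
function route_image_request( $uri, $document_root )
{
        $is_image  = preg_match( '#^/var/([^/]+/)?storage/images/.*#', $uri ) === 1;
        $is_static = preg_match( '#^/var/([^/]+/)?storage/images(-versioned)?/.*#', $uri ) === 1;

        // Rule 1: image alias without a local copy, so PHP has to fetch it
        if( $is_image && !is_file( $document_root . $uri ) )
        {
                return '/index_cluster.php';
        }

        // Rule 2: Apache serves the file itself and stops rewriting
        if( $is_static )
        {
                return 'static file';
        }

        // Everything else falls through to the remaining eZ Publish rules
        return 'remaining rules';
}
```

Only the first request for an image on a given web server takes the PHP route. Once the local copy exists, every later request is handled by Apache alone.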
 

In "index_cluster.php" we configure a new "storage backend":

define( 'STORAGE_BACKEND',         'localcache' );

This allows us to put our code in a dedicated file "index_image_localcache.php". Here is the code showing the general idea. It does not contain the code for some helper functions.

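<?php
// Use only the path of the request, so that a query string cannot break the file lookup
$uri_path = parse_url( $_SERVER[ 'REQUEST_URI' ], PHP_URL_PATH );

// Location of the image on the shared file system
$source = MOUNT_POINT_PATH . $uri_path;

if( file_exists( $source ) )
{
        // Location of the local copy below the document root
        $target = dirname(__FILE__) . $uri_path;
        // Unique temporary name, so parallel requests never write to the same file
        $target_temp = $target . '.' . getmypid()  . '_' . rand();

        create_local_directories( $target );

        // Copy to the temporary name first; the final name only appears once the copy is complete
        if( copy( $source, $target_temp ) )
        {
                if( ! file_exists( $target) )
                {
                        rename( $target_temp, $target );
                }
                //another process was faster
                else
                {
                        unlink( $target_temp );
                }
        }
        else
        {
                // serve_500() is assumed to end the request
                serve_500( 'Cannot create local copy. Check permissions.' );
        }

        serve_image( $target );
}
else
{
        serve_404( $source );
}

?>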
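
The helper functions are not shown above. As a rough idea of what they could look like, here is a minimal sketch. It is not the code we run in production: the bodies, the directory permissions and the fallback content type are assumptions made for this example.

```php
<?php
// Create the local directory that will hold $target, including missing parents
function create_local_directories( $target )
{
        $dir = dirname( $target );

        // The last is_dir() check covers a parallel request creating the directory first
        if( !is_dir( $dir ) && !@mkdir( $dir, 0775, true ) && !is_dir( $dir ) )
        {
                serve_500( 'Cannot create directory ' . $dir );
        }
}

// Send the image with a matching Content-Type header and end the request
function serve_image( $file )
{
        $info = @getimagesize( $file );
        header( 'Content-Type: ' . ( $info ? $info[ 'mime' ] : 'application/octet-stream' ) );
        header( 'Content-Length: ' . filesize( $file ) );
        readfile( $file );
        exit;
}

// Log the missing path on the server, but do not show file system paths to the visitor
function serve_404( $path )
{
        header( 'HTTP/1.1 404 Not Found' );
        error_log( 'Image not found: ' . $path );
        exit;
}

// Log the reason on the server and end the request
function serve_500( $message )
{
        header( 'HTTP/1.1 500 Internal Server Error' );
        error_log( $message );
        exit;
}
```

All three serve_* helpers end the request with exit, so the calling script never continues after an error response.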
 

Performance gain

Serving image aliases with the original code of the DFS handler (index_image_dfsmysqli.php) is already quite fast. Here is the ApacheBench test result from our example client:

ab -n 10000 -c 150 http://192.168.100.181/var/ezwebin_site/storage/images/media/images/0927-image-1/10760921-1-eng-US/0927-image-1_thumbnail_65_cropped.jpg
 
Failed requests:        0
Requests per second:    5316.28 [#/sec] (mean)

Now, look at the same test with our handler (index_image_localcache.php):

ab -n 10000 -c 150 http://192.168.100.181/var/ezwebin_site/storage/images/media/images/0927-image-1/10760921-1-eng-US/0927-image-1_thumbnail_65_cropped.jpg
 
Failed requests:        0
Requests per second:    14182.20 [#/sec] (mean)

Performance is not the only reason to use our handler, although it delivered about 2.7 times the throughput in the test above. With the existing handler, each image alias request opens a TCP port to the database and another to the NFS server, and the ApacheBench tests show that you can easily run out of available TCP connections. The following test shows "Failed requests" with the existing handler only:

ab -n 30000 -c 10 http://192.168.100.181/var/ezwebin_site/storage/images/media/images/0927-image-1/10760921-1-eng-US/0927-image-1_thumbnail_65_cropped.jpg
 
Failed requests:        2135
Requests per second:    4294.20 [#/sec] (mean)

It is not very realistic to run out of TCP connections under real traffic. (As a side note, you can check the current TCP connections with "netstat -n".) Still, avoiding the extra MySQL and NFS connections speeds up the overall performance of an eZ Publish setup, as our JMeter tests showed. The test that we used in JMeter simulates visitors browsing landing pages and article pages with many image thumbnails on each page. The overall performance gain (using index_image_localcache.php) was about 10%. In real traffic situations (without a reverse proxy), the performance gain is probably higher, because MySQL and NFS caching is less effective there than with the repetitive traffic pattern from JMeter. When you are behind a reverse proxy and you have long caching times configured for image aliases, the overall performance gain is smaller but still tangible.
