eZ Publish DFS - high performance considerations when serving image aliases

By: Philipp Kamps | October 20, 2011 | eZ Publish development tips

This blog post is about how eZ Publish serves content images -- also called image aliases -- in a DFS cluster setup. The eZ DFS setup is the recommended and most common solution for high-traffic eZ Publish sites that run on multiple servers. However, it still has some room for improvement when it comes to serving images. In this post, we describe how eZ Publish currently serves image aliases, the downsides of that approach, and some alternatives. At the end, we describe a possible implementation with some example performance gains.

How image alias serving currently works

A request for an image alias is first routed by an Apache rewrite rule:

RewriteRule ^/var/([^/]+/)?storage/images/.* /index_cluster.php [L]

index_cluster.php (together with some other PHP scripts) first checks whether the requested URI exists in the cluster database and whether the image is still available (that is, not removed or expired). (For more information about the eZ DFS cluster database, please see the documentation.)

  • If a valid image alias is found in the cluster DB, the PHP script opens the image file on the shared file system (for example, an NFS mount) and serves the file content directly from there.
  • If the image alias was not found, the script serves a 404 response.
  • If the image alias was removed, the script serves a 404 response.
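
The decision flow above can be sketched in a few lines of PHP. This is a simplified illustration, not the actual index_cluster.php code: the function name alias_response_status() and the shape of the $row array (an "expired" flag and an "mtime" value) are assumptions made for this example.

```php
<?php
// Hypothetical sketch: pick the HTTP status for an image alias request
// from the row found in the cluster database (null if no row was found).
// The keys 'expired' and 'mtime' are assumed for illustration only.
function alias_response_status(?array $row): int
{
    // No entry in the cluster DB: the alias does not exist.
    if ($row === null) {
        return 404;
    }
    // The entry exists but is flagged as removed or expired.
    if (!empty($row['expired']) || ($row['mtime'] ?? 0) < 0) {
        return 404;
    }
    // Valid entry: the file is read from the shared file system and served.
    return 200;
}
```

Note that both "not found" and "removed" end in the same 404 response; the database lookup is needed only to tell a valid alias from the other two cases.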

Downsides

  • All images are served by reading the image files from the shared file system. In our tests, file access on a shared file system was 10 to 50 times slower than on a local file system. The solution does not scale well: more than 3 to 4 web servers will probably saturate the shared file system, so adding web servers beyond that point does not improve performance when serving image aliases.
  • Each image request starts PHP and opens a MySQL connection. Therefore, if you have a landing page with 50 article thumbnails, a single page request results in 50 DB connections. On high-traffic sites, an eZ Publish setup can quickly run out of open MySQL connections. Another limitation is the number of available open TCP ports: during stress tests, we noticed that even moderate ApacheBench (ab) tests used up all available TCP ports and then started to show failed requests.
  • Starting PHP and opening a DB connection is costly, but it can be necessary in some cases, for example when you want to check whether a visitor is allowed to access an image. As mentioned before, the current code base does not allow access to images from removed object versions. However, it does not prevent access to images of hidden objects or of objects that a visitor is not allowed to see. Usually, this is not a major issue unless your images contain sensitive information, but it is something to consider.

Solutions

  1. A reverse proxy such as Akamai or Varnish helps you scale image alias requests significantly. As a side note, you can specify long caching times for image aliases because each object update creates a unique image alias URL. However, you still have a problem if you need to account for user permissions or removed object versions.
  2. Let each web server create a local cache of image aliases. This way, the image requests will scale and perform a lot better on multiple web servers. This makes it harder to clear the image alias cache, though: the script "bin/php/ezcache.php" will not remove the local cache of image aliases. You would need to manually rename the folder "var/<site>/storage/images" (as a rename/move is faster than deleting a folder).
  3. Implement visibility and permission checks as an optional feature that can be enabled or disabled.
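
The rename-based cache clearing mentioned in solution 2 could look like the sketch below. It is a minimal illustration under stated assumptions: the function names rotate_local_image_cache() and delete_tree() are hypothetical, and the ".stale" suffix is just one possible naming scheme.

```php
<?php
// Hypothetical helper: move the local image alias cache out of the way so
// that new requests repopulate it. Returns the renamed path (to delete
// later), or null if there was nothing to move or the rename failed.
function rotate_local_image_cache(string $imagesDir): ?string
{
    if (!is_dir($imagesDir)) {
        return null; // nothing cached yet
    }
    // A rename within the same file system is one fast operation,
    // regardless of how many files the folder contains.
    $stale = rtrim($imagesDir, '/') . '.stale.' . getmypid() . '.' . time();
    return rename($imagesDir, $stale) ? $stale : null;
}

// The slow part: delete the renamed folder afterwards, for example from cron.
function delete_tree(string $dir): void
{
    $items = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST
    );
    foreach ($items as $item) {
        if ($item->isDir() && !$item->isLink()) {
            rmdir($item->getPathname());
        } else {
            unlink($item->getPathname());
        }
    }
    rmdir($dir);
}
```

Splitting the work this way keeps the window in which visitors might see stale images short, while the expensive recursive delete runs outside the request path.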

A real implementation

We've implemented the solution described below for one of our clients, following solution #2. For this particular client, we are already using a reverse proxy (solution #1), and we are not interested in an implementation of solution #3: the client is not worried about visibility and permission checks, but is very much interested in improving system performance.

First, an Apache rewrite rule checks for the existence of the local image file. (Note that we also list a second rewrite rule for serving images that do exist, since the very last eZ Publish rewrite rule sends all remaining requests to "index.php".)

# Check for a local image file
# If it doesn't exist, use a PHP script that serves the image from the shared file system and also places a copy locally.

RewriteCond %{REQUEST_URI} ^/var/([^/]+/)?storage/images/.*
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^/var/([^/]+/)?storage/images/.* /index_cluster.php [L]

# Serve local image file if it exists
RewriteRule ^/var/([^/]+/)?storage/images(-versioned)?/.* - [L]
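
To make the effect of the two rewrite rules above easier to follow, here is the same decision expressed as a small PHP function. It is only an illustration: the function name route_image_request() and its return values are made up for this sketch, and $localFileExists stands in for Apache's "-f" test on REQUEST_FILENAME.

```php
<?php
// Hypothetical sketch mirroring the two rewrite rules shown above.
function route_image_request(string $uri, bool $localFileExists): string
{
    // First rule: an image alias URI without a local copy goes to PHP.
    $isImage = preg_match('#^/var/([^/]+/)?storage/images/.*#', $uri) === 1;
    if ($isImage && !$localFileExists) {
        return 'index_cluster.php';
    }
    // Second rule: stop rewriting, so Apache serves the local file itself.
    if (preg_match('#^/var/([^/]+/)?storage/images(-versioned)?/.*#', $uri) === 1) {
        return 'static';
    }
    // Anything else is left to the remaining eZ Publish rewrite rules.
    return 'other';
}
```

Only the first request for an alias on each web server reaches PHP; once the local copy exists, Apache serves the file without starting PHP or touching the database.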

In "index_cluster.php" we configure a new "storage backend":

define( 'STORAGE_BACKEND',         'localcache' );

This allows us to put our code in a dedicated file "index_image_localcache.php". Here is the code showing the general idea. It does not contain the code for some helper functions.

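<?php
// Use only the path part of the request: drop any query string and refuse
// paths containing "..", so the lookup cannot leave the image directories.
$path = parse_url( $_SERVER[ 'REQUEST_URI' ], PHP_URL_PATH );

if( ! is_string( $path ) || strpos( $path, '..' ) !== false )
{
        serve_404( $_SERVER[ 'REQUEST_URI' ] );
        exit;
}

// The image as stored on the shared file system
$source = MOUNT_POINT_PATH . $path;

if( file_exists( $source ) )
{
        // Local copy next to this script, where the rewrite rule's file check looks for it
        $target = dirname(__FILE__) . $path;
        // Copy to a unique temporary name first so a partial file is never served
        $target_temp = $target . '.' . getmypid()  . '_' . rand();

        create_local_directories( $target );

        if( copy( $source, $target_temp ) )
        {
                if( ! file_exists( $target ) )
                {
                        rename( $target_temp, $target );
                }
                //another process was faster
                else
                {
                        unlink( $target_temp );
                }
        }
        else
        {
                // Remove a partially written temporary file, if any
                if( file_exists( $target_temp ) )
                {
                        unlink( $target_temp );
                }
                serve_500( 'Cannot create local copy. Check permissions.' );
                // Stop here: there is no local copy to serve
                exit;
        }

        serve_image( $target );
}
else
{
        serve_404( $source );
}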
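
The helper functions are not shown in the handler above. The sketch below shows one way they might be written; these are our guesses for illustration and not the original implementations, so the bodies, the permission mode and the response texts are all assumptions.

```php
<?php
// Hypothetical versions of the helpers used by the handler above.

// Create the directory that will hold $target, including missing parents.
function create_local_directories(string $target): bool
{
    $dir = dirname($target);
    // The final is_dir() covers the case where another process created
    // the directory between our first check and our mkdir() call.
    return is_dir($dir) || @mkdir($dir, 0775, true) || is_dir($dir);
}

// Send the image with a content type detected from the file itself.
function serve_image(string $file): void
{
    $info = @getimagesize($file);
    header('Content-Type: ' . ($info ? $info['mime'] : 'application/octet-stream'));
    header('Content-Length: ' . filesize($file));
    readfile($file);
}

// Answer with 404 and stop; the path is logged, not echoed to the visitor.
function serve_404(string $path): void
{
    http_response_code(404);
    error_log('Image not found: ' . $path);
    echo "Not found\n";
    exit;
}

// Answer with 500 and stop, so that no image is served after a failure.
function serve_500(string $message): void
{
    http_response_code(500);
    error_log($message);
    echo "Internal server error\n";
    exit;
}
```

Having serve_404() and serve_500() end the request is a deliberate choice in this sketch: it ensures that nothing else is sent after an error response.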

Performance gain

Serving image aliases with the original code of the DFS handler (index_image_dfsmysqli.php) is already quite fast. Here is the ApacheBench test result from our example client:

ab -n 10000 -c 150 http://192.168.100.181/var/ezwebin_site/storage/images/media/images/0927-image-1/10760921-1-eng-US/0927-image-1_thumbnail_65_cropped.jpg

Failed requests:        0
Requests per second:    5316.28 [#/sec] (mean)

Now, look at the same test with our handler (index_image_localcache.php):

ab -n 10000 -c 150 http://192.168.100.181/var/ezwebin_site/storage/images/media/images/0927-image-1/10760921-1-eng-US/0927-image-1_thumbnail_65_cropped.jpg

Failed requests:        0
Requests per second:    14182.20 [#/sec] (mean)

Performance is not the only reason to use our handler. With the existing handler, each image alias request opens a TCP connection to the database and another to the NFS server. The ApacheBench tests show that you can easily run out of available TCP connections. The following test shows "Failed requests", and it does so with the existing handler only:

ab -n 30000 -c 10 http://192.168.100.181/var/ezwebin_site/storage/images/media/images/0927-image-1/10760921-1-eng-US/0927-image-1_thumbnail_65_cropped.jpg

Failed requests:        2135
Requests per second:    4294.20 [#/sec] (mean)

Running out of TCP connections is not very likely under real traffic. (As a side note, you can check the current TCP connections with "netstat -n".) Still, avoiding extra MySQL and NFS connections speeds up the overall performance of an eZ Publish setup, as our JMeter tests showed. The test that we used in JMeter simulates visitors browsing landing pages and article pages with many image thumbnails on each page. The overall performance gain (using index_image_localcache.php) was about 10%. In real traffic situations (without a reverse proxy), the performance gain is probably higher because MySQL and NFS caching is less effective there than with the repetitive traffic pattern from JMeter. When you are behind a reverse proxy and you have long caching times configured for image aliases, the overall performance gain is smaller but still tangible.
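
A rough back-of-the-envelope calculation helps explain why a benchmark can exhaust TCP ports. After a connection is closed, the side that closed it first keeps the local port reserved for a while (the TIME_WAIT state). The function below is a sketch with a made-up name, and the figures in the usage note are assumptions (common Linux defaults), not values measured on the client's servers.

```php
<?php
// Estimate how many new outgoing connections per second one machine can
// sustain towards a single destination (for example the MySQL server)
// before it runs out of ephemeral ports.
function max_new_connections_per_second(int $portLow, int $portHigh, int $timeWaitSeconds): float
{
    // Number of local ports available for outgoing connections.
    $ports = $portHigh - $portLow + 1;
    // Each port stays blocked for the TIME_WAIT period after use.
    return $ports / $timeWaitSeconds;
}
```

Assuming an ephemeral port range of 32768 to 60999 and a TIME_WAIT of 60 seconds, max_new_connections_per_second(32768, 60999, 60) gives roughly 470 new connections per second. A benchmark running at several thousand requests per second, each opening a fresh connection, would pass that limit within seconds.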