Using Varnish to speed up eZ Publish websites

By: Peter Keung | August 24, 2012 | eZ Publish add-ons

Varnish Cache is a powerful website caching system that dramatically increases your site's performance. It does so by sitting in front of your Apache, PHP, content management system, and database stack in order to cache your web pages and serve repeat requests. This saves the back-end stack from consuming unnecessary resources to generate the same pages over and over again. Mugo Web has implemented Varnish in front of eZ Publish for several client websites. Although every client has different needs, we've come up with an outline of tips and considerations that are common across most implementations.

When to use Varnish

Typically, Varnish is used to cache the entire HTML result of pages. As a result, it is most straightforward to implement it on sites that essentially serve the same pages to all site visitors. Sites that support user logins to the front-end require special care in order to allow certain types of requests or pages to bypass Varnish; regardless, Varnish can still be very effective on many such sites.

Unlike a content delivery network (CDN), Varnish does not help in getting your content on servers closer to each visitor, since it sits directly on your web server(s). However, you gain a lot more control over configuration details and precision over page caching and expiry by having Varnish on your server(s). Also, Varnish can still provide a meaningful performance boost if you're using a CDN, because a CDN might access each of your pages and supporting assets (such as images) 20+ times in order to populate all of its edge server caches.

On eZ Publish sites, Varnish can be extremely helpful in some less obvious ways, such as caching content images in a cluster setup, thus reducing the performance demands on the shared file system.

Varnish management

Installing Varnish is well-documented. Once installed, it can be started, restarted, and reloaded just like any other service, such as Apache and MySQL.

/etc/init.d/varnish start
/etc/init.d/varnish restart
/etc/init.d/varnish reload

You can also test the syntax of your configuration file by having varnishd compile it without starting the service (the -C flag), using a command such as this:

/usr/sbin/varnishd -C -f /etc/varnish/default.vcl

You can interact directly with Varnish on the command line by connecting to its local port (telnet localhost 6082) or running varnishadm. We'll mention this a bit later when inspecting what is called the "ban list".

Varnish configuration is separated into two main files. The first is typically located at /etc/sysconfig/varnish and contains configuration details for the Varnish service, such as the amount of memory to allocate, the port to use, and the cache file location. The second is typically located at /etc/varnish/default.vcl and is where you will spend most of your time, tweaking settings related to HTTP requests and headers, the interaction with your content management system, serving pages, and more. This configuration uses a syntax called the Varnish configuration language (VCL).

In this article, all configuration examples are compatible with Varnish 3.0.2. If you are completely new to Varnish, be sure to check out its documentation, especially around VCL.

General Varnish configuration considerations

Ports

Your Apache instance most likely runs on port 80. The first thing you might want to do is configure Varnish to run on port 80 (so that it can act as the conduit for all HTTP requests) and have Apache run on an alternative port such as 88. This is straightforward to do in Apache (modify the Listen directive), and in Varnish this is configured in two places:

1. At the top of the VCL file similar to the following:

backend default {
  .host = "127.0.0.1";
  .port = "88";
}

2. In /etc/sysconfig/varnish:

VARNISH_LISTEN_PORT=80

Then, you want to jump into the VCL file and set up some basic parameters. The two main VCL functions are:

vcl_recv, which is run before passing the request to the back-end
vcl_fetch, which is run after the back-end has generated the response (available in the aptly named beresp object) and before delivering it to the client
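
To show where the snippets in this article fit, here is a minimal skeleton of a VCL file in Varnish 3.0 syntax. It is only a sketch: the subroutine bodies are empty placeholders, and the back-end definition simply repeats the port example above. When a subroutine does not end with an explicit return, Varnish continues with its built-in default logic for that subroutine.

```vcl
# Skeleton only: paste the snippets from this article into the
# matching subroutine. Bodies are intentionally left empty.
backend default {
    .host = "127.0.0.1";
    .port = "88";
}

sub vcl_recv {
    # Runs when a request arrives, before it is passed to the back-end.
    # Works with the request (req.*).
}

sub vcl_fetch {
    # Runs after the back-end has responded, before the object is cached.
    # Works with the back-end response (beresp.*), for example to set TTLs.
}

sub vcl_deliver {
    # Runs just before the response is sent to the client (resp.*),
    # whether it came from the cache or from the back-end.
}

sub vcl_error {
    # Runs when "error" is called; builds the synthetic response (obj.*).
}
```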

Default caching times

In the vcl_fetch function, here is a configuration block that accomplishes the following:

Cache only the front-end of your site (letting the Administration Interface and other subdomains always pass through to the back-end)
Cache only valid page responses (don't cache 404 and other error pages)
Set the default cache expiry time to 5 minutes, but increase that time to 30 days for permanent image URLs. (In eZ Publish, new versions of content images get new URL paths.)

# Only cache www.yoursite.com
# Only 200 responses
if( req.http.host == "www.yoursite.com" &&
    beresp.status == 200
)
{
    if( req.url ~ "^/var/plain/storage/images/.*" )
    {
        set beresp.ttl = 30d;
        set beresp.http.X-Ttl = "30d";
    }
    else
    {
        # Default caching time
        set beresp.ttl = 300s;
        set beresp.http.X-Ttl = "300s";
    }
}

Diagnostic headers

Once Varnish is in place, it is Varnish, not the visitor's browser, that requests pages from the back-end. This means that your Apache logs will show all requests as coming from 127.0.0.1 (in other words, the server itself), which makes it difficult to troubleshoot using those logs. This simple configuration snippet in vcl_recv makes Varnish forward the original request IP address to Apache via the X-Forwarded-For header:

if (req.http.x-forwarded-for)
{
    set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
}
else
{
    set req.http.X-Forwarded-For = client.ip;
}

Another useful bit of information to capture is the hit count for a particular page. This gives you a quick snapshot of the traffic to a page and how much benefit you are getting from Varnish. You can configure this in vcl_deliver, which executes before delivering the page to the client, no matter whether it was served from the cache or from the back-end:

if (obj.hits > 0) {
    set resp.http.X-Cache = "HIT";
    set resp.http.X-Cache-Hits = obj.hits;
}
else
{
    set resp.http.X-Cache = "MISS";
}

If you have multiple servers and Varnish running on each server, you can also set a header in vcl_deliver indicating the source server. This helps with troubleshooting individual servers.

set resp.http.X-Served-By = server.hostname;

Mobile sites

If you have a separate mobile site, you'll need to implement some redirection logic. We recommend that in any server setup, you put the mobile redirection logic as close to the user as possible to eliminate caching complexities. In other words, in order of preference, you would put it in the CDN, in Varnish, in Apache, and only then in the content management system. You can also implement the redirection client-side in JavaScript, but this has downsides: it is harder to preserve referrer information, and it relies on an extra main site page being generated and served before the redirect can happen. Exact configuration details depend on whether you need to allow users to switch between the mobile site and the main site, and there are many possible behavior combinations; however, the common requirement is to recognize a mobile device based on its user agent string.

Here is a sample configuration adapting the user agent matches from detectmobilebrowsers.com for the VCL file in vcl_recv:

# Only redirect if we do not force the full browser using a GET parameter or COOKIE
if( !( req.url ~ "\?fullbrowser$" || req.http.Cookie ~ "fullbrowser=1" ) )
{
     # Check for mobile device
     if( req.http.User-Agent ~ "(?i)android.+mobile|avantgo|bada\/|blackberry|blazer|compal|elaine|fennec|hiptop|iemobile|ip(hone|od)|iris|kindle|lge |maemo|meego.+mobile|midp|mmp|netfront|opera m(ob|in)i|palm( os)?|phone|p(ixi|re)\/|plucker|pocket|psp|series(4|6)0|symbian|treo|up\.(browser|link)|vodafone|wap|windows (ce|phone)|xda|xiino" )
     {
         error 750 "Moved Temporarily";
     }
}

This is accompanied by catching the "error" in the vcl_error function. This is in a way a hack, because Varnish does not have a built-in redirect function.

if( obj.status == 750)
{
    set obj.http.Location = "http://m.yoursite.com" + req.url;
    set obj.status = 302;
    return(deliver);
}

Integration with eZ Publish

Purge on publish

The most important aspect of integrating Varnish with eZ Publish is to synchronize eZ Publish content updates and additions with precise clearing of relevant page caches in Varnish. This is often referred to as a "purge on publish" feature.

eZ Publish already does fine-grained cache clearing of pages within its internal view cache system (and consequently, its internal static cache system), with cache clearing rules based on the content tree structure, related objects, and more. In other words, when a page is published or edited, eZ Publish already has a system to determine which other pages should have their caches cleared. To hook into that system to send PURGE HTTP requests to Varnish, we submitted a pull request to patch eZ Publish's kernel. This means that eZ Publish Community Version 2011.10 and higher, as well as eZ Publish Enterprise 4.7 and higher, natively support custom static cache handlers.

On the Varnish configuration side of things, here is some example code for the vcl_recv function to clear Varnish caches based on PURGE HTTP requests:

if( req.request == "PURGE" )
{
    # Limit access for security reasons
    if( !client.ip ~ purge )
    {
        error 405 "Not allowed.";
    }

    # URL purges -- one for the URL and one for all view parameter variations
    if( req.http.X-Purge-Url )
    {
        set req.http.X-Purge-Url1 = "^" + req.http.X-Purge-Url + "$";
        set req.http.X-Purge-Url2 = "^" + req.http.X-Purge-Url + "/\(";

        ban( "obj.http.x-url ~ " + req.http.X-Purge-Url1 );
        ban( "obj.http.x-url ~ " + req.http.X-Purge-Url2 );
        error 200 "URL Purged.";
    }

    # Any regular expressions here
    if( req.http.X-Purge-Reg )
    {
        ban( "obj.http.x-url ~ " + req.http.X-Purge-Reg );
        error 200 "Regular Expression Purged.";
    }

    error 405 "Missing X-Purge-Url or X-Purge-Reg header.";
}
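
The block above depends on two things it does not show: an access control list named purge, and an x-url header stored on each cached object so that the ban expressions have something to match. A minimal sketch of both follows, in Varnish 3.0 syntax. The ACL entries are placeholders, so list the hosts that run eZ Publish on your setup. Stripping the header in vcl_deliver is an optional extra, not part of the original configuration.

```vcl
# Placeholder ACL: only these hosts may send PURGE requests.
# Define it near the top of the VCL file.
acl purge {
    "localhost";
    "127.0.0.1";
}

sub vcl_fetch {
    # Store the request URL on the cached object so that ban
    # expressions can match on obj.http.x-url.
    set beresp.http.x-url = req.url;
}

sub vcl_deliver {
    # Optional: the header is internal bookkeeping, so hide it
    # from site visitors.
    unset resp.http.x-url;
}
```

If your VCL file already defines vcl_fetch and vcl_deliver, add these statements to the existing subroutines rather than declaring them a second time.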

In short, what happens is that expressions matching the URLs are added to Varnish's internal "ban list", which Varnish continuously crawls in order to expire the matching cached objects. You can inspect this list using the Varnish command line administration interface (accessed using varnishadm or telnet localhost 6082) with the command ban.list.

A quick troubleshooting tip if you experience slowdowns when publishing eZ Publish content: on the Varnish end, tweak the minimum and maximum number of threads (start with VARNISH_MIN_THREADS=16 and VARNISH_MAX_THREADS=512 in /etc/sysconfig/varnish); on the eZ Publish end, implement a limit on the maximum number of URLs that can be purged at once.

Manually purging URLs

With a custom Varnish static cache handler for eZ Publish in place and the "purge" configuration detailed above, you can build a simple tab in the eZ Publish Administration Interface to clear specific URLs. It is also easy to support purging based on regular expressions.

In addition, with the handler in place, using the existing "Delete view cache" and "Clear static cache" features in eZ Publish manually triggers cache clears in Varnish.

Excluding pages from Varnish cache

Depending on your site, different pages might need to be cached for different lengths of time (for example, a landing page would be refreshed more often than an archived article), and you might want to completely exclude certain pages from being cached (for example, pages that are customized per user). By default, POST requests don't get cached by Varnish, so a page such as a contact form typically doesn't need any special configuration.
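
As a rough illustration of managing different caching times purely in Varnish, the following sketch for vcl_fetch overrides the default expiry for two kinds of pages. The URL patterns and durations are hypothetical; replace them with paths that exist on your site. These statements need to run after any block that sets the default caching time, or they will be overwritten.

```vcl
sub vcl_fetch {
    if (beresp.status == 200) {
        if (req.url == "/") {
            # Hypothetical landing page: refresh often.
            set beresp.ttl = 60s;
            set beresp.http.X-Ttl = "60s";
        } elsif (req.url ~ "^/archive/") {
            # Hypothetical archive section: content rarely changes.
            set beresp.ttl = 1d;
            set beresp.http.X-Ttl = "1d";
        }
    }
}
```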

In some cases, you can offload dynamic logic to cookies, Ajax, and ESI (Edge Side Includes) in order to allow full pages to continue to be cached by Varnish. For example, cookies would work well for a quiz. Ajax and ESI would be useful for sidebars that need to be updated more often than body content.

Another issue to consider is that of user sessions. In eZ Publish versions prior to 4.4, the default session handler automatically creates a user session for each site visitor, including anonymous users. You can strip all cookies set by the server in order to prevent sessions from being cached by Varnish and causing clashes. For sites with logged in users, such as with a paywall, this gets a bit more complicated, but you can set up an exception for the "is_logged_in" cookie.
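
A minimal sketch of that cookie handling follows, in Varnish 3.0 syntax. It rests on a few assumptions that you should check against your own site: logged-in users carry a cookie whose name contains "is_logged_in", login and other user views live under /user/, and the X-Strip-Cookies request header is a hypothetical marker introduced here only to pass the decision from vcl_recv to vcl_fetch.

```vcl
sub vcl_recv {
    # Never trust a marker header sent by the client.
    unset req.http.X-Strip-Cookies;

    # Anonymous page views only: leave POST requests, the user
    # views, and logged-in visitors untouched.
    if ((req.request == "GET" || req.request == "HEAD") &&
        !(req.url ~ "^/user/") &&
        !(req.http.Cookie ~ "is_logged_in")) {
        # Without a Cookie header, the request can be served from cache.
        unset req.http.Cookie;
        set req.http.X-Strip-Cookies = "1";
    }
}

sub vcl_fetch {
    if (req.http.X-Strip-Cookies) {
        # Do not store or send a session cookie with a cacheable page.
        unset beresp.http.Set-Cookie;
    }
}
```

Requests that still carry a Cookie header after this block (logged-in users, in this sketch) are passed to the back-end by Varnish's built-in vcl_recv logic.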

For illustration purposes, let's look at how you would allow eZ Publish to indicate to Varnish that entire pages should bypass Varnish. While you could completely manage this in Varnish, it is arguably more intuitive to manage caching times at the content management level.

We choose to allow the "Edge-Control" HTTP header to override the default Varnish caching times. This header does not affect browser behavior the way "Cache-Control" does, so we can configure it to only affect Varnish in the vcl_fetch function:

if( beresp.http.Edge-Control )
{
    set beresp.ttl = 0s;
    set beresp.http.X-Ttl = "0s";
}

Note that the example automatically excludes any page that has the "Edge-Control" header from being cached. You could extend the solution so that the back-end has the ability to specify the exact validity time of the page.
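
One way to sketch that extension, assuming the std module that ships with Varnish 3.0 is available and that the back-end sends values in the form cache-maxage=600s: extract the duration with regsub and convert it with std.duration. This block is meant to replace the simpler one above rather than run alongside it. The import line belongs at the top of the VCL file.

```vcl
import std;

sub vcl_fetch {
    if (beresp.http.Edge-Control ~ "cache-maxage=[0-9]+[smhd]") {
        # Pull out the duration, for example "600s".
        set beresp.http.X-Ttl = regsub(beresp.http.Edge-Control,
            "^.*cache-maxage=([0-9]+[smhd]).*$", "\1");
        # Convert the string to a duration; fall back to 0s if the
        # conversion fails.
        set beresp.ttl = std.duration(beresp.http.X-Ttl, 0s);
    } elsif (beresp.http.Edge-Control) {
        # Header present but not understood: do not cache.
        set beresp.ttl = 0s;
        set beresp.http.X-Ttl = "0s";
    }
}
```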

On the eZ Publish side, you can specify the "Edge-Control" value for specific URL paths and modules / views in the [HTTPHeaderSettings] block in site.ini. At the template level, you can create a custom template operator to allow the header to be set. Be sure to use ezpagedata or a persistent variable to send the header information from full views to be set in the pagelayout; otherwise, the header information won't be set when view caching is turned on. See the 3 code samples below:

1. Full view template

{ezpagedata_append( 'http_headers', 'Edge-Control: cache-maxage=0s' )}

2. Pagelayout template

{if is_set( $module_result.content_info.persistent_variable.http_headers )}
    {foreach $module_result.content_info.persistent_variable.http_headers as $header}
        {$header|set_header()}
    {/foreach}
{/if}

3. Template operator snippet

case 'set_header':
{
    header( $operatorValue );
    $operatorValue = '';
}
break;

Optimization

Once you've set up Varnish and configured it to work in sync with eZ Publish, how do you know that it's working and whether it's working as well as it should?

Use the varnishstat utility to see your cache hit rates and an indication of whether the allocated resources are suitable.
Use curl -I http://www.yoursite.com/page/on/your/site to inspect HTTP headers.
Use standard benchmarking and monitoring tools, such as ab and New Relic, to see just how much faster your site is.
