
Varnish caching of non-sensitive content for logged-in users

By: Peter Keung | August 15, 2014 | Site performance

Varnish is great for high-traffic sites where the same pages are served over and over to millions of visitors, but things get complicated when the response has to differ depending on the specific user or user group. There are several techniques, and which ones you use depends on the details. Here, we outline a solution for a particular use case on eZ Publish.

The usual techniques for serving cached content to specific users are:

  • Ajax, whereby Varnish serves the same page to everybody, but the browser uses JavaScript to fetch the user's content and insert it into the page;
  • Edge Side Includes (ESI), whereby Varnish assembles the page from separately cached parts, each with its own caching time (or "time to live"). Hint: turn the TTL of a part down to 0 and the Varnish cache becomes transparent for it;
  • Request-based variations, whereby information in the page request (such as a header or cookie) determines which full page variation to serve.

All three approaches can be used together. In this post, we will focus on the last approach and, for simplicity, presume that page variations are the same within a user group. For example, on a news site, anonymous users might see only article introductions, whereas registered users would see entire articles.

In a nutshell, the challenge in serving cached content to specific users is to avoid hitting the backend (i.e., the CMS, the database, etc.) on every page load. The example Varnish configuration in the eZ Publish documentation makes a Varnish sub-request to the backend in order to grab the "user hash" and essentially validate the user on each page. With that approach, the only way to optimize performance is to make the endpoint that supplies the user hash as light as possible. However, we can actually do much better than this!

Protecting the backend against as many requests as possible might conflict with other requirements. It is up to you to be aware of the implications and to make the right choice. The solution described below uses a short-lived but shareable cookie to determine which cached version of a page to display. This means that a user capable of editing the cookie can see any cached version whose cookie value they know. For example, on a site with gated content, it's a trade-off between speed and a non-porous paywall.

Configuration in eZ Publish and Varnish

eZ Publish has a superior permissions/security system in which each user is assigned some roles, where each role consists of specific permissions. Users in a given user group all have the same roles. In this use case, we can generate a hash of a user's roles, store that in a cookie, and use it to identify which page variation a user should receive.

In eZ Publish, the main implementation piece is to set the "role hash" whenever a user logs in and whenever a fresh page is generated by the CMS.

The function to generate the role hash is as follows. Note that, for some extra security, the role hash can optionally be salted with a timestamp representing the current date, so that the role hash for a group of users changes every day instead of always staying the same.

// Declared static because setUserHashCookie() calls it as self::getUserHash()
public static function getUserHash()
{
    $ini = eZINI::instance( 'mugo_varnish.ini' );
    $userDetails = eZUser::currentUser()->getUserCache();
    $hashArray = array( $userDetails[ 'roles' ], $userDetails[ 'role_limitations' ] );

    // To increase security, we can also use a once-daily timestamp to build the user hash
    if( $ini->variable( 'VarnishSettings', 'AppendTimestampToUserHash' ) == 'enabled' )
    {
        $hashArray[] = strtotime( 'today' );
    }
    return md5( serialize( $hashArray ) );
}
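
To make the effect of the hash inputs concrete, here is a minimal stand-alone sketch. It does not use eZ Publish: the roleHash function, the role IDs, and the limitation values are hypothetical stand-ins for what getUserCache() returns, and the optional third argument plays the part of the AppendTimestampToUserHash setting.

```php
<?php
// Hypothetical stand-alone version of the role hash, for illustration only.
// In the real code the roles come from eZUser and the switch from mugo_varnish.ini.
function roleHash( array $roles, array $roleLimitations, $dayTimestamp = null )
{
    $hashArray = array( $roles, $roleLimitations );
    if( $dayTimestamp !== null )
    {
        // Same idea as appending strtotime( 'today' ) in getUserHash()
        $hashArray[] = $dayTimestamp;
    }
    return md5( serialize( $hashArray ) );
}

$today    = strtotime( 'today' );
$tomorrow = strtotime( 'tomorrow' );

// Made-up role IDs: two subscribers with identical roles, one anonymous-like user
$alice = roleHash( array( 1, 5 ), array( '', '' ), $today );
$bob   = roleHash( array( 1, 5 ), array( '', '' ), $today );
$anon  = roleHash( array( 1 ), array( '' ), $today );
$aliceTomorrow = roleHash( array( 1, 5 ), array( '', '' ), $tomorrow );

var_dump( $alice === $bob );           // same roles on the same day share a hash
var_dump( $alice === $anon );          // different roles give a different hash
var_dump( $alice === $aliceTomorrow ); // the hash rotates when the date changes
```

Because the hash depends only on roles (and optionally the date), every user in the same group maps to the same cached page variation, which is what keeps the number of cache variations small.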

In order to set cookies based on this hash, we run the code during a couple of eZ Publish events: response/preoutput and session/regenerate. Both of these events then trigger a setUserHashCookie function:

public static function setUserHashCookie( $unsetCookie = false )
{
    $wwwDir = eZSys::wwwDir();
    // On host based site accesses this can be empty, causing the cookie to be set for the current dir,
    // but we want it to be set for the whole eZ Publish site
    $cookiePath = $wwwDir != '' ? $wwwDir : '/';

    if( eZUser::isCurrentUserRegistered() )
    {
        $ini = eZINI::instance();
        // getUserHash() takes no arguments; the cookie lives as long as the session
        setcookie( 'vuserhash', self::getUserHash(), time() + $ini->variable( 'Session', 'SessionTimeout' ), $cookiePath );
    }
    elseif( $unsetCookie )
    {
        //removes cookie
        setcookie( 'vuserhash', '0', 1, $cookiePath );
    }
}

Within the Varnish configuration, we need to use the role hash to determine the uniqueness of the cached page. (Note that all of the Varnish configurations in this post are for Varnish 4.)

sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    # Cache variations based on a provided cookie value
    # But we exclude page assets like images, css and javascript
    if( ( req.url !~ "^/var/[^/]+/(storage|cache)/.*" &&
          req.url !~ "^/extension/[^/]+/design/[^/]+/(stylesheets|images|lib|javascripts?|flash)/.*" &&
          req.url !~ "^/design/[^/]+/(stylesheets|images|javascripts?|lib|flash)/.*" &&
          req.url !~ "^/share/icons/.*"
        ) &&
        req.http.cookie ~ "vuserhash="
       ) {
        # No trailing ";" in the pattern, so it also matches when vuserhash is the last cookie
        hash_data( regsub( req.http.cookie, ".*vuserhash=([^;]+).*", "\1" ) );
    }
    return (lookup);
}
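
One detail worth checking in the hash_data line is how the cookie value is extracted. Varnish's regsub returns its input unchanged when the pattern does not match, so a pattern that requires a ";" after the value would hash the entire Cookie header whenever vuserhash happens to be the last cookie, which fragments the cache per user. The sketch below imitates that behaviour in PHP with preg_replace; regsubLike and the cookie strings are hypothetical and only serve to compare the two patterns.

```php
<?php
// Stand-in for Varnish's regsub(): replace the first match, or return the
// input unchanged when nothing matches. Illustration only, not Varnish code.
function regsubLike( $subject, $pattern, $replacement )
{
    return preg_replace( $pattern, $replacement, $subject, 1 );
}

$strict  = '/.*vuserhash=([^;]+);.*/'; // requires a ";" after the value
$lenient = '/.*vuserhash=([^;]+).*/';  // value may also end the header

// Made-up Cookie headers with vuserhash in first and in last position
$first = 'vuserhash=abc123; eZSESSID=xyz';
$last  = 'eZSESSID=xyz; vuserhash=abc123';

$results = array(
    'strict/first'  => regsubLike( $first, $strict, '$1' ),
    'strict/last'   => regsubLike( $last, $strict, '$1' ),
    'lenient/first' => regsubLike( $first, $lenient, '$1' ),
    'lenient/last'  => regsubLike( $last, $lenient, '$1' ),
);
print_r( $results );
```

Only the 'strict/last' case fails to isolate the value and returns the whole header; the other three yield the bare hash.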

In addition, when serving a non-cached page, we need to validate the role hash value in the cookie that was passed in. This is done in vcl_backend_response (formerly vcl_fetch):

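# Backend only sends vuserhash cookie for HTML responses
# We have to use the "header" Varnish module so that it can read multiple set-cookie headers
# (this needs "import header;" at the top of the VCL file, plus "import std;" if the
# std.syslog line below is enabled)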
if( header.get( beresp.http.set-cookie, "vuserhash=" ) )
{
    # Check if client has invalid user hash value
    # Comparing client cookie with server response cookie
    if( regsub( header.get( beresp.http.set-cookie, "vuserhash=" ), "vuserhash=([^;]+).*", "\1" ) != regsub( bereq.http.cookie, ".*vuserhash=([^;]+).*", "\1" ) )
    {
        #std.syslog(180, "VARNISH: Invalid cookie found." );
        set beresp.uncacheable = true;
        return( deliver );
    }
    else
    {
        # Making sure object gets cached -- even with set-cookie header
        return( deliver );
    }
}
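
The decision made by this VCL fragment can be restated as a small PHP sketch. The function names and header values are hypothetical and nothing here runs inside Varnish; the point is only to show which request/response combinations end up marked as uncacheable.

```php
<?php
// Hypothetical helper: pull the vuserhash value out of a Cookie or Set-Cookie header
function extractUserHash( $header )
{
    return preg_match( '/vuserhash=([^;]+)/', $header, $matches ) ? $matches[1] : null;
}

// True when the backend issued a vuserhash that differs from what the client sent,
// i.e. the case in which the VCL sets beresp.uncacheable
function userHashIsStale( $requestCookie, $responseSetCookie )
{
    $fresh = extractUserHash( $responseSetCookie );
    if( $fresh === null )
    {
        // Backend sent no vuserhash (e.g. a non-HTML response): the check does not apply
        return false;
    }
    return extractUserHash( $requestCookie ) !== $fresh;
}

$stale    = userHashIsStale( 'vuserhash=old', 'vuserhash=new; path=/' );        // outdated or edited cookie
$matching = userHashIsStale( 'a=b; vuserhash=new', 'vuserhash=new; path=/' );   // cookie agrees with backend
$missing  = userHashIsStale( 'a=b', 'vuserhash=new; path=/' );                  // client has no hash yet
$noHeader = userHashIsStale( 'vuserhash=new', 'eZSESSID=xyz; path=/' );         // backend sent no hash
var_dump( $stale, $matching, $missing, $noHeader );
```

In the stale and missing cases the response was generated for a different role hash than the one used as the cache key, so storing it would put the wrong page variation under that key.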

Lastly, we need to tell the browser that the page varies based on the cookie value in order to ensure that the browser serves the correct page before and after users log in. This is done in vcl_deliver:

set resp.http.Vary = "Cookie";

For more information, see some fuller code examples in our Mugo Varnish extension: https://github.com/mugoweb/mugo_varnish
