
Archiving in eZ Publish: a CSMonitor.com case study

By: Philipp Kamps | July 11, 2013 | Case study and eZ Publish development tips

Mugo planned and implemented an article archiving solution for The Christian Science Monitor, an award-winning news website. The site has a large amount of content: visitors have access to articles starting from 1980. In total, the site has almost 800,000 content objects and the current setup serves up to 48 million page views per month.

To improve the overall performance of the system, we split the site into two eZ Publish instances:

1) a live instance serving recent content from the last 2 years
2) an archive instance containing the remaining content

Since most of the traffic and editorial work happens on the live instance, keeping only recent content in its database makes it fast. A smaller database handles big traffic spikes better and gives editors a responsive Administration Interface.

Because content on the archive site rarely changes, we were also able to lengthen its cache lifetimes, which gave an additional performance gain.

The solution was implemented in the back-end. All content is still served from a single domain, so the split has no negative SEO effects. Editors continue to use one Administration Interface and can still access archived content, for example to create content relations from the live site to the archive site.

CSMonitor.com now has room to grow in size and traffic volume. Splitting the content is an efficient way to scale the setup and to reduce future increases in hosting costs.

Key technical details

  • We use Apache reverse proxy rules to split the traffic, forwarding requests for older articles to the archive eZ Publish install
  • The eZ Find search extension (which uses Solr) is the "glue" between the two installations. For example, it serves search results from both instances and populates listing pages (such as "all USA content"). We extended it to store additional information such as thumbnail paths and article "kickers" that categorize content.
  • We created a new eZ Publish "datatype" that lets editors link content across instances. When using it, editors can browse and search both instances. It stores its relations by the eZ Publish "remote_id" field.
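The article does not say how the Apache rules decide which requests count as "older articles". As a hedged illustration, the Python sketch below models that routing decision, assuming (hypothetically) that article URLs carry a /YYYY/MMDD/ date segment and that the live instance keeps a rolling two-year window. The backend hostnames, the URL pattern and the function names are invented for this example; in the real setup the equivalent logic would live in Apache configuration (for example mod_rewrite rules using the [P] proxy flag), not in Python.

```python
import re
from datetime import date

# Hypothetical backend addresses; the article does not name the real hosts.
LIVE_BACKEND = "http://live.internal"
ARCHIVE_BACKEND = "http://archive.internal"

# Assumed URL shape for articles: .../YYYY/MMDD/slug (an assumption, not confirmed).
DATE_SEGMENT = re.compile(r"/(\d{4})/(\d{2})(\d{2})/")


def cutoff(today: date, years: int = 2) -> date:
    """Oldest publication date still served by the live instance."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year has no Feb 29
        return today.replace(year=today.year - years, day=28)


def choose_backend(path: str, today: date) -> str:
    """Pick the backend a reverse proxy would forward this request path to."""
    match = DATE_SEGMENT.search(path)
    if match is None:
        # Pages without a date segment (home page, section fronts) stay on live.
        return LIVE_BACKEND
    year, month, day = (int(part) for part in match.groups())
    try:
        published = date(year, month, day)
    except ValueError:
        # Digits that do not form a real date are not treated as an article URL.
        return LIVE_BACKEND
    return ARCHIVE_BACKEND if published < cutoff(today) else LIVE_BACKEND
```

A rolling window like this only works if content is moved to the archive as it ages; otherwise the proxy could forward a request to an instance that does not yet hold the article.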
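To illustrate how a shared search index can act as the "glue" between the two installations, here is a toy, in-memory stand-in for the Solr index. It assumes each indexed document records which installation it came from, along with the extra stored fields the article mentions (thumbnail path and kicker). The class, field and function names are hypothetical and are not eZ Find's actual schema; a real listing page would send an equivalent filtered, date-sorted query to Solr.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class IndexedArticle:
    """One document in the shared search index (toy stand-in for a Solr document)."""
    remote_id: str
    installation: str    # "live" or "archive" (hypothetical labels)
    section: str
    published: date
    title: str
    thumbnail_path: str  # extra stored field mentioned in the article
    kicker: str          # extra stored field mentioned in the article


def listing(index: Iterable[IndexedArticle], section: str, limit: int = 10) -> List[IndexedArticle]:
    """Newest-first listing for one section, drawn from both installations."""
    hits = [doc for doc in index if doc.section == section]
    hits.sort(key=lambda doc: doc.published, reverse=True)
    return hits[:limit]
```

Because the thumbnail path and kicker come back with each hit, a listing can presumably be rendered without querying either eZ Publish database, which is one likely benefit of storing those fields in the index.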
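The cross-instance datatype stores remote_id values. A plausible reason is that eZ Publish assigns numeric object IDs per database, so the same ID can refer to different objects on the live and archive installations, while a remote_id is a unique string. The sketch below shows how stored remote_ids might be resolved at render time by asking each installation in turn. The lookup functions, installation names and URLs are hypothetical, and the sketch assumes the archiving process keeps an object's remote_id when it moves the object, so existing relations keep resolving.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup returns the object's URL on one installation, or None if it is absent there.
Lookup = Callable[[str], Optional[str]]


def resolve_relations(remote_ids: List[str], lookups: Dict[str, Lookup]) -> List[Tuple[str, str, str]]:
    """Resolve stored remote_ids to (remote_id, installation, url) triples.

    Installations are tried in the dict's insertion order, so the first
    installation listed (e.g. "live") wins if both hold the same remote_id.
    """
    resolved = []
    for remote_id in remote_ids:
        for installation, lookup in lookups.items():
            url = lookup(remote_id)
            if url is not None:
                resolved.append((remote_id, installation, url))
                break
        # A remote_id found on neither installation is a dangling relation and is skipped.
    return resolved
```

Usage, with plain dictionaries standing in for the two installations:

```python
live = {"a1": "/USA/2013/0711/new-story"}
archive = {"b2": "/World/1998/0305/old-story"}
relations = resolve_relations(["b2", "a1", "missing"], {"live": live.get, "archive": archive.get})
```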

For more information, please see these slides and this video presentation.