Adding data to the eZ Find index with Index Time Plugins

By: Benjamin Kroll | August 12, 2014 | eZ Publish development tips, ezfind, solr, index, plugins, and search

Index time plugins are one of the most important ways to extend eZ Find functionality; they allow you to control how and what data is indexed. Combined with custom eZ Find queries, this opens up huge opportunities for providing access to content, well beyond mere 'search'.

In this post we will look at some typical use cases, briefly consider the out-of-the-box functionality, and then dive into why you would want to use index time plugins and how you would go about setting one up.

"Index time" refers to the fact that eZ Find and Solr maintain a digest of your data independent of the eZ Publish datastore. The digest, properly called "the eZ Find index", is maintained while you edit your content, and not at "query time". The eZ Find index is optimized in ways that eZ Publish is not, which makes it a useful place for functionality that is difficult or impossible to implement within eZ Publish.

The use cases may be more or less common, depending on the application you're working on. Let's take a look at a few ...

Note: The index time plugin functionality is only available in eZ Find 2.7 or higher.

Why would you want to add / modify content in eZ Find?

You may want to index attribute data on a content class that is normally stored in a simplified format (e.g. text line of space separated values) which is unsuitable for search queries or sorting in its current form, but is sufficient for display purposes.

Another use case would be to index information that is not available in attribute or relation form. For example, flagging content objects if they have been reviewed by a specific user or user group e.g. Editors, Bloggers, Educators. That information is not available directly for indexing, but can be easily retrieved by looking at the owners of the reviews for the object in question at index time.

The eZ Publish documentation [1] offers a few more examples:

  • indexing parent node information with an object or coping with a data model implementation where relations are part of the domain model, but do not use related objects or its datatype equivalents
  • boosting articles that reference certain images, or articles that appear inside special eZ Flow blocks

What can you do without index time plugins?

eZ Find provides some control 'out of the box' over how different datatypes are processed during indexing for search, sorting, faceting or filtering operations, via datatype mappings. What works great for searching may be unsuitable for sorting, for example.

If you only need to handle indexing of existing datatypes, these mapping settings may be all you need. However, they do not cover use cases which are not directly related to a single datatype or specific class attribute. In those cases the index time plugin system is what you'd want to use. A look at the settings in ezfind.ini below highlights the different mappings, which include custom mappings for specific class attributes.

# Custom mapping for eZ Publish datatypes via CustomMap
# CustomMap[eztext]=ezfSolrDocumentFieldText
CustomMap[dummy_example]=ezfSolrDocumentFieldDummyExample
CustomMap[ezsrrating]=ezfSolrDocumentFieldStarRating
...
# Datatype mapping (as of 2.2 this means for searching only)
DatatypeMap[ezstring]=text
DatatypeMap[eztext]=text
DatatypeMap[ezsrrating]=sfloat
...
# Datatype mapping for SORTING (as of 2.2)
DatatypeMapSort[ezstring]=string
DatatypeMapSort[ezinteger]=sint
DatatypeMapSort[ezselection]=string
...
# Datatype mapping for FACETING (as of 2.2)
DatatypeMapFacet[ezstring]=string
DatatypeMapFacet[ezkeyword]=lckeyword
DatatypeMapFacet[ezinteger]=tint
...
# Datatype mapping for FILTERING
DatatypeMapFilter[ezkeyword]=lckeyword
DatatypeMapFilter[ezinteger]=tint
DatatypeMapFilter[ezselection]=string
...
# Custom attribute mapping
#CustomAttributeMap[class_identifier/attribute_identifier]=ezfSolrDocumentFieldMetaData
...

What can you do with an index time plugin?

The datatype mappings above are a great start. Index time plugins take this a step further by allowing developers to index data that is not normally stored in attribute or relation form in eZ Publish, or that is not stored in a suitable format.

Adding such data makes the eZ Find index more valuable and can speed things up considerably when content fetches can be replaced with eZ Find fetches that use this custom index data.

Use case 1 - Flagging content by a user/user group

A site contains a large amount of content (e.g. reviews, blog posts) and we want to flag content that has been created by a specific user or user group for one reason or another (e.g. high quality, trusted source, or a sponsor partner). Since the content has already been created and we don't want to add a new attribute to the content class(es), we decide to add a new field to the eZ Find / Solr index instead.

A suitable field type for this case would be the pre-defined dynamic boolean field *_b, which is part of eZ Find's default Solr schema. (See the "Schema updates" section for more information about field types and "How do you setup an index time plugin?" for code examples.)

The index time plugin will check the ownership of each content object against a list of user ids and add the new field with the appropriate true/false value to the index.
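A plugin for this use case could look roughly like the sketch below. It is only an illustration under a few assumptions: the class name ezfIndexFlaggedOwner, the helper ownerFlagValue(), the field name extra_flagged_owner_b and the user ids are all made up for this example, and the owner is read from the content object's owner_id attribute. The value is written as the string 'true' or 'false' so that Solr receives an unambiguous boolean literal.

```php
<?php
/**
 * Hypothetical helper (not part of eZ Find): returns the Solr boolean
 * literal for an owner id, given a list of flagged user ids.
 */
function ownerFlagValue( $ownerId, array $flaggedUserIds )
{
    return in_array( (int) $ownerId, $flaggedUserIds, true ) ? 'true' : 'false';
}

// The plugin is only declared where eZ Find's interface is available,
// so the helper above can still be run on its own.
if ( interface_exists( 'ezfIndexPlugin' ) )
{
    class ezfIndexFlaggedOwner implements ezfIndexPlugin
    {
        public function modify( eZContentObject $contentObject, &$docList )
        {
            // Example user ids only; a real list would come from configuration
            $flag = ownerFlagValue( $contentObject->attribute( 'owner_id' ), array( 14, 42 ) );

            // Add the flag to the Solr document of every language version
            foreach ( $docList as $doc )
            {
                $doc->addField( 'extra_flagged_owner_b', $flag );
            }
        }
    }
}
```

Keeping the decision in a small function makes it easy to test without a running eZ Publish installation.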

Use case 2 - Enhancing content with 'meta'-data for better searching & filtering

An upcoming feature on 49thshelf.com - a site dedicated to all things Canadian Literature - will enhance the indexed data for its main content type (books) with some 'meta' data. This meta data is valuable to a specific group of users (e.g. educators, librarians) when selecting books for the classroom. The data is stored in class attributes using the textline datatype, although not in a form suitable for indexing directly.

An index time plugin is used to transform the attribute data (Interest age, Grade range and Reading age) into a format that can be used to create detailed search queries using Solr's range syntax.

The example below explores this use case in detail.

How do you setup an index time plugin?

Create the plugin class(es)

... in /extension/<my_extension>/classes/indexplugins/ezfIndex<my_pluginclassname>.php

All index time plugin classes share the ezfIndex name prefix and must implement the ezfIndexPlugin interface, consisting of a single method modify().

The plugin's modify() method takes a content object as well as the Solr document list (for all available languages; by reference) as parameters during indexing.

You cannot define multiple plugins for the same content class (see eZ Find settings below). As a simple work-around you can create a wrapper class that calls all required plugins instead.

// ezfIndexOnixProductExtras
...
class ezfIndexOnixProductExtras implements ezfIndexPlugin
{
    /**
     * The modify method gets the current content object AND the list of
     * Solr Docs (for each available language version).
     *
     * @param eZContentObject $contentObject
     * @param array $docList Passed by reference
     */
    public function modify( eZContentObject $contentObject, &$docList )
    {
        // Run each class-specific plugin in turn. $docList is declared by
        // reference in modify(), so it must not be prefixed with & here.
        $librarianReviewed = new ezfIndexOnixProductLibrarianReviewed();
        $librarianReviewed->modify( $contentObject, $docList );

        $rangePlugin = new ezfIndexOnixProductGradeRangeReadingInterestAge();
        $rangePlugin->modify( $contentObject, $docList );
    }
}
...

This actually helps to keep things nice and tidy as all contentclass-related plugins are bundled together.

One of the 49thshelf plugins, for example, looks something like this:

// ezfIndexOnixProductGradeRangeReadingInterestAge
... 
    $currentVersion     = $contentObject->currentVersion();
    $availableLanguages = $currentVersion->translationList( false, false );
    
    foreach ( $availableLanguages as $languageCode )
    {
        ... parse the attribute data into the format we want to store ...
        
        // add the extra information to the title index (dynamic field type *_range_si)
        // *_range_si is a multi-valued integer typed field (ezfind/java/solr/conf/schema.xml for details)
        $docList[ $languageCode ]->addField( 'extra_grade_range_si', $gradeRanges );
        $docList[ $languageCode ]->addField( 'extra_interest_age_range_si', $interestAges );
        $docList[ $languageCode ]->addField( 'extra_reading_age_range_si', $readingAges );
    }
...

After the attribute data is processed (from text line to separate integer values; not shown), the plugin loops over the available languages and adds the new data to the document list for indexing.
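The parsing step is not shown in the original plugin. A minimal sketch of how a text line such as "7 8 9" could be turned into separate integer values is given below. The function name parseRangeTextLine() is hypothetical, and the sketch assumes the values are whole numbers separated by whitespace; anything else is skipped.

```php
<?php
/**
 * Hypothetical helper (not part of eZ Find): turns a text line of
 * space separated numbers, e.g. "7 8 9", into a list of integers.
 * Tokens that are not plain non-negative integers are skipped.
 */
function parseRangeTextLine( $text )
{
    $values = array();

    // Split on any run of whitespace and ignore empty tokens
    foreach ( preg_split( '/\s+/', trim( $text ), -1, PREG_SPLIT_NO_EMPTY ) as $token )
    {
        if ( ctype_digit( $token ) )
        {
            $values[] = (int) $token;
        }
    }

    return $values;
}

$readingAges = parseRangeTextLine( '7 8 9' ); // array( 7, 8, 9 )
```

The resulting array can be passed to addField() as in the plugin excerpt above.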

The dynamic field type, *_range_si, is defined in Solr's schema.xml (see below). Essentially, it is an array.

eZ Publish attribute data (text lines):

Interest-age: 7 8
Grade-range: 2 3
Reading-age: 7 8 9

eZ Find index data:

<arr name="extra_interest_age_range_si">
  <int>7</int>
  <int>8</int>
</arr>
<arr name="extra_grade_range_si">
  <int>2</int>
  <int>3</int>
</arr>    
<arr name="extra_reading_age_range_si">
  <int>7</int>
  <int>8</int>
  <int>9</int>
</arr>

An example plugin is also provided as part of the eZ Find extension in /extension/ezfind/classes/indexplugins/ezfindexparentname.php

ezfind.ini settings

To activate the plugin, add it to the Class[] hooks array in ezfind.ini's [IndexPlugins] block.

# Classhooks will only be called for objects of the specified class
Class[]
Class[onix_product]=ezfIndexOnixProductExtras

Schema updates

To store the extra data we need to define a new dynamic field type in the fields block in Solr's schema.xml (extension/ezfind/java/solr/conf/schema.xml). The new field type will support all three of the new fields because they all hold the same kind of data.

Dynamic field type names consist of the wildcard prefix * followed by a type suffix, e.g. *_b for booleans. Use these field types with your choice of prefix to create the field name, e.g. my_boolean_field_b, my_rather_long_float_name_f. As long as the field name ends with a valid type suffix it will be indexed as that type.

Dynamic field types are very useful, as they allow new fields to be added to the index without having to add a new field definition to Solr's schema.xml.

By default, fields store only one value. As we want to store a range of values (integers) for each attribute, the 'multiValued' attribute needs to be set to true.

<!-- Grade range, Interest & Reading ages (may be multiple) -->
<dynamicField name="*_range_si" type="sint" indexed="true" stored="true" multiValued="true"/>

Any other attribute we may want to index as a range in the future can use the '_range_si' field type suffix to be indexed in the same way.

Note: The schema.xml that ships with eZ Find already contains a large variety of field types. Before creating a new one, it is worth having a look at what's on offer to save time and effort. For a full list of pre-defined field types check /extension/ezfind/java/solr/conf/schema.xml

Use within templates or PHP

To use the additional information in the index, construct your queries using Solr's range syntax. The * character can be used as a wildcard at either end of the range.

{def $data = fetch( 'ezfind', 'search', hash(
    'query', 'foobar'
    , 'filter', 'extra_reading_age_range_si:[8 TO *]'
))} 

$data = eZFunctionHandler::execute( 'ezfind', 'search', array(
    'query' => 'foobar'
    , 'filter' => array( 'extra_interest_age_range_si:[8 TO *] OR extra_reading_age_range_si:[8 TO 11]' )
)); 
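If range filters are needed in several places, a small helper can keep the syntax consistent. The sketch below is only an illustration; buildRangeFilter() is a hypothetical function, not part of eZ Find, and it assumes integer bounds, with null standing for an open end (*).

```php
<?php
/**
 * Hypothetical helper: builds a Solr range filter such as
 * "extra_reading_age_range_si:[8 TO *]". Passing null for a bound
 * leaves that end of the range open.
 */
function buildRangeFilter( $field, $from = null, $to = null )
{
    $lower = ( $from === null ) ? '*' : (int) $from;
    $upper = ( $to === null ) ? '*' : (int) $to;

    return $field . ':[' . $lower . ' TO ' . $upper . ']';
}

// Rebuilds the filter string used in the PHP example above
$filter = buildRangeFilter( 'extra_interest_age_range_si', 8 )
        . ' OR '
        . buildRangeFilter( 'extra_reading_age_range_si', 8, 11 );
```

The resulting string can be passed as the 'filter' value in either the template or the PHP fetch.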

Final steps

  1. Clear cache(s)
  2. Re-generate autoloads
  3. Re-load/start Solr
  4. Re-index your site/data [2]
  5. Check that your extra data has made it into the index via a site search or the Solr admin interface (<your_host>:8983/solr/admin/)
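For the last step, a direct Solr query can show whether any document carries the new field. The sketch below only builds such a query URL. It assumes a default single-core Solr setup reachable at port 8983 with the standard /solr/select handler; buildFieldCheckUrl() is a hypothetical helper, and host, port and path may differ in your installation.

```php
<?php
/**
 * Hypothetical helper: builds a Solr select URL that lists documents
 * having any value in the given field. Host, port and path are
 * assumptions based on a default local Solr setup.
 */
function buildFieldCheckUrl( $host, $field )
{
    $params = array(
        'q'    => $field . ':[* TO *]', // any document with a value in the field
        'fl'   => $field,               // only return the field itself
        'rows' => 5,                    // a handful of results is enough for a check
    );

    return 'http://' . $host . ':8983/solr/select?' . http_build_query( $params );
}

echo buildFieldCheckUrl( 'localhost', 'extra_reading_age_range_si' ), "\n";
```

Opening the printed URL in a browser should return a few documents with the field populated if the re-index picked up the plugin.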

Additional resources