eZ Publish content attribute transformation made easy with eep

By: Benjamin Kroll | January 16, 2017 | eZ Publish development tips, Web solutions, eep, data migration, and data transformation

Content attribute transformation or conversion in eZ Publish isn't required as often as data import or data migration, but when it is, it can take a similar amount of effort. eep simplifies the process with its flexible built-in attribute module options.

eep is a command line tool for eZ Publish that greatly speeds up development, helps you reproduce work across local/dev/staging/production environments, and promotes the re-use of code.

What is content attribute transformation?

Content attribute transformation describes the transfer of attribute data from one attribute to another within the same object. In concept, we are 'converting' an existing attribute / field from one datatype to another, but in practice, we are copying the data to a new field. In such a case, the attribute data for all objects needs to be retained but stored differently.

This type of transformation is generally needed when project requirements change over time. For example, you might want to convert a string attribute to an integer attribute to enforce integer validation, or support multiple object relations instead of just a single object relation.

In the example below we want to keep the attribute in place and just transform its datatype and data from object relation (singular) to object relations (plural) to enable the 'onix_product' (content class) objects to relate to multiple 'series' objects instead of just one.

Example transformation breakdown

The transformation process typically involves the following main steps (as described for our example):

  1. Creating a new content class attribute 'series_new' by editing the class definition via the Administration Interface (eep has options to do this task for you as well, but those are out of scope for this post)
  2. Migrating the existing 'series' attribute data to the new 'series_new' attribute
    1. Creating the list of affected object IDs
    2. Extracting the attribute content for each affected content object attribute
    3. Filtering out only objects with values in the affected attribute
    4. Importing the content to the new attribute
  3. Updating any references to the old attribute in template and other code
  4. Deleting the old attribute if the migration to the new one was successful. Alternatively, renaming the old attribute, renaming the new attribute, and then deleting the old one

We will focus on the components of step 2: Migrating the existing 'series' attribute data to the new 'series_new' attribute.

Creating the list of affected object IDs

First, we'll need the content object ID for each content object we want to migrate.

Using awk, we'll extract the ID from the information table returned by the fetchallinstances option of eep's contentclass module. We're using cc, the eep alias for contentclass, here.

# dump all onix_product object ids
eep cc fetchallinstances onix_product | awk '$1=="|" {print $2}' > _migrate/all_onix_product.oids
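To see what the awk filter does, here is a minimal sketch run against a made-up table. The column layout and the header row are assumptions, since the exact output of fetchallinstances is not shown in this post; sample_table and extract_oids are hypothetical helper names. The extra numeric check is a small safeguard in case a header row also starts with a pipe.

```shell
# Hypothetical sample of a table as printed by `eep cc fetchallinstances`;
# the real column layout may differ.
sample_table() {
  cat <<'EOF'
+----------+----------------+
| ObjectID | Name           |
+----------+----------------+
| 6199     | Sample title A |
| 6543     | Sample title B |
+----------+----------------+
EOF
}

# Keep rows whose first whitespace-separated field is "|" and whose
# second field is purely numeric, then print that second field.
extract_oids() {
  awk '$1 == "|" && $2 ~ /^[0-9]+$/ { print $2 }'
}

sample_table | extract_oids
```

With the sample above, only the two numeric IDs are printed; the border lines and the header row are skipped.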

Extracting the attribute content

Once we have that, we'll log the current attribute content for each of the objects. We'll store it pipe delimited as <content_object_id>|<series_attribute_content> for ease of use.

Using xargs to execute multiple shell commands lets us store the object ID and series attribute data in one step.

# dump all existing onix_product series info
cat _migrate/all_onix_product.oids | xargs -IOID bash -c "echo -n 'OID|';eep at tostring OID series_old;" > _migrate/all_onix_product_series.log

# file contents
...
6199|
6543|2605
6513|2763
6518|
...
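The same log format can also be produced with a plain read loop, which some people find easier to follow than the xargs call. The sketch below is an illustration only: fake_tostring is a hypothetical stand-in for `eep at tostring`, with made-up return values, so that it runs without an eZ Publish installation.

```shell
# Hypothetical stand-in for `eep at tostring <object_id> <attribute>`.
fake_tostring() {
  case "$1" in
    6543) echo 2605 ;;
    6513) echo 2763 ;;
    *)    echo ;;       # object without a series relation
  esac
}

# Read one object ID per line and print "<object_id>|<attribute_content>".
dump_attribute() {
  while IFS= read -r oid; do
    printf '%s|%s\n' "$oid" "$(fake_tostring "$oid" series_old)"
  done
}

printf '%s\n' 6199 6543 6513 6518 | dump_attribute
```

If you swap in the real command and it happens to read from standard input, redirect its input from /dev/null inside the loop so it does not swallow the remaining IDs.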

Filtering out only objects with attribute data

In the last preparation step, we'll extract only the lines for content objects that have series attribute information.

Using awk again, we'll split each line on the pipe character by specifying it as the separator with -F, then print the whole line only if the second field is non-empty.

# only migrate for objects with an existing series relation
cat _migrate/all_onix_product_series.log | awk -F\| '$2!="" {print $0}' > _migrate/migrate_onix_product_series.log

# file contents
...
6499|5272
6543|2605
6513|2763
...
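Before moving on, it can help to check how many objects will actually be migrated. This is a small sketch using the same field separator; series_summary is a hypothetical helper name and the input lines are sample data in the log format shown above.

```shell
# Count all lines and the lines with a non-empty second field.
series_summary() {
  awk -F'|' '
    { total++ }
    $2 != "" { kept++ }
    END { printf "%d of %d objects have a series relation\n", kept, total }
  '
}

printf '%s\n' '6199|' '6543|2605' '6513|2763' '6518|' | series_summary
# prints: 2 of 4 objects have a series relation
```

On the real data you would feed it _migrate/all_onix_product_series.log and compare the first number with the line count of the filtered file.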

Since we're moving data from an object relation (singular) to an object relations (plural) attribute, we won't need to worry about any kind of conversion of the incoming data.

If you need to move data between other datatypes, please refer to the fromString documentation included in Mugo's data import extension to find out which format each datatype expects.

Depending on the complexity of the datatypes, you may be able to do the conversion with existing command-line tools as well.
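As an illustration of such a conversion, take the string-to-integer case mentioned earlier. The sketch below assumes the integer attribute accepts a plain whole number as its fromString input (check the fromString documentation to confirm); to_integer_lines is a hypothetical helper name and the input lines are made up.

```shell
# Trim whitespace around the value and keep only lines whose value
# is a whole number; everything else is dropped for manual review.
to_integer_lines() {
  awk -F'|' '
    {
      v = $2
      gsub(/^[ \t]+|[ \t]+$/, "", v)   # strip leading/trailing blanks
      if (v ~ /^-?[0-9]+$/) print $1 "|" v
    }
  '
}

printf '%s\n' '6199| 42 ' '6543|n/a' '6513|-7' | to_integer_lines
```

Here the lines for 6199 and 6513 come out as `6199|42` and `6513|-7`, while the non-numeric value for 6543 is skipped.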

Importing the content to the new attribute (dry run)

To make sure everything is in order, we'll do a dry run of the migration: instead of running anything, we print the full command that would be run for each item to be migrated.

NOTE: the -l option used below works with GNU xargs (Linux). On OS X/macOS use -L1 instead; GNU xargs accepts -L1 as well.

# DRY RUN
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'echo eep at fromstring $0 series_new $1'
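To try the dry-run pattern without touching any content, the following sketch feeds two sample lines through the same awk and xargs steps. It uses -L 1, which both GNU and BSD xargs accept, and quotes the positional parameters; dry_run is a hypothetical helper name.

```shell
# Turn "id|value" lines into "id value", then let xargs start one
# sh per line; inside sh, $0 is the object ID and $1 the value.
dry_run() {
  awk -F'|' '{ print $1, $2 }' |
    xargs -L 1 sh -c 'echo eep at fromstring "$0" series_new "$1"'
}

printf '%s\n' '6499|5272' '6543|2605' | dry_run
# prints:
# eep at fromstring 6499 series_new 5272
# eep at fromstring 6543 series_new 2605
```

Because the command is only echoed, eep does not need to be installed for this to run.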

Once we're happy with what we see, we can move on to the actual migration.

Running the import (live)

If you have only a small number of items to migrate, you can run the migration script with output directly to the command line. Note that a successful fromstring operation produces no output.

# MIGRATION
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'eep at fromstring $0 series_new $1'

Otherwise, it makes sense to run the script inside a screen session so that a lost connection does not interrupt the migration. Log the output to a file and watch the log with tail to monitor progress.

A second command has been added to the xargs call: it echoes each object ID once it has been processed, so the log shows how far the migration has progressed.

# MIGRATION (screen, log, watch)
screen -S series_attr_migration
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'eep at fromstring $0 series_new $1;echo $0' > _migrate/progress.log
# press Ctrl-a d to detach from the screen session, then watch the log
tail -f _migrate/progress.log
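Since the progress log receives one object ID per processed item, it can be compared with the migration list to see what is still outstanding. This is a sketch under the assumption that the log contains only those IDs (any error output from eep would show up as extra lines); pending_ids is a hypothetical helper name and the demo files are sample data.

```shell
# Print the object IDs listed in the migration file ($1) that do not
# yet appear in the progress log ($2).
pending_ids() {
  expected=$(mktemp)
  processed=$(mktemp)
  cut -d'|' -f1 "$1" | sort > "$expected"
  sort "$2" > "$processed"
  comm -23 "$expected" "$processed"   # lines only in the first file
  rm -f "$expected" "$processed"
}

demo_dir=$(mktemp -d)
printf '%s\n' '6499|5272' '6543|2605' '6513|2763' > "$demo_dir/migrate.log"
printf '%s\n' 6499 6543 > "$demo_dir/progress.log"
pending_ids "$demo_dir/migrate.log" "$demo_dir/progress.log"   # prints 6513
rm -r "$demo_dir"
```

An empty result once the migration has finished suggests every listed object was processed.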

Putting it all together

# dump all onix_product object ids
eep cc fetchallinstances onix_product | awk '$1=="|" {print $2}' > _migrate/all_onix_product.oids

# dump all existing onix_product series info
cat _migrate/all_onix_product.oids | xargs -IOID bash -c "echo -n 'OID|';eep at tostring OID series_old;" > _migrate/all_onix_product_series.log

# only migrate for objects with an existing series relation
cat _migrate/all_onix_product_series.log | awk -F\| '$2!="" {print $0}' > _migrate/migrate_onix_product_series.log

# MIGRATION
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'eep at fromstring $0 series_new $1'