eZ Publish content attribute transformation made easy with eep

By Benjamin Kroll  | January 16, 2017  |  eZ Publish development tips, Web solutions

Content attribute transformation or conversion in eZ Publish isn't required as often as data import or data migration, but when it is, it can take a similar amount of effort. eep simplifies the process with its flexible built-in attribute module options.

eep is a command line tool for eZ Publish that greatly speeds up development, helps you reproduce work across local/dev/staging/production environments, and promotes the re-use of code.

What is content attribute transformation?

Content attribute transformation describes the transfer of attribute data from one attribute to another within the same object. In concept, we are 'converting' an existing attribute / field from one datatype to another, but in practice, we are copying the data to a new field. In such a case, the attribute data for all objects needs to be retained but stored differently.

This type of transformation is generally needed when project requirements change over time. For example, you might want to convert a string attribute to an integer attribute to enforce integer validation, or support multiple object relations instead of just a single object relation.

In the example below we want to keep the attribute in place and just transform its datatype and data from object relation (singular) to object relations (plural) to enable the 'onix_product' (content class) objects to relate to multiple 'series' objects instead of just one.

Example transformation breakdown

The transformation process typically involves the following main steps (as described for our example):

  1. Creating a new content class attribute 'series_new' by editing the class definition via the Administration Interface
    (eep has options to do this task for you as well, but those are out of scope for this post)
  2. Migrating the existing 'series' attribute data to the new 'series_new' attribute
    1. Creating the list of affected object IDs
    2. Extracting the attribute content for each affected content object attribute
    3. Filtering out only objects with values in the affected attribute
    4. Importing the content to the new attribute
  3. Updating any references to the old attribute in template and other code
  4. Deleting the old attribute if the migration to the new one was successful.
    Alternatively, renaming the old attribute, renaming the new attribute, and then deleting the old one

We will focus on the components of step 2: Migrating the existing 'series' attribute data to the new 'series_new' attribute.

Creating the list of affected object IDs

First, we'll need the content object ID for each content object we want to migrate.

Using awk we'll extract the ID from the information table returned by eep's contentclass module's fetchallinstances option. We're using the eep contentclass alias cc here.

# dump all onix_product object ids
eep cc fetchallinstances onix_product | awk '$1=="|" {print $2}' > _migrate/all_onix_product.oids
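
To see what the awk filter does without touching a real installation, here is a minimal sketch that runs it against a made-up table. The table layout and titles are assumptions for illustration only; check them against what eep prints on your system. The additional numeric test is also an assumption on our part: it skips a header row, should your table have one.

```shell
# Hypothetical table layout; compare it with the real eep output first.
sample_table='+------+----------------+
| 6199 | First product  |
| 6543 | Second product |
+------+----------------+'

# Data rows begin with a standalone "|", so the second awk field is the
# first cell. Border rows begin with "+---..." and never match.
object_ids=$(printf '%s\n' "$sample_table" | awk '$1=="|" && $2 ~ /^[0-9]+$/ {print $2}')
printf '%s\n' "$object_ids"
```

Each matching row contributes one object ID per line, which is the format the next step reads.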

Extracting the attribute content

Once we have that, we'll log the current attribute content for each of the objects. We'll store it pipe delimited as <content_object_id>|<series_attribute_content> for ease of use.

Using xargs to execute multiple shell commands lets us store the object ID and series attribute data in one step.

# dump all existing onix_product series info (the old attribute's identifier is series_old in these commands)
cat _migrate/all_onix_product.oids | xargs -IOID bash -c "echo -n 'OID|';eep at tostring OID series_old;" > _migrate/all_onix_product_series.log

# file contents
...
6199|
6543|2605
6513|2763
6518|
...
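
If the xargs one-liner is hard to read, a plain while-read loop can produce the same pipe-delimited format. This is a sketch only: attribute_to_string is a hypothetical stand-in for eep at tostring, and the IDs are sample data, so the loop can be tried without an eZ Publish installation.

```shell
# Hypothetical stand-in for: eep at tostring <object_id> <attribute>
attribute_to_string() {
  case "$1" in
    6543) echo 2605 ;;
    6513) echo 2763 ;;
    *)    echo "" ;;      # object without a series relation
  esac
}

# Read one object ID per line and print <object_id>|<attribute_content>.
dump_series() {
  while IFS= read -r oid; do
    printf '%s|%s\n' "$oid" "$(attribute_to_string "$oid")"
  done
}

series_log=$(printf '6199\n6543\n6513\n' | dump_series)
printf '%s\n' "$series_log"
```

In a real run you would replace the stand-in with the eep call and feed the loop from the .oids file.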

Filtering out only objects with attribute data

In the last preparation step, we'll extract only the lines for content objects that have series attribute information.

Using awk again, we'll split each line on the separator given with -F, check whether the second field is non-empty, and print the whole line only if it is.

# only migrate for objects with an existing series relation
cat _migrate/all_onix_product_series.log | awk -F\| '$2!="" {print $0}' > _migrate/migrate_onix_product_series.log

# file contents
...
6499|5272
6543|2605
6513|2763
...
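
Before importing anything, it can be worth checking that the filter kept and dropped the number of lines you expect. The following sketch counts both groups in one awk pass; the sample log is copied from the excerpt above and the variable names are ours.

```shell
series_log='6199|
6543|2605
6513|2763
6518|'

# Count lines with and without a value in the second field.
summary=$(printf '%s\n' "$series_log" | awk -F'|' '
  $2 != "" { kept++ }
  $2 == "" { dropped++ }
  END { printf "total=%d kept=%d dropped=%d\n", NR, kept, dropped }')
echo "$summary"
```

The kept count should match the number of lines in the filtered log file.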

Since we're moving data from an object relation (singular) to an object relations (plural) attribute, we won't need to worry about any kind of conversion of the incoming data.

If you need to move data between other datatypes, please refer to the fromString documentation included in Mugo's data import extension to find out which format each datatype expects.

Depending on the complexity of the datatypes, you may be able to do the conversion with existing command-line tools as well.
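
As an illustration of such a command-line conversion, here is a sketch for the string-to-integer case mentioned earlier. It assumes a log in the same <content_object_id>|<value> format; the object IDs and values are invented, and the rule of skipping non-numeric values is our assumption, not something eep prescribes.

```shell
string_log='101| 42
102|n/a
103|007'

# Trim whitespace, keep only clean integers, report everything else on stderr.
integer_log=$(printf '%s\n' "$string_log" | awk -F'|' 'BEGIN {OFS="|"} {
  value = $2
  gsub(/^[ \t]+|[ \t]+$/, "", value)
  if (value ~ /^-?[0-9]+$/)
    print $1, value + 0          # force numeric output, so 007 becomes 7
  else
    print "skipped " $1 > "/dev/stderr"
}')
printf '%s\n' "$integer_log"
```

Skipped object IDs are written to stderr so they can be reviewed by hand instead of being imported with a bad value.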

Importing the content to the new attribute (dry run)

To make sure everything is in order, we'll do a dry run of the migration: instead of executing anything, we'll print the full command that would be run for each item.

NOTE: xargs has to run one command per input line. Use -l with GNU xargs (most Linux systems) or -L1 on OS X/macOS; -L 1 works with both.

# DRY RUN
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'echo eep at fromstring $0 series_new $1'
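
The dry run relies on how sh -c handles extra arguments: the first word of each input line becomes $0 and the second becomes $1. A small sketch with two sample lines from the log shows the effect. It only echoes, so it can be run anywhere, and it uses -L 1 so it behaves the same with GNU and BSD xargs.

```shell
# Turn "id|value" into "id value", then let xargs pass both words to sh -c.
dry_run=$(printf '6543|2605\n6513|2763\n' \
  | awk -F'|' 'BEGIN {OFS=" "} {print $1, $2}' \
  | xargs -L 1 sh -c 'echo eep at fromstring "$0" series_new "$1"')
printf '%s\n' "$dry_run"
```

Each output line is the exact command the live run would execute for that object.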

Once we're happy with what we see, we can move on to the actual migration.

Running the import (live)

If you have only a small number of items to migrate, you can run the migration script with output directly to the command line. (Note that a successful fromstring operation creates no output.)

# MIGRATION
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'eep at fromstring $0 series_new $1'

Otherwise, it makes sense to run the script inside a screen session so that a lost connection doesn't interrupt the migration. Log the output somewhere and watch the log with tail to monitor progress.

A second command has been added to the xargs call to display the object ID processed last for logging purposes.

# MIGRATION (screen, log, watch)
screen -S series_attr_migration
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'eep at fromstring $0 series_new $1;echo $0' > _migrate/progress.log
# detach from the screen session with Ctrl-a d, then watch the log
tail -f _migrate/progress.log
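
For a quick progress figure, you can compare the number of lines in the progress log with the number of lines in the migration list. The sketch below assumes the progress log holds exactly one line per processed object (any error output in the log would inflate the count). It works on temporary sample files, and report_progress is a hypothetical helper name.

```shell
# Sample files standing in for the migration list and the progress log.
workdir=$(mktemp -d)
printf '6499|5272\n6543|2605\n6513|2763\n' > "$workdir/migrate.log"
printf '6499\n6543\n' > "$workdir/progress.log"

# Hypothetical helper: compare line counts of the two files.
report_progress() {
  total=$(grep -c '' "$1")
  processed=$(grep -c '' "$2")
  echo "$processed of $total objects processed"
}

progress=$(report_progress "$workdir/migrate.log" "$workdir/progress.log")
echo "$progress"
rm -rf "$workdir"
```

In a real run you would point the helper at _migrate/migrate_onix_product_series.log and _migrate/progress.log.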

Putting it all together

# dump all onix_product object ids
eep cc fetchallinstances onix_product | awk '$1=="|" {print $2}' > _migrate/all_onix_product.oids

# dump all existing onix_product series info
cat _migrate/all_onix_product.oids | xargs -IOID bash -c "echo -n 'OID|';eep at tostring OID series_old;" > _migrate/all_onix_product_series.log

# only migrate for objects with an existing series relation
cat _migrate/all_onix_product_series.log | awk -F\| '$2!="" {print $0}' > _migrate/migrate_onix_product_series.log

# MIGRATION
cat _migrate/migrate_onix_product_series.log | awk -F\| 'BEGIN {OFS=" "}{print $1,$2}' | xargs -l sh -c 'eep at fromstring $0 series_new $1'
