
Keeping URL aliases clean

By: Xavier Cousin | January 4, 2013 | User experience

A while ago, on a client's website about Canadian books, I noticed that some book nodes had a broken url_alias: the URL /content/view/full/<node_id> was shown instead of the nice URL. Running updateniceurls.php did not solve the problem, so I looked into what could be causing it. I realized that the two tables ezcontentobject_tree (containing the node information) and ezurlalias_ml (containing the url_alias path parts) had different data for the same node.

A bit of context

Each book is updated via an import script. During this process, we first check whether the book has to be moved (meaning the title has been updated, probably to fix an earlier typo).

Books are stored on the site within folders from Books/A/<title> to Books/Z/<title>, with a Books/9/<title> folder for titles that don't start with a letter. Because of this layout, if the first letter of the title changes, we have to move the book to the proper new folder when we update it.
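The folder rule above can be sketched as a small helper. This is only an illustration: the function names bookFolderFor and needsMove are hypothetical and do not come from the actual import script, and the sketch assumes plain ASCII titles.

```php
<?php
// Hypothetical helper, not part of the original import script.
// Returns the single-character folder name a title belongs under.
function bookFolderFor( $title )
{
    // First character of the trimmed title, upper-cased (ASCII only).
    $first = strtoupper( substr( ltrim( $title ), 0, 1 ) );

    // A-Z maps to the matching letter folder; anything else
    // (digits, punctuation, empty titles) goes to the "9" folder.
    return preg_match( '/^[A-Z]$/', $first ) ? $first : '9';
}

// Hypothetical check: a book only has to move when its folder changes.
function needsMove( $oldTitle, $newTitle )
{
    return bookFolderFor( $oldTitle ) !== bookFolderFor( $newTitle );
}
```

With this rule, fixing a typo in the first letter of a title (for example "Rust Mary" to "Just Mary") changes the folder from R to J, which is exactly the situation that triggers a move.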

What the code was doing was first moving the object to the new folder:

if( $newParentNodeId != $theBookNode->attribute( 'main_node_id' ) )
{
    $theBookNode->move( $newParentNodeId  );
}

Then it updated the title and the other changed data and re-published the object:

$theObject = eZContentObject::fetch( $objectId );
$objectVersion = $theObject->currentVersion();
$publish = eZOperationHandler::execute
(
    "content"
    , "publish"
    , array
    (
        "object_id" => $theObject->attribute( "id" )
        , "version" => $objectVersion->attribute( "version" )
    )
);

Tracking the events step by step

To find out what was going wrong, I first imported two editions of a book with a typo in the title, then fixed the typo and re-imported both of them.

== first import with typo in the title (first letter is an R instead of a J) ==

First book:
in ezurlalias_ml, parent is folder "R", url part "Rust-Mary"
in ezcontentobject_tree parent is folder "R", path_identification_string is "books/r/rust_mary"
Second book:
in ezurlalias_ml, parent is folder "R", url part is "Rust-Mary2"
in ezcontentobject_tree parent is folder "R", path_identification_string is "books/r/rust_mary2"

== second import, real title (first letter is now J as it should be) ==

First book:
in ezurlalias_ml, there is now a new row set as original that has the folder "J" as parent and "Just-Mary" as url part
in ezcontentobject_tree parent folder is also "J", path_identification_string is "books/j/just_mary" so everything is fine
Second book:
in ezurlalias_ml, no new row has been added, parent is still folder "R", url part is "Rust-Mary2"
in ezcontentobject_tree parent folder is changed to "J" and path_identification_string is "books/j/just_mary"

Analyzing what was going on

On the first import, everything goes as expected.

On the second import of the first edition, the import script moves the node first because the title has changed.
Because this move method only updates the ezcontentobject_tree table and not the ezurlalias_ml table, at this point ezcontentobject_tree and ezurlalias_ml disagree on the path.

Then the object is published. The code tries to determine the url part to give to the node, and to do so it checks ezurlalias_ml for availability. The check is based on the current parent in ezurlalias_ml, which is still the "R" folder, so "Just-Mary" is accepted because nothing else under "R" uses it.

Once it has decided on a path part, it calls an update on ezurlalias_ml, but this time based on the proper full path (with "J" as parent).
Since J/Just-Mary was not in use, the code still goes through and accepts the update. The check was made against the wrong parent, but we get back on our feet because there is no conflict.

On the second import of the second edition, the check on the url part looks first for the availability of "Just-Mary" in the folder "R", following again the information contained in ezurlalias_ml.

Since the first book's active row is now "Just-Mary" with "J" as the parent, "Just-Mary" with the parent "R" is free, so the check gives a green light to the path part.
When it tries to update ezurlalias_ml, it now asks to use the path part "Just-Mary" with the parent "J", which is already taken by the previous book. This causes the code to skip the update and return an error.

Because of that, the object is now published with ezcontentobject_tree giving "J/Just-Mary" and ezurlalias_ml "R/Just-Mary".
This is really bad because even the updateniceurls.php script gets confused and fails to fix the problem: it asks to update to a path part that is still not available, while the code checking the availability says it is.
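The sequence above can be replayed with a toy in-memory model. This is a sketch under loud assumptions: the arrays only mimic the active rows of ezurlalias_ml and the parents in ezcontentobject_tree, the functions isFree and moveThenPublish are hypothetical, and none of this is the eZ Publish kernel code. It only shows why checking against one parent and storing against another leaves the two tables out of sync.

```php
<?php
// Toy model, NOT kernel code. Node id => active alias row.
$aliases = array(
    1 => array( 'parent' => 'R', 'text' => 'Rust-Mary' ),
    2 => array( 'parent' => 'R', 'text' => 'Rust-Mary2' ),
);
// Node id => parent folder, standing in for ezcontentobject_tree.
$tree = array( 1 => 'R', 2 => 'R' );

// True when no other node owns $text under $parent.
function isFree( array $aliases, $nodeId, $parent, $text )
{
    foreach ( $aliases as $id => $row )
    {
        if ( $id != $nodeId && $row['parent'] === $parent && $row['text'] === $text )
        {
            return false;
        }
    }
    return true;
}

// Raw move (tree only), then publish: the availability check uses the
// stale alias parent, the store uses the real parent from the tree.
function moveThenPublish( array &$aliases, array &$tree, $nodeId, $newParent, $text )
{
    $tree[$nodeId] = $newParent;                 // the alias table is not touched
    $staleParent = $aliases[$nodeId]['parent'];  // still the old folder

    if ( !isFree( $aliases, $nodeId, $staleParent, $text ) )
    {
        return 'check failed';                   // the toy simply reports it
    }
    if ( !isFree( $aliases, $nodeId, $tree[$nodeId], $text ) )
    {
        return 'store failed';                   // alias row left as it was
    }
    $aliases[$nodeId] = array( 'parent' => $tree[$nodeId], 'text' => $text );
    return 'ok';
}

$first  = moveThenPublish( $aliases, $tree, 1, 'J', 'Just-Mary' );
$second = moveThenPublish( $aliases, $tree, 2, 'J', 'Just-Mary' );
```

In this model the first book goes through, while the second one passes the check (R/Just-Mary is free) but fails the store (J/Just-Mary is taken), leaving its tree parent at "J" and its alias parent at "R".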

eZURLAliasML::storePath, which updates ezurlalias_ml, takes a parameter called $autoAdjustName that makes the update code try to adjust the name if a path conflict is detected. As far as I could see, however, no code in the kernel sets this parameter to true.
One way I found to fix the existing URLs is to modify eZContentObjectTreeNode::updateSubTreePath() so that its call to eZURLAliasML::storePath passes $autoAdjustName as true, then run the updateniceurls.php script.
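To give an idea of what adjusting a conflicting name could look like, here is a minimal sketch. The function adjustName is hypothetical and is not the body of eZURLAliasML::storePath; the numeric suffix starting at 2 is an assumption modelled on the "Rust-Mary2" url part seen in the trace above, and the kernel's real logic may differ.

```php
<?php
// Hypothetical illustration only, not kernel code.
// $taken lists the path parts already used under the target parent.
function adjustName( array $taken, $text )
{
    // No conflict: keep the requested path part as-is.
    if ( !in_array( $text, $taken, true ) )
    {
        return $text;
    }

    // Conflict: append 2, 3, 4... until an unused path part is found.
    $suffix = 2;
    while ( in_array( $text . $suffix, $taken, true ) )
    {
        $suffix++;
    }
    return $text . $suffix;
}
```

Applied to the second book, a store against the "J" folder where "Just-Mary" is already taken would settle on "Just-Mary2" instead of giving up.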

Lessons learned from this

eZContentObjectTreeNode::move is not meant to be called directly because it does not update everything; its only purpose is to update the ezcontentobject_tree table. You should always use the wrapper eZContentObjectTreeNodeOperations::move instead.

When moving and renaming an object in the same transaction, the publish should happen before the move so that when the item is moved the name and therefore path part is up to date and eZPublish doesn't get confused when it tries to update the url alias.

This really is an edge case: to reproduce the bug, you need two objects with the same name in the same location, and both of them have to be moved to a different location and renamed.

What the code looks like now

$theObject = eZContentObject::fetch( $objectId );
$objectVersion = $theObject->currentVersion();
$publish = eZOperationHandler::execute
(
    "content"
    , "publish"
    , array
    (
        "object_id" => $theObject->attribute( "id" )
        , "version" => $objectVersion->attribute( "version" )
    )
);

if( $newParentNodeId != $theBookNode->attribute( 'main_node_id' ) )
{
    eZContentObjectTreeNodeOperations::move( $theBookNode->attribute( 'main_node_id' ), $newParentNodeId );
}