
Automating “curly” quotes in rich text fields

By: Dave Fearnley | May 1, 2019 | eZ Publish add-ons and eZ Publish development tips

One of our clients recently came to us with an interesting problem. When end users type content into a rich text field, double quotes, single quotes and apostrophes are not “smart.” That is, the quotes and apostrophes are straight instead of curly; typographically speaking, straight marks are stand-ins for the inch and foot symbols.

An editor might type this:

  • J. R. R. Tolkien wrote "The Lord of the Rings," a trilogy of novels that serve as a sequel to Tolkien's "The Hobbit."

But really want this:

  • J. R. R. Tolkien wrote “The Lord of the Rings,” a trilogy of novels that serve as a sequel to Tolkien’s “The Hobbit.”

Here's how we created a script to automate search and replace for curly quotes and implemented an easy-to-use button in the client’s CMS to run the task.

Standard keyboard shortcuts let users insert curly quotes, but using them can be cumbersome. Our client’s CMS, eZ Publish, has a button to insert special characters, but this is slow as you can insert only one quote at a time. And what if you are copying some text from another source that erroneously uses straight quotes? Manually replacing each character would be a huge pain.

Which method is best?

Let's figure out an effective way to fix all of these quotes. A button would be nice: no muss, no fuss. Just click and all the quotes are converted. But how should I approach the actual replacement? One way might be to try to match up pairs of quotes, but that opens up a lot of permutations (nested quotes, apostrophes, unbalanced pairs), and thinking about that algorithm makes my head hurt. Any method is going to have to make some assumptions, but the fewer the better.

The other approach, and the one I chose to follow, is to identify which straight quotes are opening quotes and replace all of those with the left curly quote, then make another pass and replace everything left over with the right curly quote. I handle double quotes and single quotes independently, so that makes a total of four passes through the text.
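As a rough illustration of that two-pass idea, here is a plain-text sketch for double quotes only. It deliberately ignores HTML, and its opening-quote rule (start of the text, whitespace or an opening bracket) is a simplified assumption rather than the full rule set developed below; the function name `curlDoublesPlainText` is hypothetical.

```javascript
// Hypothetical plain-text sketch of the two-pass strategy for double quotes.
// It ignores HTML completely and uses a simplified "opening quote" rule.
function curlDoublesPlainText(text) {
  // Pass 1: a straight quote at the start of the text, or after whitespace or
  // an opening bracket, is assumed to be an opening quote. $1 keeps the prefix.
  const opened = text.replace(/(^|[\s(\[{])"/g, '$1\u201c');
  // Pass 2: every straight double quote still left is treated as a closing quote.
  return opened.replace(/"/g, '\u201d');
}

console.log(curlDoublesPlainText('"Hi," she said ("really").'));
// prints: “Hi,” she said (“really”).
```

Because the second pass simply takes whatever the first pass left behind, only the opening-quote rule needs any real thought.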


Respecting the source of a rich text field

I did some searching to find the regular expression I needed and found some suggestions. However, I have to keep in mind that the source content of an eZ Publish rich text field, technically called an XML block, can contain quoted HTML attributes and JavaScript with single- or double-quoted string literals. This source code is different from the display text the editor is working with, and converting quotes in the source code will break the rich text field. Most of the suggestions I found did not address this issue.
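To make the risk concrete, here is a deliberately naive sketch that curls every straight double quote without regard for the markup. The helper name `naiveCloseDoubles` and the sample HTML are made up for illustration.

```javascript
// Hypothetical, deliberately naive helper: curls every straight double quote,
// including the ones that delimit HTML attribute values.
function naiveCloseDoubles(html) {
  return html.replace(/"/g, '\u201d');
}

const source = '<p><a href="page.html">"Hi"</a></p>';
console.log(naiveCloseDoubles(source));
// prints: <p><a href=”page.html”>”Hi”</a></p>
```

The `href` value is no longer wrapped in straight quotes, so a browser would treat the curly characters as part of an unquoted URL and the link would no longer point at `page.html`.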

Ignoring HTML attributes and script blocks

First, I need to ignore anything in a script block, as well as quotes in HTML attributes. I'll start by looking for the script opening tag:

(?!<script[^>]*?>)

and script close:

(?![^<]*?<\/script>|$)

To exclude HTML attributes, I need to ignore anything between < and >. A quote that is followed by a > before the next < must sit inside a tag, so this is our ‘attribute’ rule:

(?!([^<])*?>)

Let's put those together:

(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)

The above will be added to every pass through the content so I don't corrupt any HTML attributes or scripts.
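Because these are lookaheads placed after the quote, they inspect the text that follows each quote. One way to see what they protect is to append them to a bare `"` and mark every quote that is still allowed to match. This is a sketch: `PROTECT` and `markEditableQuotes` are hypothetical names, and the sample strings are made up.

```javascript
// The three protective lookaheads from above, as one pattern source string.
// String.raw keeps the backslashes intact for the RegExp constructor.
const PROTECT = String.raw`(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)`;

// Hypothetical test helper: replaces every straight double quote that the
// lookaheads leave "editable" with #, so we can see which ones would be touched.
function markEditableQuotes(html) {
  return html.replace(new RegExp('"' + PROTECT, 'gi'), '#');
}

console.log(markEditableQuotes('<p class="intro">He said "hi"</p><script>var s = "x";</script>'));
// prints: <p class="intro">He said #hi#</p><script>var s = "x";</script>
console.log(markEditableQuotes('say "hi"'));
// prints: say #hi"
console.log(markEditableQuotes('<script>var s = "x"; if (a < b) {}</script>'));
// prints: <script>var s = #x#; if (a < b) {}</script>
```

Two quirks show up here. The `|$` alternative also rejects a quote at the very end of the string; editor content normally ends with a closing tag such as `</p>`, so this rarely matters. And the script rule only protects a quote when no `<` appears between it and `</script>`, so script code such as `a < b` after a quote leaves that quote unprotected.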

First pass: Opening single quote

I’m going to start by looking for opening single quotes. For this I need to make some assumptions about where I might find these. Here’s what I came up with:

  1. A single quote at the very beginning of the content (without the m flag, ^ in JavaScript matches only the start of the string):
    • ^'
  2. A single quote following whitespace, a greater-than sign or any opening bracket:
    • The cow goes 'moo' or <p>'hello'</p> or ('hello')
    • [\s{\[\(>]'
  3. A single quote following a double quote that is itself preceded by whitespace or a greater-than sign:
    •  "'hello'" or <p>"'hello'"</p>
    • [\s>]"'

How do all of the above rules stitch together? Rule 1 can be applied on its own, but then we need to combine 2 and 3 with our script and attribute rules:

  • (1) or ( ((2) or (3)) and not (attribute) and not between (script start) and (script end) )

That gives me:

  • ( ^' ) | ( ( ( [\s{\[\(>]' ) | ( [\s>]"' ) )( ?!([^<])*?> )( ?!<script[^>]*?> )( ?![^<]*?<\/script>|$ ) )

Let's tighten that up for my JavaScript replace function:

/(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi
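A quick way to check the tightened pattern is to list what it matches on a sample string. With the `g` flag, `String.prototype.match` returns an array of the full matches. The sample text is made up to cover rules 1 to 3 plus a quote inside an attribute.

```javascript
const OPEN_SINGLE = /(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi;

// Made-up sample: rule 1 at the start, rule 2 after "(", rule 3 after '>"',
// and a quote inside a title attribute that should be left alone.
const sample = `'one' ('two') <p>"'three'"</p><a title="say 'hi'">`;

console.log(JSON.stringify(sample.match(OPEN_SINGLE)));
// prints: ["'","('",">\"'"]
```

Each match includes the character(s) in front of the quote, which is why the replacement step has to swap out only the quote itself.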

Different languages handle regular expressions in subtly different ways. I’m using JavaScript to do the search and replace, so I have to do some fancy footwork around quirks in how JavaScript matches and replaces. I want to take each match and replace only its ' with a left single curly quote ‘. In other words, if I find >"' I want to replace the ' in that whole expression, leaving me with >"‘. In PHP, you can do that within the regular expression itself; in JavaScript, I pass a function as the replacement instead. The function's first argument is the whole matched text:

function (match)
{
    // match is the whole matched text, e.g. >"' ; replace only its straight quote
    return match.replace(/'/g, "\u2018");
}

The resulting replace call looks like this:

str = str.replace(/(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi,
    function (match)
    {
        return match.replace(/'/g, "\u2018");
    }
);
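To see what the callback actually receives, here is a small sketch that records each argument. In JavaScript, when the replacement passed to `String.prototype.replace` is a function, its first argument is the entire matched substring, followed by any capture groups. The helper `openSingles` and its `seen` array are hypothetical, added only for this demonstration.

```javascript
const OPEN_SINGLE = /(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi;

// Hypothetical helper: runs the first pass and records every full match that
// the replace callback receives as its first argument.
function openSingles(html, seen) {
  return html.replace(OPEN_SINGLE, function (match) {
    seen.push(match);                      // whole matched text, prefix included
    return match.replace(/'/g, '\u2018');  // curl only the quote, keep the prefix
  });
}

const seen = [];
console.log(openSingles(`<p>'Hi' and "'yo'"</p>`, seen));
// prints: <p>‘Hi' and "‘yo'"</p>
console.log(JSON.stringify(seen));
// prints: [">'"," \"'"]
```

The closing quotes are still straight at this point; they are left for the next pass.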

Second pass: Closing single quotes and apostrophes

Now all of the opening single quotes are changed to the left single curly quote. Logically, all of the other single quotes can be changed to the right curly single quote. If I've done my first replace correctly, apostrophes will not have been converted in the first pass, leaving them to be converted to right curly single quotes in this pass. The replace is much simpler here. Note that I don't need a function in this case because I am not looking for character combinations, but a single character. I still have to avoid the script blocks and HTML attributes:

  • ' and not (attribute) and not between (script start) and (script end)

My replace call looks like this:

str = str.replace(/'(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi, "\u2019");
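Running the first two passes back to back on a made-up snippet shows the apostrophe in "Tolkien's" surviving pass 1 and then picking up the right curly quote in pass 2. The function name `curlSingles` is hypothetical.

```javascript
const OPEN_SINGLE = /(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi;
const CLOSE_SINGLE = /'(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi;

// Hypothetical helper: pass 1 (opening ‘) followed by pass 2 (everything else → ’).
function curlSingles(html) {
  const opened = html.replace(OPEN_SINGLE, (match) => match.replace(/'/g, '\u2018'));
  return opened.replace(CLOSE_SINGLE, '\u2019');
}

console.log(curlSingles(`<p>Tolkien's 'Hobbit'</p>`));
// prints: <p>Tolkien’s ‘Hobbit’</p>
console.log(curlSingles(`<p>'Tis the season</p>`));
// prints: <p>‘Tis the season</p>
```

The second example shows one of the assumptions at work: an elided leading apostrophe, as in 'Tis, follows a > and so is treated as an opening quote, although typographically it should be ’.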

Third pass: Opening double quotes

Looking for opening double quotes is very similar to looking for opening single quotes. The one caveat concerns nested quotes: because the first two passes have already curled the single quotes, a double quote nested inside single quotes now follows a curly ‘ rather than a straight ':

  • ‘"hello"’

I can reuse the single quote match and make some small adjustments. I'm looking for double quotes now, so I need to replace the straight single quote in my regular expression with straight double quotes. More subtly, I've changed the nested quote lookup from [\s>]"' to [\s>]‘". I'm using the literal ‘ in the match because I couldn't get it to work with character codes. Finally, my function needs to replace the straight double quote in what I find with the left curly double quote:

str = str.replace(/(^")|((([\s{\[\(>]")|([\s>]‘"))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi,
    function (match)
    {
        return match.replace(/"/g, "\u201c");
    }
);
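The reason for the curly ‘ in the nested lookup shows up if we run passes 3 and 4 on text whose single quotes have already been curled. This sketch compares the pattern above with the same pattern using a straight ' in the nested slot; `finishDoubles` and the pattern constant names are hypothetical.

```javascript
// Pass 3 as written above: the nested case expects the curly ‘ made in pass 1.
const OPEN_DOUBLE = /(^")|((([\s{\[\(>]")|([\s>]‘"))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi;
// Hypothetical variant with a straight ' in the nested slot, for comparison.
const OPEN_DOUBLE_STRAIGHT = /(^")|((([\s{\[\(>]")|([\s>]'"))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi;
// Pass 4: every remaining straight double quote becomes ”.
const CLOSE_DOUBLE = /"(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi;

// Hypothetical helper: runs pass 3 with the given pattern, then pass 4.
function finishDoubles(html, openPattern) {
  const opened = html.replace(openPattern, (match) => match.replace(/"/g, '\u201c'));
  return opened.replace(CLOSE_DOUBLE, '\u201d');
}

// Text as it looks after passes 1 and 2 have already curled the single quotes.
const afterSingles = '<p>He said ‘"hello"’</p>';
console.log(finishDoubles(afterSingles, OPEN_DOUBLE));          // <p>He said ‘“hello”’</p>
console.log(finishDoubles(afterSingles, OPEN_DOUBLE_STRAIGHT)); // <p>He said ‘”hello”’</p>
```

With a straight ' in the nested slot, pass 3 never matches (pass 2 has already curled every straight single quote in the text), so pass 4 turns the nested opening quote into a closing one.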

Final pass: Closing double quotes

Again, this is very close to the replace function for the closing single quotes. I do have to make two small changes:

  • Replace ' with ".
  • Change the character that I am substituting to the right double quote.

str = str.replace(/"(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi, "\u201d");
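Since the four passes have to run in order, it can help to collect them in one place so they can be exercised outside the editor, for example in Node. This is a sketch: `PASSES` and `fixQuotes` are hypothetical names, and the patterns are the four from the passes above, using the curly ‘ in the nested double-quote case.

```javascript
// Hypothetical standalone version of the four passes, in the order described
// above, so they can be tried outside the editor (for example in Node).
const PASSES = [
  // Pass 1: opening single quotes become ‘ (only the quote inside each match)
  [/(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi,
    (match) => match.replace(/'/g, '\u2018')],
  // Pass 2: remaining single quotes and apostrophes become ’
  [/'(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi, '\u2019'],
  // Pass 3: opening double quotes become “ (nested case looks for the ‘ from pass 1)
  [/(^")|((([\s{\[\(>]")|([\s>]‘"))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi,
    (match) => match.replace(/"/g, '\u201c')],
  // Pass 4: remaining double quotes become ”
  [/"(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi, '\u201d'],
];

function fixQuotes(html) {
  // Feed the output of each pass into the next one.
  return PASSES.reduce((str, [pattern, replacement]) => str.replace(pattern, replacement), html);
}

const before = '<p>J. R. R. Tolkien wrote "The Lord of the Rings," a trilogy of novels that serve as a sequel to Tolkien\'s "The Hobbit."</p>';
console.log(fixQuotes(before));
// prints: <p>J. R. R. Tolkien wrote “The Lord of the Rings,” a trilogy of novels that serve as a sequel to Tolkien’s “The Hobbit.”</p>
```

Reusing the global regular expressions across calls is safe here because `String.prototype.replace` resets `lastIndex` before a global replace.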

Putting it all together in eZ

Now that I have solved the complex replace workflow, I can attach the functionality to the TinyMCE UI in eZ Publish.

I need to make some additions to ezoe.ini.append.php. Add the plugin reference:

[EditorSettings]
Plugins[]
...
Plugins[]=fix_quotes
...

In the same file, add the button to each toolbar layout that should offer the functionality:

[Layout_1]
Buttons[]
...
Buttons[]=fix_quotes
...
[Layout_2]
Buttons[]
...
Buttons[]=fix_quotes
...

In our eZ installation, the icons for the TinyMCE toolbar buttons are contained in a sprite, and I found an appropriate icon for the curly quote button within it. The toolbar icons are assigned to their respective button backgrounds in extension/ezoe/design/standard/stylesheets/skins/o2k7/ui.css. It’s bad form to edit a TinyMCE file that could be overwritten by an update, so I added the style to an existing stylesheet that the editor already loads:

span.mceIcon.mce_fix_quotes {background-position:-220px 0}

Now we attach our curly quote code to the button with a plugin, following TinyMCE's plugin conventions. Place a JavaScript file in an appropriate folder: /javascript/plugins/fix_quotes/editor_plugin.js

(function() {
    tinymce.create('tinymce.plugins.fix_quotes', {

        /**
         * Initializes the plugin
         * @param {tinymce.Editor} ed Editor instance that the plugin is initialized in.
         * @param {string} url Absolute URL to where the plugin is located.
         */
        init : function( ed, url ) {
            ed.addCommand( 'mceButtonFixQuotes', function() {

                // Read the raw content from the editor
                var str = ed.getContent( {format : 'raw', no_events : 1} );

                // Pass 1: opening single quotes
                str = str.replace( /(^')|((([\s{\[\(>]')|([\s>]"'))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi,
                    function ( match )
                    {
                        return match.replace( /'/g, "\u2018" );
                    }
                );

                // Pass 2: closing single quotes and apostrophes
                str = str.replace( /'(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi, "\u2019" );

                // Pass 3: opening double quotes; the nested case looks for the curly ‘ created in pass 1
                str = str.replace( /(^")|((([\s{\[\(>]")|([\s>]‘"))(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$))/gi,
                    function ( match )
                    {
                        return match.replace( /"/g, "\u201c" );
                    }
                );

                // Pass 4: closing double quotes
                str = str.replace( /"(?!([^<])*?>)(?!<script[^>]*?>)(?![^<]*?<\/script>|$)/gi, "\u201d" );

                // Write the converted content back to the editor
                ed.execCommand( 'mceSetContent', false, str );
            });

            // Register the toolbar button and bind it to the command
            ed.addButton( 'fix_quotes', {
                title : 'Convert quotes to curly quotes',
                cmd : 'mceButtonFixQuotes'
            } );
        },

        /**
         * Returns information about the plugin as a name/value array.
         * @return {Object} Name/value array containing information about the plugin.
         */
        getInfo : function() {
            return {
                longname : 'Fix quotes button',
                author : 'Mugo Web',
                authorurl : 'https://www.mugo.ca',
                infourl : 'https://www.mugo.ca',
                version : 0.1
            };
        }
    });

    // Register plugin
    tinymce.PluginManager.add( 'fix_quotes', tinymce.plugins.fix_quotes );
})();

And there you have it! Let’s see it in action:

[Image: Before and after example of single and double quotes in text being converted to smart curly quotes]

A little effort for a big payoff

Implementing user-friendly features like a curly quote search-and-replace button is a great way to extend eZ Publish's functionality and make end users a little more productive. And that’s always a win.
