Rick's Tech Talk

All Tech Talk, Most of the Time

Regular Expression Dialects

Over the past few weeks I had some simple text transformations that had me reaching for Perl, but that I ended up doing directly in Vim. These were very small exercises in Regular Expression (RE) matching and replacing, and what struck me was how small differences in dialect can sometimes prevent you from making progress.

Here's the first example:

01/17   Led Zeppelin's first album is released, 1969
01/19   Janis Joplin is born in Port Arthur, Texas, 1943
01/22   Sam Cooke is born in Chicago, 1935

These lines are in calendar format. On a lark, I wanted to switch to pal, another calendar program, which required these strings:

00000117 Led Zeppelin's first album is released, 1969
00000119 Janis Joplin is born in Port Arthur, Texas, 1943
00000122 Sam Cooke is born in Chicago, 1935

If it were just these three lines, then I'd be in the editor, updating the strings by hand (VIM: f/x0i0000 on each line). However, there are 195 of these lines, so I knew I'd be doing this programmatically. Normally I use Perl, doing something like this:

% perl -p -e 's/(\d\d)\/(\d\d)(.*)/0000\1\2\3/' < sample
00000117   Led Zeppelin's first album is released, 1969
00000119   Janis Joplin is born in Port Arthur, Texas, 1943
00000122   Sam Cooke is born in Chicago, 1935
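
For comparison, the same rewrite can be sketched in Python. This is only an illustration and not part of my actual workflow; the helper name to_pal and the inline sample lines are assumptions made for the example.

```python
import re

# Two digits, a slash, two digits, then the rest of the line.
CALENDAR_DATE = re.compile(r"^(\d\d)/(\d\d)(.*)$")

def to_pal(line: str) -> str:
    """Rewrite 'MM/DD text' as '0000MMDD text' (hypothetical helper)."""
    return CALENDAR_DATE.sub(r"0000\1\2\3", line)

sample = [
    "01/17   Led Zeppelin's first album is released, 1969",
    "01/19   Janis Joplin is born in Port Arthur, Texas, 1943",
    "01/22   Sam Cooke is born in Chicago, 1935",
]

for line in sample:
    print(to_pal(line))
```

As in the Perl one-liner, the whitespace after the date is carried along untouched by the third group, and lines without a leading date pass through unchanged.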

But I was looking at the file in VIM, and I knew VIM had the same capability to do search and replace with regular expressions. Unfortunately, entering the above RE in VIM's command-line mode produces a big fat "pattern not found" error.

Pattern Not Found in VIM

Here's where habits can sometimes prevent you from learning something new. It would have been easy to "just do it the old way." But learning new things is the essence of keeping your skills sharp. And for me, it's not a new skill either: I know regular expressions. It was a matter of using them inside of a new tool.

So I stared at VIM's pattern.txt help file. Eventually, I found my issue. In VIM, the grouping function is done with "\(" and "\)", not with unadorned parentheses "()". Backslash! Entering the following into VIM produced the desired results:

%s/^\(\d\d\)\/\(\d\d\)\(.*\)/0000\1\2\3/
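
The dialect difference cuts both ways. In Perl-style dialects, a backslashed parenthesis matches a literal parenthesis character, so the VIM form of the pattern finds nothing there. Here is a small Python sketch of that effect (Python's re module follows the Perl convention on this point; the variable names are made up for illustration):

```python
import re

line = "01/17   Led Zeppelin's first album is released, 1969"

# Perl-style grouping: bare parentheses capture.
perl_style = r"^(\d\d)/(\d\d)(.*)"
# VIM-style grouping pasted unchanged: here \( and \) mean literal parentheses.
vim_style = r"^\(\d\d\)/\(\d\d\)\(.*\)"

# The Perl-style pattern rewrites the line as expected.
print(re.sub(perl_style, r"0000\1\2\3", line))

# The VIM-style pattern does not match the calendar line at all...
print(re.search(vim_style, line) is None)

# ...but it does match text that really contains parentheses.
print(re.search(vim_style, "(01)/(17)(rest)") is not None)
```

So the same characters are valid in both dialects; they just mean different things, which is why the failure shows up as "no match" rather than as a syntax error.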

It may seem like a small thing, but learning this is helpful because VIM has pattern highlighting. I can enter the first part of the RE as a search expression in the command line (\d\d\/\d\d) and have the pattern highlighted:

Highlighting Patterns in VIM

This makes VIM a simple RE tester. (Yes, I know there are RE testers on the web, but I've shied away from them.) Moreover, Perl can't give me highlighting, unless I decide to settle for just printing the match (with $1).
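
Outside of VIM, a rough stand-in for that highlighting is to wrap each match in visible markers. The sketch below does this in Python; the mark_matches name and the double-bracket markers are assumptions chosen for illustration, not a feature of any tool mentioned here.

```python
import re

def mark_matches(pattern: str, text: str) -> str:
    """Wrap every match of pattern in [[ ]] so it stands out (hypothetical helper)."""
    return re.sub(pattern, lambda m: "[[" + m.group(0) + "]]", text)

print(mark_matches(r"\d\d/\d\d", "01/17   Led Zeppelin's first album is released, 1969"))
```

It is cruder than seeing the match light up as you type, but it answers the same question: which part of the line does the pattern actually cover?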

My second example is slightly more involved. My file contains lines like this:

1.1.8 Crawl audio/video
1.1.9 Federated search (combining inventory results)
1.1.10 Local Search Results.

I needed these lines to look like this:

Crawl audio/video 1.1.8 
Federated search (combining inventory results) 1.1.9 
Local Search Results. 1.1.10 

Using my newfound knowledge with VIM, I can test my matching RE (1\.1\.\d\d\?):

Matching Numbers in VIM

Then I group these and do a swap of the grouped expressions:

%s/\(\d\.\d\.\d\d\?\) \(.*$\)/\2 \1/

Doing A Simple String Swap in VIM
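
The same swap translates to a Perl-style dialect by dropping the backslashes from the grouping parentheses and from the optional-digit quantifier (VIM's \? becomes a plain ?). A minimal Python sketch, with a hypothetical helper name and the three sample lines hard-coded:

```python
import re

# Section number such as 1.1.8 or 1.1.10, one space, then the title.
NUMBERED = re.compile(r"^(\d\.\d\.\d\d?) (.*)$")

def swap_number_and_title(line: str) -> str:
    """Move the leading section number to the end of the line (hypothetical helper)."""
    return NUMBERED.sub(r"\2 \1", line)

lines = [
    "1.1.8 Crawl audio/video",
    "1.1.9 Federated search (combining inventory results)",
    "1.1.10 Local Search Results.",
]

for line in lines:
    print(swap_number_and_title(line))
```

Note that the parentheses in "(combining inventory results)" are ordinary text here; they only matter inside the pattern, not inside the line being matched.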

So what have we learned? Regular expressions have dialects, and sometimes those dialects can prevent us (at least me) from trying something new. But if you do persist and learn those dialects, you can add to your skills. That's a good thing.

Mastering Regular Expressions by Jeffrey Friedl gets into RE dialects, as does the website regular-expressions.info.