Patterns of Behavior

You hear this all the time: “Just because you can do something doesn’t mean that you should.” Using the same tricks to solve all of your problems keeps you from finding better ways to accomplish those tasks. I’ll give you a technical example from my mad laboratory.
[Image: Pretty Pattern, Pablo Fernández via Compfight]
I’m creating a script for Retrievem that extracts words from lists based on patterns. Much of Retrievem’s power comes from something called regular expressions. The main selling point, extracting email addresses, is a single regular expression:

^[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$


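Even without knowing regular expressions, you might be able to pick out the @ sign and guess that the part before the sign is the user name, while the part after it is the domain. (The expression lists only capital letters, so it has to be run in case-insensitive mode to accept lowercase addresses.) Neat, right?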
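To see that expression at work, here is a minimal sketch in Python. Retrievem's own code isn't shown in this post, so the function name `extract_emails` and the sample entries are made up for illustration, and the case-insensitive flag is an assumption based on the pattern listing only A-Z.

```python
import re

# The expression from the post. It lists only A-Z, so IGNORECASE is assumed.
EMAIL = re.compile(r"^[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,6}$", re.IGNORECASE)

def extract_emails(entries):
    """Return the entries that look like email addresses (hypothetical helper)."""
    cleaned = (entry.strip() for entry in entries)
    return [entry for entry in cleaned if EMAIL.match(entry)]

# Made-up sample list: one candidate per entry, because the pattern is
# anchored with ^ and $ and must match the whole entry.
candidates = [
    "alice@example.com",
    "not an email",
    "bob.smith@mail.example.org",
    "carol@localhost",  # rejected: no dot-separated domain after the @
]

emails = extract_emails(candidates)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```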

Well, pattern matching is what regular expressions are all about. However, a regular expression may not be the best way to match words, depending on what the patterns are. One type of pattern might be, “Show me all six-letter words that begin and end with vowels.” This is easy enough to do with a regular expression:

^[aeiou][a-z]{4}[aeiou]$


You can see the vowels in the brackets on both ends of the expression. Depending on the source word list, Retrievem would return a list with words like ALKYNE, ARGYLE, ASSURE, EQUINE and ETHYNE.

But what if the pattern is stricter? “Show me all six-letter words containing exactly two vowels, one at the beginning and one at the end.” Well, that’s still easy, if you replace that [a-z] bit with [^aeiou], which gives ^[aeiou][^aeiou]{4}[aeiou]$ (and treats Y as a consonant). Now, Retrievem throws away everything except ALKYNE, ARGYLE and ETHYNE.
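Both vowel patterns are easy to try side by side. This is a small Python sketch, not Retrievem's code; the word list is a made-up sample built around the five words mentioned above, and words are lowercased before matching because the expressions are written in lowercase.

```python
import re

# Made-up sample list; the first five words are the ones named in the post.
WORDS = ["ALKYNE", "ARGYLE", "ASSURE", "EQUINE", "ETHYNE", "BANANA", "ORANGE", "AREA"]

# Six letters, vowel at each end, anything in the middle.
loose = re.compile(r"^[aeiou][a-z]{4}[aeiou]$")
# Six letters, vowel at each end, no vowels in the middle (Y counts as a consonant).
strict = re.compile(r"^[aeiou][^aeiou]{4}[aeiou]$")

loose_hits = [w for w in WORDS if loose.match(w.lower())]
strict_hits = [w for w in WORDS if strict.match(w.lower())]

print(loose_hits)   # ['ALKYNE', 'ARGYLE', 'ASSURE', 'EQUINE', 'ETHYNE', 'ORANGE']
print(strict_hits)  # ['ALKYNE', 'ARGYLE', 'ETHYNE']
```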
[Image: Another Pretty Pattern, Pablo Fernández via Compfight]
I wanted my script to be even more flexible. Imagine that you’re trying to solve a cryptogram like this:

[Cryptogram image, from Puzzleland on Race2Hugo.net]

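You could ask Retrievem to find all words that match the pattern for XSHHJYU, which is the first word in the puzzle. The only repeated letter is H, in the third and fourth positions, so the fourth letter has to be a copy of the third. First of all, the regular expression is slightly different*: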

^([a-z])([a-z])([a-z])(\3)([a-z])([a-z])([a-z])$


Secondly, the regular expression doesn’t take the rule of cryptograms into account. Each letter in the code stands for only one letter in the answer, and two different code letters never stand for the same answer letter. So, Retrievem will return useless suggestions like FREEZER and SETTLER, where the repeated E’s can’t fit a code word whose only repeat is the double H. What I needed was a stricter pattern matcher that looks at the letters as well as their positions in each word.
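You can watch the loose expression let those words through with a few lines of Python. This is only a sketch for illustration; the helper name `is_loose_match` and the test words are my own choices, not part of Retrievem.

```python
import re

# The expression from the post: seven letters, the fourth repeating the third.
LOOSE = re.compile(r"^([a-z])([a-z])([a-z])(\3)([a-z])([a-z])([a-z])$")

def is_loose_match(word):
    """True if the word fits the double-letter shape (hypothetical helper)."""
    return LOOSE.match(word.lower()) is not None

results = {w: is_loose_match(w) for w in ["BATTLED", "FREEZER", "SETTLER", "EXAMPLE"]}
print(results)
# {'BATTLED': True, 'FREEZER': True, 'SETTLER': True, 'EXAMPLE': False}
```

FREEZER and SETTLER pass because the expression only insists that letters three and four agree. Nothing stops the other letters from repeating.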

I actually spent some time playing with regular expressions to solve this. The answer may be out there, but I decided to just add some code to Retrievem itself. Basically, it converts each word into a number in such a way that only words with the same number as the code word will satisfy the pattern. Simpler for me and the results are wonderful!

BATTLED, BATTLER, BEDDING, BELLOWS, BETTING, BHEESTY, BIGGEST, BILLOWY,
BULLACE, CADDISH, CAFFEIN, CALLING, CESSPIT, CHEERIO, CHEERLY, CHOOSER,
CIRRATE, CIRROSE, CIRROUS, CITTERN, CLOOTIE, COBBING, COBBLER, CODDING,
CODDLER, COFFRET, COLLAGE, CUTTING, CUTTLER, DABBLER, DAFFERY, DAFFING,
DAFFISH, DAGGERS, DALLIER, FADDISH, FADDISM, FADDIST, FALLING, FIBBERY,
FIDDLER, FISSURE, FISSURY, FITTAGE, FITTERS, FLOODER, FLOORED, FOGGILY,
FOGGISH, FOPPERY, FOPPISH, FOSSULA, FOSSULE, FREEDOM, FREEISH, FREESIA,
FUDDLER, FULLERY, FULLING, FULLISH, FUNNILY, FURRILY, FURRING, FURROWY,
FUSSILY, GALLEON, GARROTE, GOSSIPY, GREENLY, GROOVEY, HAPPIER, HAPPIFY,
HAPPILY, HELLBOX, HELLCAT, HELLDOG, HELLION, HERRING,…, WETTING, WETTISH,
WOBBLER, WOFFLER, WOPPISH, WORRIED, YAPPING, YELLOWS


Here are just a few of the 852 matches Retrievem found in my word file.
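The word-to-number idea can be sketched in a few lines of Python. The post doesn't spell out Retrievem's exact encoding, so this is one plausible version and an assumption on my part: each distinct letter is numbered in order of first appearance. It returns a tuple of numbers rather than one big number so that words with ten or more distinct letters stay unambiguous. The names `signature` and `find_matches` are hypothetical.

```python
def signature(word):
    """Number each distinct letter by order of first appearance.

    'XSHHJYU' -> (1, 2, 3, 3, 4, 5, 6)
    """
    numbers = {}
    # setdefault stores the next unused number the first time a letter is
    # seen and returns the stored number on every later sighting.
    return tuple(numbers.setdefault(ch, len(numbers) + 1) for ch in word.lower())

def find_matches(code_word, words):
    """Keep only the words whose letter pattern equals the code word's."""
    target = signature(code_word)
    return [w for w in words if signature(w) == target]

sample = ["BATTLED", "FREEZER", "SETTLER", "HERRING", "YELLOWS"]
matches = find_matches("XSHHJYU", sample)
print(matches)  # ['BATTLED', 'HERRING', 'YELLOWS']
```

FREEZER comes out as (1, 2, 3, 3, 4, 3, 2) and SETTLER as (1, 2, 3, 3, 4, 2, 5), so neither equals the code word's (1, 2, 3, 3, 4, 5, 6) and both are dropped.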
* (Regex experts: the grouping of every letter is not necessary, but Retrievem needs to be able to use any position as a backreference and it was easier to just put each element into a group.)
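With one group per letter, the group number is the same as the letter's position, which makes the expression easy to generate from any code word. Here is a rough Python sketch of that idea. The function `build_pattern` is hypothetical, and its `strict` option goes beyond anything described in this post: it is an assumption about how negative lookaheads could forbid a new letter from repeating an earlier one, not a description of Retrievem.

```python
import re

def build_pattern(code_word, strict=False):
    """Build a one-group-per-letter expression for a code word (hypothetical)."""
    first_seen = {}  # code letter -> group number (position) of its first use
    parts = []
    for position, letter in enumerate(code_word.lower(), start=1):
        if letter in first_seen:
            # Repeated code letter: backreference to its first occurrence.
            parts.append("(\\" + str(first_seen[letter]) + ")")
        else:
            if strict and first_seen:
                # New code letter: must differ from every earlier distinct letter.
                earlier = "|".join("\\" + str(g) for g in first_seen.values())
                parts.append("(?!" + earlier + ")")
            parts.append("([a-z])")
            first_seen[letter] = position
    return "^" + "".join(parts) + "$"

loose = build_pattern("XSHHJYU")
strict = build_pattern("XSHHJYU", strict=True)
print(loose)  # ^([a-z])([a-z])([a-z])(\3)([a-z])([a-z])([a-z])$

for word in ["battled", "freezer", "settler"]:
    print(word, bool(re.match(loose, word)), bool(re.match(strict, word)))
# battled True True
# freezer True False
# settler True False
```

The strict expression grows quickly as the code word gets longer, which is one reason a small piece of ordinary code can be the simpler tool here.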

This entry was posted in ParserMonster by Mitchell Allen.