Grep and Hypemachine

Tuesday, December 15, 2009


Technical writing ahead:

You'll notice that the track listings for each mix are now linked to The Hype Machine. This allows you to search an artist's catalog in-depth, and also to find songs on mixes that have been taken down.

Typing each link out by hand would have taken too long, so I turned to Grep, a pattern-matching utility for searching text.


Problem: Copying and pasting track listings from iTunes results in Track Name    Artist Name, leaving no clear separation between the track and the artist. The plain text format of the track listing isn't as helpful as it could be, but reformatting each listing and adding the hyperlink code by hand would take too long.

Goal: Automate a reformatting of the listings, so that they read Track Name — Artist Name, and link each artist and song to its respective Hype Machine page.


I'd never used Grep before, so it took some finagling, but I eventually worked out a find-and-replace algorithm that converts each song and artist in a track listing into a search link at hypem. (I used the text editor BBEdit, which has a Grep function.) The algorithm looks like this:

FIND: (.+)\t(.+)$
REPLACE WITH: <a href=http://hypem.com/search/\1/>\1</a> — <a href=http://hypem.com/artist/\2/>\2</a>

Let's break it down:

First, the parentheses separate the search into subpatterns. This is useful because I can refer to them again when I replace the text. The pattern in the first parentheses can be called using \1; the pattern in the second with \2, and so forth.
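
For anyone who wants to see the subpatterns outside BBEdit, here is a minimal sketch in Python. Python's re module is an assumption on my part (the post itself only uses BBEdit's Grep), and its group(1) and group(2) play the role of \1 and \2.

```python
import re

# One line as pasted from iTunes: track name, a tab, artist name.
line = "Time of the Season\tThe Zombies"

# Each pair of parentheses captures one subpattern.
match = re.match(r"(.+)\t(.+)$", line)

track = match.group(1)   # what the replace pattern calls \1
artist = match.group(2)  # what the replace pattern calls \2

print(track)
print(artist)
```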

The \t in between the two pairs of parentheses represents a tab. Luckily, iTunes inserts a tab character between the track and artist, but it's not enough of a visual separation.

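. will match any character except a line break. Appending + will search for one or more of them. The \t that follows the first pattern tells the search where to stop. Thus, a search for .+\t will match one or more characters up to and including a tab. (Because + is greedy, a line with several tabs would be matched through the last one; the track listings here have only one tab per line.)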

For example, applied to the line Time of the Season     The Zombies, a search for .+\t will match Time of the Season and the tab that follows it.
Searching for .+\t.+$ will match the whole line, Time of the Season     The Zombies. (The $ anchors the match at the end of the line, just before the line return.)
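
The two searches above can be checked with a short Python sketch. Python's re module stands in for BBEdit's Grep here, which is an assumption, and the three-column line is invented only to show what .+ does when a line has more than one tab.

```python
import re

line = "Time of the Season\tThe Zombies"

# .+\t matches one or more characters, up to and including the tab.
partial = re.search(r".+\t", line).group(0)

# .+\t.+$ carries on past the tab to the end of the line.
full = re.search(r".+\t.+$", line).group(0)

# .+ is greedy: with two tabs on a line it runs through the LAST tab.
# (Hypothetical three-column line, not taken from a real listing.)
three_columns = "Track Name\tArtist Name\tAlbum Name"
greedy = re.search(r".+\t", three_columns).group(0)

print(repr(partial))
print(repr(full))
print(repr(greedy))
```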

So, in English, the search pattern (.+)\t(.+)$ finds one or more characters (\1), followed by a tab, then one or more characters (\2), up to the end of the line.

Using the above example, \1 equals Time of the Season, and \2 is The Zombies. Because the artist and song name have been handily grouped into subpatterns, I can easily insert them into a URL. Thus, http://hypem.com/search/\1/ expands into http://hypem.com/search/Time%20of%20the%20Season/, and http://hypem.com/artist/\2/ becomes http://hypem.com/artist/The%20Zombies/. (Strictly, Grep inserts the names exactly as written, spaces included; %20 is how a browser encodes each space when it follows the link.)

(The Hype Machine uses slightly different URLs between general search and specific artist browsing.)

To sum up, when applied to the line Time of the Season     The Zombies, the algorithm
FIND: (.+)\t(.+)$ and
REPLACE WITH: <a href=http://hypem.com/search/\1/>\1</a> — <a href=http://hypem.com/artist/\2/>\2</a> results in the usable HTML code
<a href=http://hypem.com/search/Time%20of%20the%20Season/>Time of the Season</a> — <a href=http://hypem.com/artist/The%20Zombies/>The Zombies</a>, i.e.,

Time of the Season — The Zombies
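
The whole find-and-replace can also be run as a script instead of through BBEdit. What follows is a rough sketch in Python, not the method used for this blog: the function name link_tracks is made up, the second line of the sample listing is a placeholder, and the href values are wrapped in quotes (my addition) so that a space in a name does not end the attribute early.

```python
import re

# Same FIND pattern as above; MULTILINE lets $ match at the end of every line.
FIND = re.compile(r"(.+)\t(.+)$", re.MULTILINE)

# The dash placed between track and artist, written as an escape.
SEPARATOR = " \u2014 "

# Same REPLACE pattern as above, with quoted href values (an assumption).
REPLACE = (
    r'<a href="http://hypem.com/search/\1/">\1</a>'
    + SEPARATOR
    + r'<a href="http://hypem.com/artist/\2/">\2</a>'
)


def link_tracks(listing: str) -> str:
    """Turn every 'track<TAB>artist' line into a pair of hypem links."""
    return FIND.sub(REPLACE, listing)


listing = "Time of the Season\tThe Zombies\nTrack Name\tArtist Name"
html = link_tracks(listing)
print(html)
```

Note that the names go into the URLs verbatim, spaces and all, just as they do with the Grep replace.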

Now, the boring old track text has been replaced by a much more helpful link to the ever-useful Hype Machine, and I avoided several hours of coding the HTML by hand (hours instead spent learning some basic Grep and typing this post).


Problems with special characters:
Because I use < and > to denote cover songs, and brackets to indicate editing, I will eventually need to write in a clause that ignores these special characters and the words within them. Otherwise, they end up in the search URL, and there is a risk that hypem will get confused.
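
One possible shape for that clause, sketched in Python rather than BBEdit's Grep. This is only a guess at what would be needed: the helper name strip_annotations and the sample strings are hypothetical, and it assumes each annotation sits inside a single matched pair of < > or [ ].

```python
import re

# A <...> or [...] annotation, plus any whitespace just before it.
ANNOTATION = re.compile(r"\s*(?:<[^>]*>|\[[^\]]*\])")


def strip_annotations(text: str) -> str:
    """Remove <...> and [...] annotations so only the plain name remains."""
    return ANNOTATION.sub("", text).strip()


# Hypothetical examples of the two kinds of annotation.
print(strip_annotations("Track Name <Artist Name>"))
print(strip_annotations("Track Name [edit]"))
```

The stripped text would go into the URL only; the visible link text could keep the annotation.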

Also, Comcast's recent DNS hijacking was affecting my ability to search with straight spaces; I may need to replace the spaces with hard-coded %20s.
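
If hard-coded %20s do turn out to be necessary, a script can add them while it builds the links. A minimal sketch, assuming Python and its standard urllib.parse.quote function; the function name to_links and the quoted href values are my own choices, not something BBEdit's Grep does.

```python
import re
from urllib.parse import quote

FIND = re.compile(r"(.+)\t(.+)$", re.MULTILINE)

# The dash placed between track and artist, written as an escape.
SEPARATOR = " \u2014 "


def to_links(match: re.Match) -> str:
    """Build both links for one line, percent-encoding the URL parts only."""
    track, artist = match.group(1), match.group(2)
    search_url = f"http://hypem.com/search/{quote(track)}/"
    artist_url = f"http://hypem.com/artist/{quote(artist)}/"
    return (
        f'<a href="{search_url}">{track}</a>'
        + SEPARATOR
        + f'<a href="{artist_url}">{artist}</a>'
    )


# Passing a function to sub() lets each match be processed before insertion.
html = FIND.sub(to_links, "Time of the Season\tThe Zombies")
print(html)
```

quote turns each space into %20 and leaves the visible link text untouched.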