April 4, 2007

How to script song lyrics retrieval

Author: Duane Odom

I recently wrote a simple bash script to incorporate a lyrics database into some of my music-handling scripts. I took advantage of one of the benefits of open source software by finding an existing application that performed this task and inspecting the code to see how the developers did it.

I started with the code for Rhythmbox, a music management application. I discovered that the developers used a couple of simple URL calls to a Web-based lyrics database called Leo's Lyrics:


http://api.leoslyrics.com/api_search.php?auth=duane&artist=cake&songtitle=comfort%20eagle
http://api.leoslyrics.com/api_lyrics.php?auth=duane&hid=VxwOBYpM3iY=
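Query values like these need URL encoding before they go into a request: the title comfort eagle contains a space, and the hid contains "=", which is safer to encode inside a query value. Below is a minimal pure-bash sketch of percent-encoding. The urlencode function is a hypothetical helper (not part of Rhythmbox, wget, or the Leo's Lyrics API), it assumes the server decodes standard %XX escapes, and it only handles ASCII input.

```shell
#!/bin/bash
# urlencode: hypothetical helper, not part of any tool used in this article.
# Leaves RFC 3986 unreserved characters (letters, digits, '.', '~', '_', '-')
# unchanged and percent-encodes everything else. Intended for ASCII input only.
urlencode() {
    local s=$1 out= c i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        # printf with an argument of the form "'x" yields the character code of x,
        # which is then printed as %XX in hexadecimal
        case $c in
            [A-Za-z0-9.~_-]) out+=$c ;;
            *) printf -v c '%%%02X' "'$c"; out+=$c ;;
        esac
    done
    printf '%s\n' "$out"
}
```

For example, urlencode "comfort eagle" prints comfort%20eagle and urlencode "VxwOBYpM3iY=" prints VxwOBYpM3iY%3D, which is how those values would appear in the two URLs above once encoded.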

The first call returns an XML document containing the search results, like the one below. (Note: the authorization token [I use "duane"] seems to be ignored by the server. I tried unsuccessfully to find documentation for the Web-based API on the Web site. I also emailed the support address listed on the site, after signing up for an account that allows you to submit new lyrics, but got no response.)

<?xml version="1.0" encoding="UTF-8"?>
<leoslyrics>
 <response code="0">SUCCESS</response>
 <searchResults>
   <result id="120741" hid="VxwOBYpM3iY=" exactMatch="true">
     <title>Comfort Eagle</title>
     <feat/>
     <artist>
       <name>Cake</name>
     </artist>
   </result>
 </searchResults>
</leoslyrics>

The response element (<response code="0">SUCCESS</response>) showed that the call had succeeded. Next I looked at the result element (<result id="120741" hid="VxwOBYpM3iY=" exactMatch="true">), particularly its hid attribute, which identifies the lyric entry I wanted. I passed that hid to the second URL call and got this result:

<?xml version="1.0" encoding="UTF-8"?>
<leoslyrics>
 <response code="0">SUCCESS</response>
 <lyric hid="VxwOBYpM3iY=" id="120741">
   <title>Comfort Eagle</title>
   <feat/>
   <artist>
     <name>Cake</name>
   </artist>
   <albums>
     <album>
       <name>Comfort Eagle</name>
       <imageUrl>http://images.amazon.com/images/P/B00005MCW5.01.MZZZZZZZ.jpg</imageUrl>
     </album>
   </albums>
   <writer/>
   <text>We are building a religion
We are building it bigger
.................
Pendant keychains</text>
 </lyric>
</leoslyrics>

From these results I saw that I could pull out the text element (<text>...</text>) and have my song lyrics.

To automate this process in a bash script I used wget and xmlstarlet. Wget is a simple utility for non-interactive download of files from the Internet. I used wget to call the URLs with the correct parameters and capture the XML results. Xmlstarlet is a set of command-line utilities used to query and process XML documents. I used xmlstarlet to pull out the pertinent information from the URL call results.

To make this script really useful, I wanted to supply it with a path to an MP3 file and have it pull the artist and title from the ID3 tag in the file and use this information to download the lyrics. I used id3tool, a utility that can view and edit ID3 tags within MP3 files from the command line.
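Because the script depends on three external tools (wget, xmlstarlet, and id3tool), it can help to check for all of them before doing any work. The following is a hedged sketch of such a check using the shell builtin command -v; the require_tools name is hypothetical and not part of the original script.

```shell
#!/bin/bash
# require_tools: hypothetical helper that reports every missing command
# up front and returns non-zero if any of them is not on the PATH.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# At the top of the lyrics script this might be called as:
#   require_tools wget xmlstarlet id3tool || exit 1
```

Checking all tools in one pass means a single run lists everything that still needs installing, rather than stopping at the first missing command.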

After running id3tool at the start of the script, I use sed to pull the artist information from id3tool's output and store it in a shell variable called ARTIST. I retrieve the song title with a similar line.


# Keep only the value after "Artist:", dropping the label and the whitespace that follows it
ARTIST=$(id3tool "$1" | sed -ne 's/.*Artist:[[:space:]]*//p')
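Since the title is retrieved "with a similar line," one way to avoid running id3tool twice is to capture its output once and pull each field from it. This is a sketch under the assumption that id3tool prints one "Label:" line per field, with the title under a "Song Title:" label; check your id3tool version's output before relying on those labels. The id3_field function is hypothetical.

```shell
#!/bin/bash
# id3_field: hypothetical helper. Reads id3tool output on stdin and prints the
# value of the field whose label is given as $1 (e.g. "Artist", "Song Title"),
# stripping the label, the colon, and any whitespace after it.
id3_field() {
    sed -ne "s/^$1:[[:space:]]*//p"
}

# Usage inside the lyrics script, where "$1" is the MP3 path:
#   tags=$(id3tool "$1")
#   ARTIST=$(printf '%s\n' "$tags" | id3_field "Artist")
#   SONGTITLE=$(printf '%s\n' "$tags" | id3_field "Song Title")
```

Anchoring the pattern at the start of the line keeps a label such as "Artist" from matching inside another field's line.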

I then use wget to call a URL with the artist and title parameters. The call returns XML results which are stored in a shell variable called search_results:


# AUTH holds the authorization token (for example AUTH=duane), which the server seems to ignore
search_results=$(wget -q -O - "http://api.leoslyrics.com/api_search.php?auth=$AUTH&artist=$ARTIST&songtitle=$SONGTITLE")
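The call above does not check whether wget succeeded, so a network failure would silently leave search_results empty. Here is a hedged sketch of one way to catch that, wrapping the same request in a hypothetical search_lyrics function. It assumes AUTH, ARTIST, and SONGTITLE are already set and URL-encoded.

```shell
#!/bin/bash
# search_lyrics: hypothetical wrapper around the search request.
# Prints the XML response on success; prints an error and returns 1 if wget fails.
search_lyrics() {
    local url="http://api.leoslyrics.com/api_search.php?auth=$AUTH&artist=$ARTIST&songtitle=$SONGTITLE"
    local xml
    # Declaring xml separately keeps wget's exit status visible to the if
    if ! xml=$(wget -q -O - "$url"); then
        echo "search request failed: $url" >&2
        return 1
    fi
    printf '%s\n' "$xml"
}
```

The script could then use search_results=$(search_lyrics) || exit 1, since the assignment's exit status is that of the function.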

Next, I use xmlstarlet to parse information from the XML results. The following example executes the sel command of the xmlstarlet utility to parse the text of the response element:


result=$(echo "$search_results" | xmlstarlet sel -t -v "/leoslyrics/response/text()")
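Between the two calls, the script has to confirm the search succeeded and pull out the hid for the second URL. The sketch below applies the same sel technique to the result element's hid attribute. The first_hid name is hypothetical, and it simply takes the first result, which is an assumption; a stricter version might filter on exactMatch="true".

```shell
#!/bin/bash
# first_hid: hypothetical helper. Takes the search response XML as $1,
# checks that the response text is SUCCESS, then prints the hid attribute
# of the first result element.
first_hid() {
    local xml=$1 result
    result=$(printf '%s\n' "$xml" | xmlstarlet sel -t -v "/leoslyrics/response/text()")
    if [ "$result" != "SUCCESS" ]; then
        echo "search failed: $result" >&2
        return 1
    fi
    printf '%s\n' "$xml" | xmlstarlet sel -t -v "/leoslyrics/searchResults/result[1]/@hid"
}
```

With the sample search response shown earlier, first_hid "$search_results" would print VxwOBYpM3iY=, the value passed to api_lyrics.php.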

Finally, I combine the technique above with xmlstarlet's unesc command, which converts escaped XML entities (such as &amp;) back into plain characters, and redirect the lyrics to a text file named after the MP3:


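# $lyrics holds the XML from the api_lyrics.php call; quoting it preserves the line breaks in the lyrics
echo "$lyrics" | xmlstarlet sel -t -v "/leoslyrics/lyric/text/text()" | xmlstarlet unesc > "$1.txt"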

The completed script gets the lyrics for one song at a time. To download the lyrics for your entire song library, you can use the find command with its -exec option to run the script on every song at once. For example, if your song library is rooted at /share/music/, you could run:


# Quote the pattern so the shell passes *.mp3 to find instead of expanding it
find /share/music/ -iname '*.mp3' -exec get_lyrics.sh {} \;
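On a large library, rerunning that command fetches every song again. A hedged variation is to skip MP3s that already have the "<file>.txt" the script writes. The fetch_missing_lyrics function below is hypothetical; it uses find's -print0 with a NUL-delimited read loop so file names containing spaces survive intact.

```shell
#!/bin/bash
# fetch_missing_lyrics: hypothetical wrapper. Runs the given lyrics script
# ($2) on each MP3 under $1 that does not yet have a matching "<file>.txt".
fetch_missing_lyrics() {
    local root=$1 script=$2 song
    find "$root" -type f -iname '*.mp3' -print0 |
        while IFS= read -r -d '' song; do
            # Skip songs whose lyrics file already exists
            [ -e "$song.txt" ] || "$script" "$song"
        done
}

# Example:
#   fetch_missing_lyrics /share/music/ get_lyrics.sh
```

This keeps the one-song-per-invocation design of the original script, so nothing inside get_lyrics.sh needs to change.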

You can easily use the same URL calls and response-parsing techniques in your language of choice, provided that language can retrieve Web pages and parse XML. These techniques could also be adapted to work with other lyrics databases that offer Web-based APIs.