January 18, 2005

Building a command-line generator for RSS feeds

Author: Marcelo Rinesi

While there's no shortage of Web-based programs that output RSS, sometimes all you want to do is generate and serve a quick-and-dirty feed, perhaps as part of a script, or using the output of some other program. You might want to check your log summaries with the same ease as you check your daily news, or perhaps set up an RSS feed to distribute internal documents to your workgroup. While these are things that could be done with a content management system, they really call for nothing more than a simple command-line program able to create a feed and add content to it. Luckily, kludging together a reusable tool to generate RSS feeds is a relatively straightforward proposition, and one that opens myriad possibilities for creative uses of RSS.

I recently had the opportunity to build such a tool. Part of my daily routine involves looking at the results of a number of programs, including log summaries from a few servers and the output of long-running simulations. To cover them all I had to check email, intranet Web pages, and log files. While this routine was much easier than it used to be, it was still more time-consuming and error-prone than it could be. Putting all of these "internal news feeds" together with the other information feeds that I follow seemed like a logical way to simplify my routine and gain some extra time in the mornings. I just needed some flexible way of letting almost any program generate an RSS feed.

You may want to build such a tool from scratch, tempted by the apparent straightforwardness of the RSS format. Don't. Just as with parsing and generating XML in general, generating RSS is one of those tasks that looks easy only until you get to know what you are doing. It's almost trivial to write a program that will pass a cursory test, only to fail when facing one nasty detail or another. You'll end up writing an incomplete, buggy XML generator, a task made more difficult by the fact that you won't be consciously doing it.
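To make the pitfall concrete, here is a minimal sketch using only Python's standard library (not any of the tools discussed in this article). The sample title and the helper is_well_formed are made up for illustration. It builds the same item three ways: by naive string formatting, by string formatting with explicit escaping, and through an XML library that escapes on its own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical item title of the kind a log summary might produce.
title = "Disk usage < 90% & backups OK"


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# 1. Naive string formatting: the raw '<' and '&' break the document.
naive = "<item><title>%s</title></item>" % title

# 2. String formatting with explicit escaping: works, but only as long
#    as every single insertion point remembers to call escape().
escaped = "<item><title>%s</title></item>" % escape(title)

# 3. Letting an XML library build the tree: escaping happens on output.
item = ET.Element("item")
ET.SubElement(item, "title").text = title
generated = ET.tostring(item, encoding="unicode")
```

Only the first version fails to parse, and it is exactly the version that passes a cursory test with tame input. Character encodings and date formats hide similar traps.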

Far better to take the open source road and reuse. Googling got me to Jonathan 'Wolf' Rentzsch's rsspipe, a Python script that reads stdin and outputs the last 100 lines as an RSS 0.92 file, with one RSS item per line. While this gave me more or less what I wanted, it was a bit too specific, and it generated the feed by hand. I wanted something somewhat more compliant, so I stopped searching and decided to write one myself. (Rationally, I should have kept looking, as I'm sure there are half a dozen such tools on the Web and a hundred more lying around on people's machines, but if we all did the reasonable thing, who would write these articles?)
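The one-item-per-line idea is easy to picture. The sketch below is not Rentzsch's code; it is an illustrative reimplementation of the behavior described above, and it builds the XML with the standard library's ElementTree instead of by hand. The function name lines_to_rss and the default channel title and link are assumptions.

```python
from collections import deque
import xml.etree.ElementTree as ET


def lines_to_rss(lines, max_items=100, title="stdin", link="http://localhost/"):
    """Turn the last `max_items` lines of `lines` into an RSS 0.92 string."""
    # A bounded deque keeps only the most recent lines, however long the input.
    recent = deque((line.rstrip("\n") for line in lines), maxlen=max_items)

    rss = ET.Element("rss", version="0.92")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Last %d lines of input" % max_items

    # One item per line, oldest first, with the line as the item description.
    for line in recent:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = line

    return ET.tostring(rss, encoding="unicode")
```

Calling lines_to_rss(sys.stdin) and printing the result would give a filter that can sit at the end of a pipeline.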

I decided to use Mark Nottingham's RSS.py as my RSS generator, inspired by Ori Peleg's Python Cookbook recipe for publishing CVS commits to RSS. This module implements, among other things, a class that represents an RSS channel, with methods to parse an existing feed, alter it, and serialize it to a string. As it's based on PyXML, it actually goes to a lot of trouble to make sure that the generated RSS complies with many details you might not want to care about while implementing a quick script.

After downloading the module and making sure I had a recent PyXML install on the machine I was using, I wrote the script itself, which I named "tofeed" (tofeed.py). It took about 62 lines, most of them dedicated to parsing and handling the different command-line options. While a bit unpolished, the final script does have a certain "Unixy" feel to its usage.

Adding an item to an existing feed is of course trivial:

$ tofeed "Item title" "Item content" "Item URL" feed.xml

It can also take the content from a file:

$ tofeed "Item title" --descfile=text.txt "Item URL" feed.xml

or from standard input:

$ tofeed "Item title" --descfile=- "Item URL" feed.xml

It can also dump the feed to standard output, and as it creates the feed if it doesn't exist, it has options to set the feed's title, link, and description.
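For readers who want to see the shape of such a script, here is a minimal sketch of the core operations: create the feed if needed, add an item, read the description from a file or stdin, and save. It is not the tofeed.py source and it does not use RSS.py or PyXML; it substitutes the standard library's xml.etree.ElementTree and writes RSS 2.0. All function names and default values are hypothetical.

```python
import os
import sys
import xml.etree.ElementTree as ET
from email.utils import formatdate


def load_or_create(path, title="Untitled feed", link="", description=""):
    """Parse the feed at `path`, or build an empty RSS 2.0 channel."""
    if os.path.exists(path):
        return ET.parse(path)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    return ET.ElementTree(rss)


def add_item(tree, title, description, link):
    """Add a new item to the channel, ahead of any existing items."""
    channel = tree.getroot().find("channel")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    # RSS 2.0 expects RFC 822 style dates; formatdate() produces one.
    ET.SubElement(item, "pubDate").text = formatdate(usegmt=True)

    # Newest first: insert before the first existing item, or at the
    # end of the channel metadata if there are no items yet.
    existing = channel.findall("item")
    index = list(channel).index(existing[0]) if existing else len(channel)
    channel.insert(index, item)
    return item


def read_description(source):
    """Read item content from a file, or from stdin when `source` is '-'."""
    if source == "-":
        return sys.stdin.read()
    with open(source, encoding="utf-8") as handle:
        return handle.read()


def save(tree, path):
    """Write the feed back to disk with an XML declaration."""
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

A command-line wrapper would then parse its arguments, call load_or_create, add_item, and save in that order, and use read_description when a --descfile style option is given.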

How to serve it

There is no necessary connection between RSS files and the HTTP protocol; with tools like this one, it will often suffice to set the feed reader to directly read the file. On the other hand, it does feel a bit silly to build an RSS feed and not have a way of accessing it over the Net, even if a full-blown Apache installation might be overkill.

There are many lightweight HTTP servers available for Linux. I used lighttpd, which worked well enough. With a quick and painless install from source and a minimal configuration file like the one shown below, I could immediately start serving files. This breed of servers is surprisingly powerful, and especially shines in its low resource footprint, which means that you can use it to serve files (like RSS feeds) on machines that need to reserve as much processing power as possible for other tasks.

# lighttpd configuration file.
# You can save this file anywhere, e.g. as $HOME/.lighttpd.conf , and pass its location
# to the server using the command line switch '-f $HOME/.lighttpd.conf'

# Document Root
server.document-root = "/home/mrinesi/tmp/feeds/"
# TCP Port
server.port = 8080

Now that you've built it

The point of a tool like the one we just built, of course, is that while it does just one or two things, it can be used for an almost limitless range of tasks, most of them ideas you wouldn't have considered if you didn't have the tool at hand.

One of the first things I did with it was to modify my crontab file to pipe the output of all commands to a feed, like this:

0 5 * * * backup | tofeed --descfile=- "Backup output" "" /home/mrinesi/tmp/feeds/crontab.xml

Getting these reports in my feed reader instead of my email seems like a more natural fit. Summaries of server logs, as well as other statistics I need to keep an eye on, also have their own feeds. More and more, I'm starting to expect only messages from humans or time-sensitive things to show up in my email inbox, with everything else landing in a feed.

I can also now pipe the output of long-running programs or text files to a feed on an ad-hoc basis. This simple ability has changed the way I see RSS, and indeed how I use my feed reader. With the difficulty of putting things on a feed so greatly reduced, I've found myself using it more and more to leave myself short notes, to store away emails from mailing lists to read later, or to disseminate information among a project group without necessarily going through a blogging interface.

All of these uses just scratch the surface, and doubtlessly, with more development time and effort, you could improve my script in terms of speed and features (that is, if you can't find an alternative one on the Net). The moral of the story is that RSS has long passed the level of library support that would entitle it to be an everyday technology for the average command-line user. Build yourself a simple command-line tool, play a little bit with it, and soon you'll find yourself wondering why you ever thought RSS was only for keeping up with blogs.
