November 19, 2010

Weekend Project: Add PubSubHubbub Syndication to Your Site or Blog


Web syndication formats like RSS and Atom have revolutionized the way we get information, but even though these formats are standardized and widespread, syndication is still evolving. PubSubHubbub is one of the more interesting advancements in syndication technology, because it actually changes the update method used by clients, from traffic-heavy periodic polling to a publisher-push model. Luckily you don't have to choose one or the other; you can add PubSubHubbub support to your site's feeds without losing compatibility with your more "traditional" readers. This weekend, update your feeds to the latest and greatest in syndication and generate your own hubbub.



How It Works

In Ye Olde-Fashioned Web Syndication of yesteryear, there were exactly two players in each conversation: the publisher (such as a blog or other site producing RSS or Atom content) and the subscriber (an end-user application, such as a feed-aggregating blog reader like Google Reader or TT-RSS, or a stand-alone software app). The publisher updated the content of its feed whenever a new story or post went live.

To get the new content, however, the subscriber application had to "poll" the feed, querying to see if any new items had been published. Aggregators that subscribed to loads of different feeds therefore had to do lots of polling, and publishers had to respond to loads of queries even when there was nothing new.

PubSubHubbub adds a new intermediary to the mix, the hub, that essentially keeps track of all of that publisher/subscriber state, and reduces the workload on both the publisher and the subscriber. There have been intermediary "feed service providers" in the past, such as Feedburner, and, in practice, hubs are such a service. The publisher registers its feed with the hub, and sends update notices to the hub whenever new content is available. The publisher's feed is the same, except for an extra <link> element with the rel="hub" attribute pointing to its hub of choice.
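To make the "extra <link> element" concrete, here is a minimal sketch in Python that adds one to an Atom feed using only the standard library. The hub address, the sample feed, and the add_hub_link helper are all hypothetical, made up for illustration; a real CMS plugin would normally do this for you in the feed template.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical hub address, used only for illustration.
HUB_URL = "https://hub.example.com/"

# A tiny stand-in for a real Atom feed (assumed content).
SAMPLE_FEED = f"""<feed xmlns="{ATOM_NS}">
  <title>Example Blog</title>
  <link rel="self" href="https://blog.example.com/atom.xml"/>
</feed>"""


def add_hub_link(feed_xml: str, hub_url: str) -> str:
    """Return the Atom feed with a <link rel="hub"> element, adding it if missing."""
    # Serialize Atom elements without a namespace prefix.
    ET.register_namespace("", ATOM_NS)
    root = ET.fromstring(feed_xml)
    link_tag = f"{{{ATOM_NS}}}link"

    # Do nothing if the feed already advertises this hub.
    for link in root.findall(link_tag):
        if link.get("rel") == "hub" and link.get("href") == hub_url:
            return ET.tostring(root, encoding="unicode")

    root.append(ET.Element(link_tag, {"rel": "hub", "href": hub_url}))
    return ET.tostring(root, encoding="unicode")


tagged_feed = add_hub_link(SAMPLE_FEED, HUB_URL)
print(tagged_feed)
```

Everything else in the feed is left untouched, which is why readers that know nothing about hubs keep working as before.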

The subscriber subscribes to the feed just like it would any traditional RSS or Atom feed, but it notices the "hub" link embedded in the content. It, too, registers with the hub, and in the future it skips polling that particular feed. Instead, whenever a hub gets an update, the hub forwards the new content to the subscriber immediately. Not only is there less work for both the publisher and subscriber, but the updates are syndicated in near real-time.
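The subscriber's side of that exchange can be sketched roughly as follows: look for the hub link in the feed, and if one is present, prepare a subscription request for the hub instead of scheduling another poll. The URLs and helper names here are hypothetical, and the hub.* parameter names follow the 0.3 draft of the protocol as best understood; check the specification before relying on them. The sketch only builds the request body and sends nothing.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple
from urllib.parse import urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"

# A stand-in feed that advertises a (hypothetical) hub.
SAMPLE_FEED = f"""<feed xmlns="{ATOM_NS}">
  <title>Example Blog</title>
  <link rel="hub" href="https://hub.example.com/"/>
  <link rel="self" href="https://blog.example.com/atom.xml"/>
</feed>"""


def find_link(feed_xml: str, rel: str) -> Optional[str]:
    """Return the href of the first top-level Atom <link> with the given rel."""
    root = ET.fromstring(feed_xml)
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def build_subscription(feed_xml: str, callback_url: str) -> Optional[Tuple[str, str]]:
    """Return (hub URL, form-encoded body), or None if the feed has no hub."""
    hub = find_link(feed_xml, "hub")
    topic = find_link(feed_xml, "self")
    if hub is None or topic is None:
        return None  # No hub advertised: keep polling the old-fashioned way.
    body = urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,            # the feed being subscribed to
        "hub.callback": callback_url,  # where the hub should deliver updates
        "hub.verify": "async",
    })
    return hub, body


# Hypothetical callback endpoint exposed by the reader application.
subscription = build_subscription(SAMPLE_FEED, "https://reader.example.com/push")
print(subscription)
```

A real subscriber would POST the body to the hub and then answer the hub's verification request at its callback URL before any updates arrive.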

It may seem odd to expect better performance by introducing a middleman into the process, but the bandwidth savings come from having the hub efficiently aggregate lots of publishers and subscribers, and eliminate the need to do polling altogether.
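A back-of-the-envelope calculation shows where the savings come from. The figures below are invented for illustration (1,000 subscribers, a 30-minute polling interval, five new posts a day), and the model assumes a single hub that fetches the feed once per ping and that every subscriber is hub-aware.

```python
def polling_requests_per_day(subscribers: int, poll_interval_minutes: int) -> int:
    """Requests the publisher serves per day when every subscriber polls on a timer."""
    polls_per_subscriber = (24 * 60) // poll_interval_minutes
    return subscribers * polls_per_subscriber


def push_requests_per_day(updates_per_day: int) -> int:
    """Requests the publisher handles per day with one hub in the middle.

    Assumed model: for each update the publisher sends one ping to the hub
    and the hub fetches the feed once in response.
    """
    return 2 * updates_per_day


# Made-up example numbers.
polling = polling_requests_per_day(subscribers=1000, poll_interval_minutes=30)
push = push_requests_per_day(updates_per_day=5)

print(f"polling: {polling} requests/day")
print(f"push:    {push} requests/day")
```

In practice the gap is smaller, because readers that do not understand hubs will keep polling the feed as they always have.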

From the software side, then, there are three components to consider: adding hub support to the feeds that you already publish, running a hub yourself, and using a PubSubHubbub-aware client application to read. Anyone can run a hub, and there is nothing to prevent you from running a hub and publishing a feed at the same time; you have to administer the hub server, but you will still see overall traffic savings.

Pub: Publishing Tools

Thanks to the reduced loads seen in PubSubHubbub, many commercial feed-publishing services already support publishing a PubSubHubbub-compliant feed. These include Status.Net, LiveJournal, MySpace, Tumblr, Posterous, and many of the Google-owned properties (Google Code, YouTube, Picasa, and Blogger, with more rolling out regularly). If you use any of these services personally, you probably do not have to do anything to begin using PubSubHubbub, except perhaps select the hub you wish to use.

For those sites that you run, becoming a PubSubHubbub publisher entails two pieces: making sure that whatever package or content-management system (CMS) produces your feeds correctly tags your Atom or RSS content with the <link rel="hub" ...> element, and making sure that it sends update messages or "pings" to the hub whenever there is new content. Most CMSes and plugins support both, but there are ways to implement them separately if necessary.
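If your CMS or plugin does not send the ping for you, the request itself is small. The following is a minimal sketch using Python's standard library; the hub and feed URLs are hypothetical, and the hub.mode and hub.url parameter names follow the 0.3 draft of the protocol as best understood, so confirm them against the specification and your hub's documentation. The sketch builds the request but does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url: str, feed_url: str) -> Request:
    """Build (but do not send) a form-encoded POST telling the hub a feed changed."""
    body = urlencode({
        "hub.mode": "publish",
        "hub.url": feed_url,  # the feed that has new content
    }).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical hub and feed addresses.
ping = build_publish_ping("https://hub.example.com/", "https://blog.example.com/atom.xml")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("ascii"))

# To actually send the ping, pass it to urllib.request.urlopen(ping)
# from whatever hook your site runs when a new post goes live.
```

On receiving the ping, the hub fetches the feed itself, so the ping carries only the feed's address and none of the content.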

There are two separate Publisher plugins for Wordpress, named PubSubHubbub and WP Pubsubhubbub. Both of them support pinging multiple hubs, and tag all of the feeds and feed formats that Wordpress produces (including comment feeds and multiple versions of RSS, for example). The plugin named simply PubSubHubbub is slightly newer, and is compatible with Wordpress releases up to 2.9.2. WP Pubsubhubbub supports releases up to Wordpress 2.8.4.

The latest version of the Drupal CMS supports PubSubHubbub out-of-the-box, as part of the "Feeds" module. There is also a third-party plugin named fastwebfeed that may interest Drupal users running older releases.

Movable Type requires a plugin, developed by a third party. Interestingly enough, the open-source derivative of Movable Type named Melody has folded the same plugin into its core, so it can serve as a publisher automatically.

Zend, too, has a plugin that enables publishing and pinging.

In a slightly different vein, the Venus blog aggregator that is used for producing many public "planet" sites has also added Publisher support in recent builds.

Outside of the pre-packaged CMSes, most custom or home-brewed sites can add PubSubHubbub publishing support with an appropriate feed library. The Google Code project wiki maintains a fairly complete list; there are well-supported packages for Perl, PHP, Ruby, Java, and Python, among other options. Finally, if all else fails and you must manually insert the <link rel="hub" ...> element into your feed template, you can manually ping the hub with a small bookmarklet, provided by the Google Code project.

Hub: Publicly-Available Hubs, and Running Your Own

It is slightly inconvenient that the creators of the protocol assembled its name out of order: the Pub comes first, followed by the Hub, and then the Sub, bringing up the rear (the "bub" was presumably chosen just to increase the tongue-twister factor...). Thus, once you have your site capable of acting as a publisher, you need to either select or set up a hub to collect the pings, grab updates from your feed, and relay them to subscribers.

There are three major "public" hubs at the moment, open to all content. The most well-known is the "reference implementation" run by Google. To use it, just go to its publish link and add your feed details.

Following close behind Google's public hub is the Superfeedr hub, which was the first non-Google hub to launch. It does require setting up a Superfeedr account in order to use, however. Last but certainly not least, Feedburner (which is now owned by Google) offers Hub support to its feed-publishing users as well. Instructions are on the site.

For those wishing to run their own hub, hub plugins exist for both Wordpress and Django. The Wordpress plugin, PuSHPress, works as both a publisher and a hub, so installing it removes the need to also install either one of the previously discussed Wordpress plugins. With PuSHPress, your Wordpress site will directly send real-time updates to your PubSubHubbub subscribers. The Django plugin is not quite as integrated; you manually send the updates from the Hub component whenever you wish to publish. Note also that the Django plugin supports only Atom feeds.

There are also Hub libraries for various Web development languages, including Ruby, Perl, and Python. Again, the best list is maintained on the PubSubHubbub wiki.

Sub: Why Feed if You Can't Read?

Finally, there would be little use in supporting PubSubHubbub if there were not feed-reading clients out there that could understand it. While not all feed-consuming applications are yet on board, more and more clients are available. You can help speed adoption of the protocol by using one of them for your own feed-reading. Thus far, Google Reader, Google Buzz, Netvibes, and FriendFeed are the main PubSubHubbub-subscribing "end user client" applications. Google Reader, of course, is the only one of that bunch that is designed from the ground up to be a news reader; the rest are focused on more diverse sources of Atom or RSS content.

PubSubHubbub support tickets have been filed for a few open source feed readers, including Tiny Tiny RSS and Akregator. While you wait, though, there are some interesting open source tools that you can use to get access to PubSubHubbub feeds. Mihai Parparita's PuSH Bot is a PubSubHubbub-consuming XMPP gateway; you can use it to watch live updates from a PubSubHubbub feed in your instant messaging client. Brian Stoner's Tornado CouchDB Aggregator consumes PubSubHubbub feeds and streams them live to a Web page.

For developers, there are PubSubHubbub-subscribing client libraries for several CMS frameworks, from Drupal to Django to Rails, with which you could build a Web-front-end feed reader. As you might expect, there are plenty of lower level libraries as well, for Perl, Python, Ruby, and many others.

PubSubHubbub has really only been available in the wild for a year, and without doubt the fact that it is backed by Google has contributed a lot to its spread. But that is not all bad; Google can easily stress-test the system on its high-traffic services like Google Reader and Blogger.

As for the rest of us, we will probably see some bandwidth reductions by enabling support as a publisher on our own feeds. At the moment, running your own hub involves some overhead, so working with one of the existing public hubs is perhaps a safer bet. The best part, though, is that you can easily and cheaply experiment. PubSubHubbub may evolve or be replaced, but it costs you nothing to help find a better solution to the ongoing Web syndication discussion.
