Explore the Semantic Web using Piggy Bank


Author: Nathan Willis

If you’ve ever wondered what all the excitement surrounding the Resource Description Framework (RDF) or the Semantic Web is about, then I have good news. You can explore both without leaving your Web browser, using Piggy Bank.

Piggy Bank is a Java-powered Firefox extension developed by MIT’s SIMILE project, a group that is working on a suite of Semantic Web applications and tools. Piggy Bank runs in the background while you browse the Web in your normal fashion. However, when you hit upon a page with an RDF link, a small “data coin” icon appears in the Firefox status bar. Click on it and Piggy Bank will import and parse the accompanying RDF data. You can search through it via Piggy Bank’s built-in browser interface, or save it for later use.

Searching the contents of a single page is not an innovation of note; a well-structured page gets no better just because it has an RDF feed attached. What makes the Semantic Web different is the notion that an application can aggregate data from many sources and then, because RDF describes everything in the same subject-predicate-object form, map relationships between items that came from different sites.
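To make the aggregation idea concrete, here is a minimal sketch of merging RDF-style triples from two sources and relating items across them. It is not Piggy Bank's code: plain strings stand in for real RDF URIs, and the data and the helper names `merge`, `subjects_with`, and `join_on` are hypothetical.

```python
from collections import defaultdict

# A triple is (subject, predicate, object). Plain strings stand in for the
# URIs and vocabularies real RDF would use; all values here are hypothetical.
Triple = tuple[str, str, str]

listings: list[Triple] = [
    ("apt:12", "type", "Apartment"),
    ("apt:12", "city", "Cambridge"),
]
jobs: list[Triple] = [
    ("job:7", "type", "JobPosting"),
    ("job:7", "city", "Cambridge"),
]

def merge(*sources: list[Triple]) -> set[Triple]:
    """Union several sources into one graph; identical triples collapse."""
    graph: set[Triple] = set()
    for source in sources:
        graph.update(source)
    return graph

def subjects_with(graph: set[Triple], predicate: str, obj: str) -> set[str]:
    """Return every subject that has the given predicate/object pair."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}

def join_on(graph: set[Triple], predicate: str) -> dict[str, set[str]]:
    """Group subjects by their value for `predicate`, across all sources."""
    groups: dict[str, set[str]] = defaultdict(set)
    for s, p, o in graph:
        if p == predicate:
            groups[o].add(s)
    return groups

merged = merge(listings, jobs)
print(join_on(merged, "city")["Cambridge"])  # a set holding apt:12 and job:7
```

Neither source alone says that the apartment and the job are in the same city; the relationship only appears once both sets of triples share one graph.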

For example, Piggy Bank’s user documentation shows how you can combine location data from classified ads on VacancyGuide.com, job listings from Monster.com, and movie theaters from Yahoo to see which apartments and houses on the market are closest to the jobs you want and the movies you want to see — a result not offered by any of the three original sites, but the kind of useful task that most people end up doing manually when house- or apartment-hunting.
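The "closest to" part of that example boils down to a distance calculation over the combined location data. The sketch below ranks homes by the great-circle (haversine) distance to their nearest job plus their nearest theater. The scoring rule, the coordinates, and the function names are assumptions for illustration; the article does not say how Piggy Bank or its documentation computes proximity.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, treating Earth as a sphere of radius 6371 km."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rank_homes(homes, jobs, theaters):
    """Sort home names by (distance to nearest job + distance to nearest theater)."""
    def nearest(point, places):
        return min(haversine_km(*point, *p) for p in places.values())
    scores = {name: nearest(loc, jobs) + nearest(loc, theaters) for name, loc in homes.items()}
    return sorted(scores, key=scores.get)

# Hypothetical (latitude, longitude) pairs standing in for scraped locations.
homes = {"apt:12": (42.365, -71.104), "apt:40": (42.480, -71.320)}
jobs = {"job:7": (42.362, -71.090)}
theaters = {"theater:3": (42.373, -71.119)}
print(rank_homes(homes, jobs, theaters))  # ['apt:12', 'apt:40']
```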

This little piggy went to market

To get started with Piggy Bank, you will need a recent version of Firefox and a working Java plugin. When you visit the Piggy Bank project page, it checks for a Java plugin in your browser. If it detects one, it displays an XPI install link in the upper right column; if it doesn’t find Java, you’ll see a “Get Java” link instead. Once Java is set up and the install link appears, install the XPI and restart Firefox.

After the first restart, the Piggy Bank plugin will take a few moments to initialize. It will ask you for an email address, from which it creates a hash that marks the data you gather as yours. That hash comes into play when you share data collaboratively, as we will see in a moment. The email address itself is not sold or given to others, but if you’re slightly paranoid, as I am, you can use one from a free Web-mail service.
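Deriving an identifier from an email address with a one-way hash lets your data be tagged as yours without publishing the address itself. The article does not say which hash Piggy Bank uses, so the sketch below simply assumes SHA-256 over a trimmed, lowercased address; the name `owner_id` is hypothetical.

```python
import hashlib

def owner_id(email: str) -> str:
    """Hypothetical owner ID: SHA-256 hex digest of the normalized address.

    The algorithm and the normalization (trim + lowercase) are assumptions,
    not Piggy Bank's documented behavior.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

print(owner_id("Me@Example.com "))  # a 64-character hex string
```

The same address always yields the same ID, which is what makes it usable as a stable marker, while the digest does not reveal the address.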

Now you can visit a site with an RDF feed and import the data out of it by clicking on the “data coin” status bar icon. After you import the data, you can either search through it with Piggy Bank’s built-in search tools, or save it and continue your browsing. By continuing to browse, you open up additional possibilities.

For example, choose the Track this Window option from the Tools -> Piggy Bank menu. After you have collected data from two or more sites, you can perform a cross-site aggregation, like the one described above. The Combine Information from Several Windows option opens a list of all the sites you are tracking. Select the ones you want to aggregate and Piggy Bank will present you with a search page that combines the data from the different sources. When the data includes geographic location (as in the prior example) you can map the results.

You can also assign multiple private tags to every item in your Piggy Bank data, then search and filter based on those tags.
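Tag-based filtering of this kind can be sketched as a subset test: keep only the items that carry every requested tag. The `Item` class, the `tag` and `filter_by_tags` helpers, and the tag names are hypothetical; the article does not describe how Piggy Bank stores tags internally.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """A collected item with a set of private tags (hypothetical model)."""
    uri: str
    tags: set[str] = field(default_factory=set)

def tag(item: Item, *labels: str) -> None:
    """Attach one or more tags; an item can carry any number of them."""
    item.tags.update(labels)

def filter_by_tags(items: list[Item], required) -> list[Item]:
    """Keep items that have every required tag (set-subset test)."""
    wanted = set(required)
    return [item for item in items if wanted <= item.tags]

a, b = Item("apt:12"), Item("apt:40")
tag(a, "shortlist", "pets-ok")
tag(b, "shortlist")
print([i.uri for i in filter_by_tags([a, b], {"shortlist", "pets-ok"})])  # ['apt:12']
```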

Bringing home the bacon

Although the W3C’s Semantic Web initiative encourages this kind of RDF usage, it has yet to break through into the mainstream. Fortunately, you are not limited to RDF-supporting Web sites. Piggy Bank also supports pluggable screen scrapers — JavaScript or XSLT modules that parse non-RDF pages and extract their useful information.

Each screen scraper is designed to parse the data from a specific site. It registers the appropriate URL pattern, and Piggy Bank calls it whenever Firefox visits a matching page. As with RDF-enabled sites, Piggy Bank notifies you with the data coin icon. In this case, however, clicking the coin makes the scraper read the page’s contents and convert them to RDF, rather than Piggy Bank extracting so-called “pure” RDF directly.
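The register-a-pattern, match-a-URL flow can be sketched in a few lines. Real Piggy Bank scrapers are JavaScript or XSLT modules; this Python version only illustrates the idea, and the site `listings.example.com`, its `<span class="price">` markup, and the names `register`, `scrape`, and `ListingScraper` are all hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Optional

class ListingScraper(HTMLParser):
    """Hypothetical scraper: turns <span class="price">...</span> into triples."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.in_price = False
        self.triples: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price and data.strip():
            self.triples.append((self.page_url, "price", data.strip()))

# Registry of (compiled URL pattern, scraper class) pairs.
SCRAPERS: list = []

def register(pattern: str, scraper_cls) -> None:
    SCRAPERS.append((re.compile(pattern), scraper_cls))

def scrape(url: str, html: str) -> Optional[list]:
    """Run the first scraper whose pattern matches the start of the URL."""
    for pattern, cls in SCRAPERS:
        if pattern.match(url):
            scraper = cls(url)
            scraper.feed(html)
            scraper.close()
            return scraper.triples
    return None  # no scraper registered for this site

register(r"https?://listings\.example\.com/", ListingScraper)
print(scrape("http://listings.example.com/apt/12",
             '<p><span class="price">$1,200</span></p>'))
```

Because the scraper keys on one specific piece of markup, a redesign of the site that renames that class would silently break it.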

As a result, the quality of the data you get hinges on the skill of the screen scraper’s author. If the site suddenly changes its presentation format, you will have to rely on the author to update the scraper accordingly. When that happens, it’s a good idea to drop the scraper’s author a friendly note of thanks, while at the same time sending the site’s owners an email encouraging them to adopt RDF.

Alternatively, you can write your own scraper. Piggy Bank has no official distribution method for scrapers; instead, the project links to those created by team members, and it is left up to users to decide which sites are worth scraping. If you decide to undertake such a task, there are guides available to help you get started.

If you are not up to that, you can also find links to data from a Semantic Bank. SIMILE produces a server-side Piggy Bank companion by that name, through which users can share items they have collected. You can access these communal libraries from Piggy Bank’s My Bank Accounts menu.

You must register with each bank you want to use. The SIMILE project’s bank is the default option, but you can also start your own. Once signed up, you can contribute items as well. Each published item is marked with the unique, personalized hash generated during the initial Piggy Bank installation, so you can filter shared items by the individuals who contributed them, yourself included.
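Filtering a shared bank by contributor works because every item carries the contributor's hash rather than their address. The toy in-memory `Bank` below is a hypothetical stand-in, not Semantic Bank's actual interface, and it again assumes a SHA-256 owner ID since the article does not name the algorithm.

```python
import hashlib
from dataclasses import dataclass

def owner_id(email: str) -> str:
    """Assumed scheme: SHA-256 of the trimmed, lowercased address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class SharedItem:
    uri: str
    contributor: str  # the owner hash, never the raw email address

class Bank:
    """Toy in-memory stand-in for a Semantic Bank (hypothetical API)."""

    def __init__(self) -> None:
        self.items: list[SharedItem] = []

    def publish(self, uri: str, contributor: str) -> None:
        # The client computes the hash; the bank only ever sees the digest.
        self.items.append(SharedItem(uri, contributor))

    def by_contributor(self, contributor: str) -> list[str]:
        return [item.uri for item in self.items if item.contributor == contributor]

me = owner_id("me@example.com")
bank = Bank()
bank.publish("apt:12", me)
bank.publish("job:7", owner_id("someone@example.com"))
print(bank.by_contributor(me))  # ['apt:12']
```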

Ham it up

Describing the Semantic Web is not easy. Sure, you can say that it is an initiative to enable cross-platform data exchange and reuse through well-defined ontologies and a common XML-based framework. Unfortunately, such a description sounds cryptic and perplexing.

Frankly, it is easier to demonstrate the Semantic Web than to describe it. Now anyone with a Firefox browser and Java can get started with the Piggy Bank plugin from SIMILE. I recently saw a blog refer to Piggy Bank as a “Google Maps Mash-up,” presumably due to the location-mapping feature. Although that is the most screenshot-friendly trick the plugin performs, the power of the Semantic Web extends far beyond that.

Take it out for a spin. Once you start seeing RDF data on the sites you visit regularly, you will start seeing more and more potential connections between sites, even where the sites themselves don’t. That is what the Semantic Web is all about.