August 7, 2008

Desktop search comparison: Beagle vs. Tracker, part 2

Author: Ben Martin

Yesterday I discussed Beagle and Tracker with regard to their preferences settings, the time to index a collection of both HTML and PDF files, and how to extract information from individual files. In this article I'll go over the interfaces used to submit queries and the syntax used for complex queries for both projects.

I tested each tool by doing searches on a couple of common terms. I was surprised at the different results I got searching for the word "ethernet." Although it is obviously only a single search, Tracker returned what might be the most relevant result as the top item. Beagle showed the networking HOWTO as the fifth result (shown as "Networking" in the results), with the HOWTO indexes scoring higher than the individual HOWTOs. The Ethernet HOWTO that Tracker offered as its top hit is much more applicable to the search term.

The Beagle search interface only shows the number of matching documents up to a maximum of 100 hits, while Tracker allows you to see the total number of matches and page through them all.

I also searched for "voip" to see if the Ethernet search was touching on some special case in Beagle. Again, Tracker showed the VOIP HOWTO PDF file as the number one result. Beagle presented the release notes and README for siproxd, followed by the networking HOWTO, part of the modem HOWTO, and two files from the Webcam HOWTO. The VOIP HOWTO in both HTML and PDF was on the second page of results in Beagle.

Tracker doesn't support the Boolean operators AND, OR, and NOT; instead it runs a ranked query using your terms. Beagle allows optional terms to be included in the search using the OR keyword, and you can exclude results that contain a term by prefixing it with a minus sign, for example, Linux -troll.
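To make the OR and minus-sign semantics concrete, here is a minimal Python sketch of how such a query could be interpreted. It is not Beagle's parser: the function names parse_query and matches are hypothetical, and the sketch assumes whitespace-separated terms, case-insensitive matching, and no phrase quoting.

```python
def parse_query(query):
    """Split a query string into (groups, excluded).

    groups   -- list of sets; a document must contain at least one term
                from every set (terms joined by OR share a set)
    excluded -- set of terms that a document must not contain
    """
    groups, excluded = [], set()
    join_next = False  # True immediately after an OR keyword
    for token in query.split():
        if token == "OR":
            # OR only joins if there is a previous term to join with
            join_next = bool(groups)
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:].lower())
            join_next = False
            continue
        if join_next:
            groups[-1].add(token.lower())
        else:
            groups.append({token.lower()})
        join_next = False
    return groups, excluded


def matches(words, groups, excluded):
    """Return True if the document's words satisfy the parsed query."""
    words = {w.lower() for w in words}
    if words & excluded:
        return False
    return all(words & group for group in groups)


groups, excluded = parse_query("Linux -troll")
print(matches(["Linux", "kernel", "howto"], groups, excluded))  # True
print(matches(["Linux", "troll"], groups, excluded))  # False
```

A real engine would also rank the surviving documents; the sketch only decides whether a document is in or out of the result set.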

Search interfaces

You can bring up the graphical search interface for Tracker shown in the screenshot below by executing tracker-search-tool. The window has no way for you to load or save complex queries. Right-clicking on a search result allows you to open the result, open the folder containing the result, move the result to the trash, or save the current search results. Saving the search results gives you a text file containing one file path per line. You might want to choose how many of the results to save, but you can't. The default is to save only the 10 results that the GUI displays. If you want to perform an action for every result, such as transferring each match to another machine, you cannot issue the search using the Tracker GUI. Instead you must issue the query using the command-line tracker-search tool. Beagle has a similar command-line tool called beagle-query.
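Once you have a list of matches with one file path per line, acting on every result is easy to script. The following Python sketch copies each listed file into a destination directory, which could be a mounted network share. The function names and the idea of copying to a directory are assumptions made for illustration; the sketch does not invoke Tracker or Beagle and assumes nothing about their command-line output beyond the one-path-per-line format of the saved results file.

```python
import shutil
from pathlib import Path


def read_result_paths(results_file):
    """Return the non-empty lines of a saved-results file as Path objects."""
    with open(results_file, encoding="utf-8") as handle:
        return [Path(line.rstrip("\n")) for line in handle if line.strip()]


def copy_results(results_file, destination):
    """Copy every listed file that still exists into destination.

    Returns the paths of the copies. Files that share a base name
    overwrite one another, because the copies all land in one directory.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in read_result_paths(results_file):
        if path.is_file():  # skip entries deleted since the search ran
            copied.append(Path(shutil.copy2(path, destination)))
    return copied
```

Calling copy_results("results.txt", "/mnt/backup/matches") would copy each match, where both the file name and the mount point are placeholders.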

Clicking on one of the categories listed in the left side of the window narrows the search results to only items in that category. A category is shown in the list only if one or more search results are in that category. Tracker does not allow you to save the list of files that match just the category you're interested in. Given that Tracker places so much emphasis on making tags easy to use, I would also have expected its graphical search tool to have some support for searching with tags, apart from forming a query string manually. For example, once you have issued a search, you might like to see some sort of navigation based on all the tags that the resulting files have. This should let you narrow your search more efficiently (using the tags you have assigned) than selecting from broad categories.

The Tracker search interface, though simple, is effective if you want to issue queries by typing them. The downside is that you have to become familiar with the query syntax to be effective, rather than just clicking on a tag to add it to the query.

When viewing the results of a search, the metadata display for the current hit is a nice touch. For example, if there are two PDF files in the current 10 results, being able to see how many pages each PDF file has lets you open the main document rather than a smaller summary document. It might be nice to be able to sort the resulting matches by this metadata and others. For example, it would be nice to explicitly sort the results by size, by modification time, or number of pages.
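Until the GUI offers such sorting, a list of result paths can be ordered outside the search tool. Here is a minimal Python sketch, assuming you already have the matching paths in a list (for example, read from a saved results file); the function name sort_paths is hypothetical. It covers only size and modification time, since page counts would require parsing each PDF.

```python
import os


def sort_paths(paths, key="mtime", reverse=True):
    """Sort paths by 'size' or 'mtime', largest or newest first by default.

    Paths that can no longer be stat'ed are silently skipped.
    """
    if key not in ("size", "mtime"):
        raise ValueError("key must be 'size' or 'mtime'")
    keyed = []
    for path in paths:
        try:
            info = os.stat(path)
        except OSError:
            continue  # file vanished since the search ran
        value = info.st_size if key == "size" else info.st_mtime
        keyed.append((value, path))
    keyed.sort(key=lambda pair: pair[0], reverse=reverse)
    return [path for _, path in keyed]
```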

Integrating tags into the search results is a great move by Tracker. Unfortunately, apart from adding and removing tags for individual results, there is no real help to let new users use the tags for searching. Sure, you can right-click on a tag once you have added it and select "Search for Tag" from the context menu, but this search is not limited to files that have that tag. For example, I added a tag "foo" to a file and started a search, and the results included many source files containing that string rather than just the files I had tagged with foo.

The beagle-search graphical interface is shown below. At first glance it appears simpler than the Tracker interface because it lacks a categories tree running down the left side. You can filter the results to show only a nominated set of categories using the Show Categories submenu in the Search menu. This lets you select more than a single category for viewing at one time, instead of the single category that Tracker allows. The Search menu also allows you to run a search against a copy of Beagle running on another machine on the network. The View menu lets you sort results by modification time, file name, or relevance, and hide the file details pane at the bottom of the window.

Right-clicking a result brings up a context menu that allows you to open the file with its default application or select an application to use from a list. You can also "Reveal in Folder," which seems to be a roundabout way of saying it opens the folder containing that match. The final options in the context menu are to email the file to somebody and to move it to the trash.

Beagle's ability to sort the results is a handy option, though it is arbitrarily limited to sorting by modification time, name, or relevance rather than other metadata fields. Making category selection set-based in the Beagle search tool, so that you can select several categories at once, is a nice move. However, offering that selection as a side panel that users could show or hide would let you pick multiple categories without having to select None, then Images, then Media, for example. Three trips to the menu to pick two categories gets old quickly.

Query syntax and semantics

The community has been working to create and use the Xesam standard for describing desktop searches, which includes a terse search syntax designed to let users directly input queries. As Xesam developers are meeting with desktop search developers at the first Desktop Search Hackfest in late September, hopefully the Xesam standard will see wider adoption. Beagle has support for Xesam through the beagle-xesam project. The Tracker project is also moving to adopt the Xesam standards. The native query syntax for Beagle is well documented. Tracker supports the RDF Query Specification rather than the more terse SPARQL language for querying RDF. There are a few simple examples of using the RDF Query language with Tracker in the rdf-query-examples subdirectory of the Tracker distribution tarball. I could not find information on what query syntax the Tracker graphical search tool expected or allowed.

Wrap up

The Beagle native query syntax has a few rough edges, such as requiring dates to be input in a rigid format rather than trying to parse more human-readable dates into values. Its property query syntax (e.g., artist:Beatles) is simple to learn, though it is limited to equality searches, leaving you out in the cold if you want to search on numeric metadata.

Tracker's use of the older RDF Query language instead of the SPARQL language may put some off. The main drawback for Tracker is a lack of information on what the current query syntax and semantics are. Inputting a collection of words or the string "*.pdf" is all well and good, but it is nice to be able to refer to documentation when you want to form a more precise query that is not complex enough that you want to write an RDF Query to express it.

Beagle has the ability to build shared static indexes on common directories such as /usr/local/doc and search them as well as your home directory, along with the ability to search other instances of Beagle over the intranet (file servers). The Beagle Web site is also more developed than Tracker's, in particular because the query syntax is documented there. However, of the two, I'm inclined to recommend Tracker. Tracker gave better results for the simple queries that I used, and returning good results is the primary goal of these projects. Tracker would go a long way toward relieving its major Achilles' heel if the shorthand query syntax were documented on its Web site along with some examples, as the Beagle site offers.

As a disclaimer, I work on a "competing" open source metadata extraction and indexing project: libferris. I am unaffiliated with either Tracker or Beagle and considered them in an unbiased manner.

