May 26, 2006

Docco is boffo for document search

Author: Dmitri Popov

If desktop search tools like Kat and Beagle are overkill for your needs, then try Docco, a little application designed specifically to index documents and search inside them.

Docco is based on Lucene, Apache's indexing and search engine, and it requires a Java Runtime Environment (available free of charge at www.java.com). To install Docco, download the latest version of the software, unpack it, and move the resulting directory to the desired location. Docco indexes documents stored in plain text, HTML, XML, OpenOffice.org/StarOffice 6.0, Word, Excel, and PDF formats. However, if you plan to use Docco with Word, Excel, or PDF documents, you must install the corresponding plugins, which are also available on the download page. Installing a plugin is easy: download and unpack the plugin you want, then move the unpacked folder into Docco's plugins directory. Finally, launch Docco by running the run-docco.sh script (or run-docco.bat on Windows).
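If you install plugins often, the unpack-and-move step can be scripted. The following is only a minimal sketch, not part of Docco: it assumes the plugin comes as a zip archive that unpacks to a single top-level folder, and that the plugins directory is literally named plugins inside the Docco directory. The function name install_plugin and any paths you pass to it are hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_plugin(archive: Path, docco_dir: Path) -> Path:
    """Unpack a plugin archive and move the unpacked folder into
    <docco_dir>/plugins. Returns the path of the installed folder."""
    # Assumption: the plugins directory is named "plugins".
    plugins_dir = docco_dir / "plugins"
    plugins_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as tmp:
        # Unpack into a scratch directory first so a failed install
        # leaves the plugins directory untouched.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)

        # Assumption: the archive contains exactly one top-level folder.
        unpacked = [p for p in Path(tmp).iterdir() if p.is_dir()]
        if len(unpacked) != 1:
            raise ValueError("expected exactly one top-level folder in the archive")

        target = plugins_dir / unpacked[0].name
        if target.exists():
            raise FileExistsError(f"plugin already installed: {target}")

        shutil.move(str(unpacked[0]), str(target))
    return target
```

Call it with the path to the downloaded archive and the path to your Docco directory, then restart Docco so that it picks up the new plugin.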

Before you can use Docco to perform searches, you must index the directories containing the documents. To do this, choose the Indexing -> Index directory command and point to the directory you want to index. You can have more than one index in Docco, and the indexes can include directories on mounted network drives. Indexing can be a time-consuming task that requires a lot of processing power. Fortunately, Docco contains an indexing priority feature (Indexing -> Indexing Priority), which allows you to control the indexing speed. For example, you can set the indexing priority to High during the night when you don't use your computer and switch to Medium or Low during the day.
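To get a feel for what indexing involves and why it takes time, here is a toy sketch of an inverted index, the general idea behind engines such as Lucene. It is not Docco's or Lucene's actual code: it only handles plain-text files with a .txt extension, and the names build_index and search are made up for this illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")


def build_index(root: Path) -> dict[str, set[Path]]:
    """Walk a directory tree and map each word to the files containing it."""
    index: dict[str, set[Path]] = defaultdict(set)
    # Every file has to be read once, which is why indexing a large
    # directory is slow and CPU-hungry.
    for path in root.rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in WORD.findall(text.lower()):
            index[word].add(path)
    return dict(index)


def search(index: dict[str, set[Path]], query: str) -> set[Path]:
    """Return the files that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Once the index exists, a search is just a few set lookups, so the slow part happens only once per directory.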

Once Docco is done indexing the directories, it's ready to go. To search for a particular word or word combination, enter it into the Search field and click the Submit button. Unlike most search tools, Docco presents the search results both as a directory tree and as a diagram. The latter is a unique and useful feature that presents the search results as a "network" of connections between search words.

For example, the diagram in the figure presents results for three search words: "writer," "landscape," and "print." Each node on the diagram represents a search word, and the number below a node indicates how many documents were found for it. Click a node to view its direct connections to other search words and to display the related documents in the directory tree. Select the document you want in the directory tree to view its metadata, and press the "Open with default application" button to open the document for editing.
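The numbers in such a diagram can be thought of as document counts for search words and their combinations. The sketch below illustrates that idea only; it is an assumption about the general principle, not a description of how Docco builds its diagram. The function name diagram_counts and the input format (a mapping from document name to the set of words it contains) are hypothetical.

```python
from itertools import combinations


def diagram_counts(
    docs: dict[str, set[str]], search_words: list[str]
) -> dict[tuple[str, ...], int]:
    """For every non-empty combination of search words, count the
    documents that contain all words in that combination."""
    counts: dict[tuple[str, ...], int] = {}
    for size in range(1, len(search_words) + 1):
        for combo in combinations(search_words, size):
            wanted = set(combo)
            # A document matches when the combination is a subset
            # of the words found in it.
            counts[combo] = sum(1 for words in docs.values() if wanted <= words)
    return counts
```

Single-word entries correspond to the nodes for individual search words, while multi-word entries show how many documents connect those words.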

While Docco doesn't offer all the bells and whistles of tools like Beagle or Google Desktop Search, it can come in handy if you want to search your documents with minimum fuss and view results graphically.

Dmitri Popov is a freelance writer whose articles have appeared in Russian, British, and Danish computer magazines.
