Do you eXist?

Author: Mark Alexander Bain

If you are planning to build an online knowledge base, XML
might be your best choice for a repository format, because of its ease
of development, its platform independence, and the fact that it is an
open, human-friendly format. If you use XML, then eXist, an open source
XML database, may help you do the job effectively.
With eXist you can build collections of XML documents, index them, and
retrieve data using the XQuery
language.

The software is the brainchild
of Wolfgang Meier, who started work on eXist in 2000 after reading some
articles on indexing methodologies for XML. The application is written
completely in Java (and is therefore suitable for Linux or Windows). It
has two parts. The main component is a server application which is used
for the import, indexing, and querying of data. This must run on a
Web server with a servlet engine. Fortunately, eXist comes with the Jetty
Web server, but it will work just as happily with others, such as Apache Tomcat.
There is also a secondary client application used mainly for any additional indexing that may be required.

eXist’s Web site contains a list
of sites using the software, including:

If you have a look at any of
these sites, it will be obvious that each deals with data that can be
categorized in a hierarchical manner. The data structure must follow a
strict parent-child relationship. An example of this would be a
library: the library contains sections; the sections contain books; the
books contain chapters. You could never have a chapter belonging to
more than one book, or a book belonging to more than one section.
Another example is an organization such as a company — a director will
have a number of section heads, each section head will have a number of
managers, etc. At no point should any person have more than one person
in charge of them (well, it’s a nice theory anyway).
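
To make the strict parent-child idea concrete, here is a minimal sketch in Python that builds the library example with the standard xml.etree.ElementTree module. The section name and the book and chapter titles are made up for illustration, and the code has nothing to do with eXist itself; it only shows that every element in an XML tree has exactly one parent.

```python
import xml.etree.ElementTree as ET

# Build the library -> section -> book -> chapter hierarchy.
# All names and titles are hypothetical, chosen only for illustration.
library = ET.Element("library")
section = ET.SubElement(library, "section", name="Computing")
book = ET.SubElement(section, "book", title="Learning XML")
ET.SubElement(book, "chapter", title="Elements and attributes")
ET.SubElement(book, "chapter", title="Namespaces")

# Map every element to its parent. In a tree each child appears once,
# so a chapter can never belong to two books.
parent_of = {child: parent for parent in library.iter() for child in parent}

chapters = library.findall("./section/book/chapter")
for chapter in chapters:
    print(chapter.get("title"), "belongs to", parent_of[chapter].get("title"))

# Show the resulting XML document as text.
print(ET.tostring(library, encoding="unicode"))
```

If your data needs a chapter to sit under two books at once, it does not fit this model without duplication, and that is a hint that a purely hierarchical store may not be the right tool.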

Installing the software

The current, stable version of
eXist is 1.0beta2.
Installation is simple. Move to the directory where you downloaded
it and type:

java -jar eXist-1.0b2-build-1107.jar

An installer application will start, and all that you have to do is to
follow the instructions. Typically the installer will place all of the
server and client software in the directory ~/eXist.

Now you’re ready to fire up the
server. Move to the directory in which the installer placed eXist, and run bin/startup.sh,
or for Windows (well, some people don’t know any better, poor things) use bin/startup.bat.

The application should now be
ready to use, so open up your favourite Web browser and go to
http://localhost:8080/exist/. On first view it will look like you’re
back at the eXist Web site, but all of the same functionality and
documentation, and the eXist admin pages, are now available on your
local server.

Just a note when you’re testing
this out: eXist hasn’t really
been designed as a desktop application for general use; in most cases
it would
be used on a dedicated Web server. I have found that both Jetty and
Tomcat tend to use a lot of memory — too much for either KDE or GNOME,
but workable with a lighter window manager such as FVWM2.

Setting up an example database

Next, the most obvious thing to
do is to set up a database. The software comes complete with some
examples that you can install using the eXist Admin page. One of the
examples, the XML Acronym Demystifier, doesn’t
load, but even without this there is still enough to give you a good
idea of how you can use eXist.

The Library Search Example is a
good one to start with. Try it out, then have a look at the application
files that allow you to carry out the query
(~/eXist/webapp/xquery/biblio.xml and ~/eXist/webapp/xquery/biblio.xq)
to help you better understand the database’s operation.

By this stage you will probably
be ready to start using your own data. You must first put it into XML
format; have a look at the W3Schools XML tutorial if you want
to learn how. There are then two ways to make the data available within
eXist. You can use the admin page to load the data, or use the eXist
client (which you can start from ~/eXist/bin/client.sh). Using either
of these methods will cause eXist to load the data into the database
and index it. You can perform further indexing (should you require it
for any reason)
through the client application.
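
As an illustration of the "put it into XML format" step, the following sketch converts a few records into an XML document using Python's standard library. The records, the field names, and the library/book element names are assumptions made up for this example; your own data will dictate its own structure. The resulting text could be saved as a file and then loaded through the admin page or the client.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; in practice these might come from a CSV file
# or a table in a relational database.
records = [
    {"title": "Learning XML", "author": "A. Writer", "price": "35.00"},
    {"title": "Pocket Guide", "author": "B. Editor", "price": "12.50"},
]

def records_to_xml(rows):
    """Return a <library> element holding one <book> child per record."""
    root = ET.Element("library")
    for row in rows:
        book = ET.SubElement(root, "book")
        # Each field becomes a child element of <book>.
        for field, value in row.items():
            ET.SubElement(book, field).text = value
    return root

library = records_to_xml(records)
xml_text = ET.tostring(library, encoding="unicode")
print(xml_text)

# To produce a file ready for loading, something like this would do:
# ET.ElementTree(library).write("library.xml", encoding="utf-8",
#                               xml_declaration=True)
```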

With the data imported into the
eXist database and indexed, it is ready for you to perform queries on it.

XQuery

While relational databases use
Structured Query Language (SQL) to get information from their data
stores, XML databases use a different language called XQuery:

XQuery example:

for $x in doc("library.xml")/book
where $x/price>30
order by $x/price
return $x/title

SQL example:

select title
from library.books
where price>30
order by price

From these pieces of code you
can see that there are similarities, and that XQuery is easy to
understand, but there will be no cutting and pasting a query from a
relational database to eXist.
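
If neither language is familiar, it may help to see the same logic spelled out step by step. The sketch below is plain Python using the standard library, not XQuery and not anything eXist runs; the sample document and its titles and prices are invented, and it assumes the book elements sit directly under a library root element. Each line is annotated with the XQuery clause it corresponds to.

```python
import xml.etree.ElementTree as ET

# A made-up document; element names follow the XQuery example.
LIBRARY_XML = """
<library>
  <book><title>Learning XML</title><price>35.00</price></book>
  <book><title>Pocket Guide</title><price>12.50</price></book>
  <book><title>XQuery in Depth</title><price>49.95</price></book>
  <book><title>Databases 101</title><price>31.00</price></book>
</library>
"""

def price_of(book):
    """Read a book's <price> child as a number."""
    return float(book.findtext("price"))

def titles_over(xml_text, min_price):
    """Return titles of books priced above min_price, cheapest first."""
    root = ET.fromstring(xml_text)
    books = root.findall("book")                            # for $x in .../book
    pricey = [b for b in books if price_of(b) > min_price]  # where $x/price > 30
    pricey.sort(key=price_of)                               # order by $x/price
    return [b.findtext("title") for b in pricey]            # return $x/title

result = titles_over(LIBRARY_XML, 30)
print(result)  # ['Databases 101', 'Learning XML', 'XQuery in Depth']
```

The point of a database such as eXist is that you do not write this loop yourself: the query engine does the filtering and sorting, and it can use its indexes to avoid scanning every document.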

If you are new to XQuery, start
by referring to the eXist XQuery page
or have a look at the W3Schools XQuery tutorial.

Third-party tools

You may feel that you need a
little more than just the basic eXist. If so, a number of third-party tools can help you with some common tasks. For instance, jEdit allows you to write and test XQuery code, and Orbeon’s Open
Integration Suite
can help you build a complete Web application. A full, current list is
available from the eXist third-party tools page.

Other useful resources include
the eXist wiki and the eXist mailing list.

There are other XML-based databases like eXist, including NeuroSys and Apache Xindice. Others that keep the hierarchical structure but use their own formats are mmDb (MoreMotion Hierarchical Database) and GT.M
(Greystone Technology M). If you don’t mind paying for a stable, commercial solution then have a look at Software AG’s Tamino.

In conclusion

XML is rapidly gaining popularity as an industry standard among both major and minor companies, and in government too: the British government has published its guidance on XML Schemas and Standards with the aim that information should flow seamlessly across all
sectors, and between the government and people. XML is a true open
data format available to any platform and to any user. Whether or not you
decide that eXist is suitable for your next project, XML itself is
certainly a good starting point.