August 26, 2004

XML-based documentation using AurigaDoc

Author: Scott Nesbitt

One of XML's many uses is creating documentation. XML is well suited to producing output in multiple formats -- everything from HTML to PDF to JavaHelp. The most popular variant of XML for creating documentation is DocBook, but for all its power and flexibility, DocBook is difficult to learn, and the DocBook toolchain can be tough to set up. Consider instead AurigaDoc, an XML-based document engine that can output more than 10 commonly used document formats. Unlike DocBook, AurigaDoc is simple to learn; it uses a small number of tags, yet produces clean and attractive output.

Like many open source projects, AurigaDoc was designed to "scratch an itch." "We needed a documentation system for our internal use that would be able to generate output in a variety of formats," said Khurshidali Shaikh of AurigaLogic, developer of AurigaDoc. "We looked at some tools but they were very cryptic and difficult to use. At that time DocBook was not known to us."

Instead, the folks at AurigaLogic came up with a predefined XML format and a set of XSL stylesheets to convert the XML to HTML, PDF, and PostScript. The list of output formats has expanded to include:

  • HTML, either single or multiple pages
  • Dynamic HTML, with a collapsible menu
  • Adobe Portable Document Format (PDF)
  • PostScript
  • Rich Text Format (RTF)
  • Microsoft HTML Help
  • JavaHelp
  • Oracle Help for Java
  • man pages
  • MIME multipart/related message (MHT), a format that encompasses the HTML and the image and CSS files into a single file

According to Shaikh, support for other formats grew organically. "We came across JFor and decided to integrate it to generate RTF output. We investigated further to see what the popular help formats were and came across JavaHelp and CHM (Windows HTML Help)."

AurigaDoc is built upon several open source tools -- the Xerces XML parser and the Xalan XSLT processor, FOP (for conversion to PDF and PostScript), JFOR (for XML to RTF conversions), and the CSS2 parser from SteadyState Software.

These tools are written in Java, and as a result all processing is done at the command line. The command line syntax is simple, and looks something like this:

aurigadoc.sh -outputType -XML filename.xml -OUT filename

The whole process is quite straightforward. The only wildcard in the process is the -outputType option, which determines which processor to use -- for example, -rtf produces a Rich Text Format file. Other than that, converting an AurigaDoc file to another format is simple.
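
The syntax above lends itself to a small wrapper when one source file has to be converted to several formats. What follows is a minimal sketch, not part of AurigaDoc: the function names convert and convert_all, the AURIGADOC and DRY_RUN variables, and the output naming scheme are all hypothetical. Only -rtf is named as an output type in this article, so any other flag you pass should be checked against the user guide first.

```shell
#!/bin/sh
# Sketch of a wrapper around the command line shown above.
# Assumption: aurigadoc.sh is on the PATH (override with AURIGADOC).
# Assumption: output types other than rtf must be looked up in the user guide.

AURIGADOC="${AURIGADOC:-aurigadoc.sh}"   # launcher script
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the command instead of running it

# convert TYPE XMLFILE OUT
# Builds one command line in the form: aurigadoc.sh -TYPE -XML file -OUT name
convert() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$AURIGADOC -$1 -XML $2 -OUT $3"
    else
        "$AURIGADOC" "-$1" -XML "$2" -OUT "$3"
    fi
}

# convert_all XMLFILE OUTBASE TYPE...
# Runs convert once per output type; the OUTBASE-TYPE naming is illustrative only.
convert_all() {
    src="$1"; base="$2"; shift 2
    for t in "$@"; do
        convert "$t" "$src" "$base-$t" || return 1
    done
}
```

With DRY_RUN left at 1, calling convert_all guide.xml guide rtf only prints the command it would run; set DRY_RUN=0 to run the conversion.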

Working with AurigaDoc

The XML files that AurigaDoc processes aren't pure XML. They're a mix of XML and HTML. The XML elements define the structure of the document and its sections, and HTML tags are used to format the actual text. A section in an AurigaDoc source file looks something like this:


<section name="abstract" label="Abstract">
<br /><br />
In Linux, the most widely-used tool to create PDF files from PostScript is ps2pdf.
While ps2pdf is simple to use, it has a number of parameters that many people either do not use or do not know about. This guide shows you how to create PDF
files using ps2pdf -- the basics, and how to use the parameters to create better
PDFs. Although this guide is aimed at Linux users, the information presented
here can be used with versions of ps2pdf that run on other operating systems.

</section>

Being an avid DocBook XML user, I found the use of hybrid markup to be a tad illogical when I started working with AurigaDoc. I wondered, "Why not use pure XML?" But, as Shaikh explained, "while developing AurigaDoc, we thought that it would be easier if writers didn't have to learn a new syntax. So we decided to use a mix of HTML and XML, as most of the developers know HTML."

If you have any experience with XML, the strict structure of an AurigaDoc file will feel familiar. You specify a document's title and author in a meta information section, then set up formatting options (margins, pointers to stylesheets, and the heights of headers and footers) in a document formatting section. The project provides a good example of an AurigaDoc source file.

Once the shell of the document is set up, you start writing. You can use any HTML tag to format the text of your document. However, only 26 HTML tags are supported when you generate PDF, PostScript, or RTF documents. The HTML also has to be well formed. If, for example, you don't close a tag pair (like <p></p>), the AurigaDoc processors display an error message and the conversion fails.
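
Because an unclosed tag stops the conversion, it can save time to check a source file before handing it to AurigaDoc. The sketch below assumes that xmllint (from libxml2) is installed; it is a separate tool, not part of AurigaDoc, and the function name check_wellformed is hypothetical. It only tests that tags are properly nested and closed. It does not tell you whether a tag is one of the 26 supported for PDF, PostScript, and RTF output.

```shell
#!/bin/sh
# Sketch: check that a source file is well-formed XML before converting it.
# Assumption: xmllint (libxml2) is available; it is not bundled with AurigaDoc.

# check_wellformed FILE
# Returns 0 if FILE parses cleanly, non-zero if it does not,
# and 127 if xmllint is not installed.
check_wellformed() {
    if ! command -v xmllint >/dev/null 2>&1; then
        echo "xmllint not found; cannot check $1" >&2
        return 127
    fi
    # --noout suppresses the parsed document, so only errors are printed
    xmllint --noout "$1"
}
```

A typical use is check_wellformed guide.xml && aurigadoc.sh -rtf -XML guide.xml -OUT guide, so that the conversion runs only when the check passes.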

The look and feel of AurigaDoc's output is controlled by a Cascading Style Sheet. The application comes with a usable CSS file, or you can create one of your own. What's unique about AurigaDoc is that the look and feel of PostScript and PDF files is also controlled by a CSS file -- that's where the CSS2 parser comes in. Keep in mind, though, that for PostScript and PDF files only a handful of CSS properties are supported. You'll have to experiment to discover which properties work and which ones don't.

How do published AurigaDoc documents look? While working with the application, I not only created several documents from scratch, but also converted several DocBook XML files to the AurigaDoc format. These documents ranged in length from three to 40 pages. The output in all formats looked as good as, and often better than, the equivalent output from DocBook. In fact, to get similar results from DocBook I would have had to customize the XSL stylesheets.

Why use it?

So why create documentation using AurigaDoc instead of DocBook, LaTeX, groff, or even POD? According to the AurigaDoc users I interviewed, the main advantage of this application is simplicity.

"AurigaDoc is simple and the CSS is elegant," said Hans Deragon, developer of Autopoweroff. "I selected it because I wanted to set up a Web page quickly without the hassle of creating a CSS and having to go through a steep learning curve."

For Java developer Eric Porter-Johnston, "the DocBook suite of tools was really cumbersome on Windows, while AurigaDoc ran fine from a Java-based build environment." While Porter-Johnston admits that the DocBook toolchain has improved, he still uses AurigaDoc for internal documentation. But, he adds, "on any new project, I'd probably compare AurigaDoc and the latest DocBook toolchain."

AurigaDoc's flexibility is another drawing point. In one project, Anil Gogia used AurigaDoc to generate RTF and PDF files using information in a database. For Gogia, "AurigaDoc's flexibility makes it better than other third-party open source tools."

This flexibility extends beyond the supported output formats. Khurshidali Shaikh told me that in addition to authoring software documentation, people have used AurigaDoc to publish FAQs, resumes, and reports. I know of at least two or three writers who have used it to generate white papers, writing samples, and even a book proposal or two.

Because the tools that make up AurigaDoc are written in Java, it's fairly easy to add AurigaDoc to the build process using an Ant target. That way, when an application is compiled it will pick up the latest version of the documentation as well. Using an Ant target and the Ant task that's bundled with AurigaDoc is explained in the user guide.

Some blemishes

While AurigaDoc is a solid documentation system, it's not without its drawbacks, especially for the hard-core technical writer. There is no way to create an index, which is an essential part of a longer manual. In DocBook, you can profile documents (that is, output different versions of a document with slightly varying content from a single source). AurigaDoc doesn't support profiling; if you want to generate a configuration guide for Linux and Windows, you have to maintain two separate files. Nor can you break a file into reusable chunks that can be used in other documents, at least not without a lot of copying and pasting.

AurigaDoc's error handling can be quite vexing at times. Sometimes, the error messages make sense; at other times, they're confusing. When AurigaDoc tried to process one of my test documents, it threw this error, followed by a long string of Java exceptions:

Invalid baseDir specified

It took me a while to figure out that in the XML file, I was pointing to a graphic in a directory that didn't exist.

Another drawback is that you have to set up your own table of contents in a separate meta information section of your document. If your document is long, this can be a time-consuming chore. There is also no good way to specify subsections in a document. The workaround is to apply bold formatting to the subsections' titles.
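
One way to take some of the tedium out of a hand-built table of contents is to list the sections a file already defines. The sketch below is an illustration under stated assumptions, not an AurigaDoc feature: the function name list_sections is hypothetical, and it assumes every section tag sits on one line with the name attribute before the label attribute, as in the sample section shown earlier. It prints the name and label pairs only; you still write the table of contents entries yourself.

```shell
#!/bin/sh
# Sketch: list the name and label of each <section> in a source file.
# Assumption: each opening tag is on a single line, in the form
#   <section name="..." label="...">

# list_sections FILE
# Prints one "name | label" line per matching section tag.
list_sections() {
    sed -n 's/.*<section name="\([^"]*\)" label="\([^"]*\)".*/\1 | \2/p' "$1"
}
```

Run against a file containing the sample section, list_sections prints the pair abstract and Abstract on one line, separated by a vertical bar.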

Overall impressions

AurigaDoc is an easy-to-use, flexible, and fast way to create documentation. I wouldn't recommend it for large documentation projects, especially ones that involve writing manuals for multiple operating systems. But if your documentation needs are relatively simple and you want to create output in multiple formats, AurigaDoc is well worth a look. The results are visually appealing, and AurigaDoc's learning curve is practically non-existent.

Scott Nesbitt is a technical writer and journalist who spends way too much time searching for that elusive perfect documentation format.
