October 10, 2007

Converting text files into ODF with odtwriter

Author: Dmitri Popov

While you can create and save documents in the OpenDocument format using OpenOffice.org, KWord, or AbiWord, there are other ways to generate ODF files. odtwriter, for example, can quickly convert plain text files formatted with reStructuredText markup into ODT (OpenOffice.org Writer-compatible ODF) documents. With odtwriter, you can generate ODF files on machines that don't have an ODF-compatible word processor installed, such as those running lightweight Linux distros, or simply compose documents in a text editor and leave the formatting to odtwriter.

odtwriter is part of docutils, a set of tools for converting plain text files into other formats, such as HTML, XML, and LaTeX. This means that you can convert the same source text file into other formats besides ODF. This is a more efficient approach than creating an ODT document in Writer and then jumping through hoops to turn it into, for example, a clean HTML file.

odtwriter has a few features that are not available in OpenOffice.org. For example, OpenOffice.org doesn't support syntax highlighting, so making code blocks in a Writer document more legible and pretty is a non-trivial task. odtwriter, however, can apply syntax highlighting to the code blocks in the final ODT document.

The main drawback of using odtwriter to produce ODF documents is that you need to learn a new markup language. This is, however, not as bad as it might sound. The markup used in reStructuredText looks a lot like wiki markup, so you can pick up the basics in no time. The markup is also well documented, with a reStructuredText Primer, Quick Reference, and cheat sheet available for your reading pleasure.
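To give a feel for the markup, here is a minimal sketch in Python that stores a small reStructuredText sample and writes it to a plain text file. The sample content, the SAMPLE name, and the write_sample helper are illustrative assumptions and not part of odtwriter or docutils.

```python
from pathlib import Path

# Illustrative reStructuredText source (hypothetical content).
# It shows a title, inline markup, a bullet list, and a literal block.
SAMPLE = """\
Meeting Notes
=============

A paragraph with *emphasis*, **strong emphasis**, and ``inline literal`` text.

- First bullet item
- Second bullet item

A literal block follows the double colon::

    print("hello")
"""


def write_sample(path="text.txt"):
    """Write the sample markup to a plain text file and return its path."""
    target = Path(path)
    target.write_text(SAMPLE, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Print the sample so the markup can be inspected without touching disk.
    print(SAMPLE)
```

Calling write_sample() produces a text.txt file that can serve as input for the conversion commands described in this article.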

To make odtwriter work on your machine, you first have to install two packages: docutils and pygments. The latter is required only if you want the syntax highlighting functionality. On Ubuntu, you can install both packages with a single command:

sudo apt-get install python-docutils python-pygments

Then, to install odtwriter, download its latest release, unpack it, and run the following commands:

python setup.py build
sudo python setup.py install

Using odtwriter is equally straightforward: the rst2odt.py command converts the specified source text file into an ODT document:

rst2odt.py text.txt document.odt
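If you have several source files, the same command can be scripted. The following Python sketch pairs each .txt file in a directory with an .odt output name and calls rst2odt.py once per pair. The function names are hypothetical, and the sketch assumes that rst2odt.py is on your PATH and takes the source and destination as its two arguments, as in the command above.

```python
import subprocess
from pathlib import Path


def plan_conversions(directory="."):
    """Pair every .txt file in `directory` with an .odt output path."""
    sources = sorted(Path(directory).glob("*.txt"))
    return [(src, src.with_suffix(".odt")) for src in sources]


def convert_all(directory="."):
    """Run rst2odt.py for each pair; raises CalledProcessError on failure."""
    for src, dest in plan_conversions(directory):
        # Assumes rst2odt.py is installed and reachable through PATH.
        subprocess.run(["rst2odt.py", str(src), str(dest)], check=True)


if __name__ == "__main__":
    # Only list the planned commands; call convert_all() to actually run them.
    for src, dest in plan_conversions():
        print("rst2odt.py", src, dest)
```

Separating the planning step from the execution step makes it easy to check which files would be converted before anything is written.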

Like most command-line tools, odtwriter supports flags, and at least two of them can come in handy. By default, the syntax highlighting feature in odtwriter is disabled; you need to use the --add-syntax-highlighting flag to turn it on:

rst2odt.py --add-syntax-highlighting text.txt document.odt

This applies Python syntax highlighting to the code blocks (marked as literal blocks) in the source text. For code blocks in other programming languages, you have to add so-called directives to the source text that activate alternative language (or lexer) highlighting:

.. sourcecode:: on
.. sourcecode:: Java

Since odtwriter relies on the Pygments utility for syntax highlighting, you can use any language supported by this software. You can find a list of all supported languages and their short names on the Pygments Web site.

Besides syntax highlighting, odtwriter can apply OpenOffice.org styles to the source text. When you run the rst2odt.py command without specifying a style sheet, the resulting ODT document uses the default paragraph and character styles. However, odtwriter can use a special ODT file as a style sheet during conversion to apply styles to the marked text.

The style sheet file is a regular ODT document that contains a set of custom styles, all of which carry the rststyle prefix: rststyle-textbody, rststyle-footer, rststyle-codeblock, and so on. odtwriter comes with a sample styles.odt style sheet file, which you can tweak to your liking by modifying the document in OpenOffice.org Writer. To point odtwriter to the style sheet, use either the --stylesheet or the --stylesheet-path flag. The former lets you specify the style sheet's location as a URL, while the latter takes the path to the style sheet file relative to the current working directory. For example, if the styles.odt file resides in your home directory and you run the command from there, the command should be:

rst2odt.py --stylesheet-path=styles.odt text.txt document.odt
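The two flags can also be combined in one call. As a rough sketch, the Python helper below assembles an rst2odt.py argument list from the options covered in this article. The helper name and its parameters are hypothetical; only the --add-syntax-highlighting and --stylesheet-path flags come from the text above.

```python
def rst2odt_command(source, destination, highlight=False, stylesheet_path=None):
    """Build the argument list for an rst2odt.py call (hypothetical helper)."""
    args = ["rst2odt.py"]
    if highlight:
        # Syntax highlighting is off unless this flag is given.
        args.append("--add-syntax-highlighting")
    if stylesheet_path is not None:
        # Path is interpreted relative to the current working directory.
        args.append(f"--stylesheet-path={stylesheet_path}")
    args += [source, destination]
    return args


if __name__ == "__main__":
    command = rst2odt_command(
        "text.txt", "document.odt", highlight=True, stylesheet_path="styles.odt"
    )
    # Show the command line; pass the list to subprocess.run() to execute it.
    print(" ".join(command))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.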

That's all there is to it. odtwriter by itself is simple to use, but its support for reStructuredText markup and its ability to use style sheets make it a powerful solution for generating ODF documents from plain text files.

