March 9, 2007

Publishing Writer documents on the Web

Author: Dmitri Popov

Although OpenOffice.org has an HTML/XHTML export feature, it is not up to snuff when it comes to turning Writer documents into clean HTML files. Instead, this feature turns even the simplest Writer documents into HTML gobbledygook, and while it attempts to preserve the original formatting, the results are often far from perfect. Moreover, publishing static HTML pages is so '90s: today, blogs and wikis rule the Web. So what options do you have if you want to convert your Writer documents into tidy HTML pages or wiki-formatted text files? Quite a few, actually.

Let's start with the simplest scenario, where you need to convert a single Writer document into an HTML page. One way to do this is to use a pair of scripts: odt2txt.py and markdown.py. The first script converts the Writer document into a plain text file and turns the text formatting into markdown markup (you can read more about markdown here and here). You can then convert the resulting text file into HTML using the markdown.py script. To perform this transformation, simply download both scripts, unpack them, and use the terminal to run them as follows:

  python odt2txt.py Loremipsum.odt > Loremipsum.txt
  python markdown.py Loremipsum.txt > Loremipsum.html
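
If you have more than one document to convert, the two commands can be wrapped in a short script. The following is only a minimal sketch: it assumes that odt2txt.py and markdown.py sit in the current directory, that both write their result to standard output as in the commands above, and that the python command on your system can run them. The function names conversion_steps and convert are made up for this illustration.

```python
import subprocess
import sys
from pathlib import Path


def conversion_steps(odt_file):
    """Return (command, output file) pairs for the ODT -> text -> HTML chain."""
    odt = Path(odt_file)
    txt = odt.with_suffix(".txt")
    html = odt.with_suffix(".html")
    return [
        # Step 1: Writer document to markdown-formatted plain text.
        (["python", "odt2txt.py", str(odt)], txt),
        # Step 2: markdown-formatted text to HTML.
        (["python", "markdown.py", str(txt)], html),
    ]


def convert(odt_file):
    """Run both scripts, redirecting each one's standard output to a file."""
    for command, target in conversion_steps(odt_file):
        with open(target, "wb") as out:
            # check=True stops the chain if one of the scripts fails.
            subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    # Every file name given on the command line is converted in turn.
    for name in sys.argv[1:]:
        convert(name)
```

Saved under a name of your choice, the script takes one or more .odt files as arguments and leaves a .txt and an .html file next to each of them.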

Using the odt2txt script for intermediary conversion has another advantage. Many blog, wiki, and content management systems support the markdown syntax either directly or via optional plugins. This means that you can easily publish the marked down file on your wiki or blog. For example, if you are using DokuWiki, you can make it recognize markdown by installing the markdown plugin. By default, some assembly is required to make the plugin work, but if you don't feel like fiddling with it, yours truly has done the dirty work for you and created a ready-to-use package. Simply download the markdown.zip file, unzip it, and place the resulting folder into the /lib/plugins directory in your DokuWiki installation. Use the <markdown></markdown> tags to mark the designated content in the wiki page.

Speaking of wikis, you can also convert the HTML file into a "native" wiki page using the excellent HTML::WikiConverter service. It supports all major wiki formatting dialects, and it's available as a standalone Perl script, which you can install and use on your own machine.

It's not all sunshine and unicorns, though, and the odt2txt script does have its limitations. The current version of the script supports the following formatting: italics, bold italics, ordered and unordered lists, block quotes, code blocks, hyperlinks, and footnotes. The two major elements that are not recognized by the script are tables and images.
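
Since tables and images are not converted, it can be handy to check whether a document contains any before you run it through the script. This is not part of odt2txt; it is a rough sketch that relies on the fact that a Writer file is a ZIP archive whose content.xml stores tables as table:table elements and pictures as draw:image elements. The function name count_unsupported is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Element names from the OpenDocument format, in ElementTree's {namespace}tag form.
TABLE = "{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table"
IMAGE = "{urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}image"


def count_unsupported(odt_file):
    """Count the tables and images stored in a Writer document's content.xml."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # iter() walks the whole tree, so nested tables are counted as well.
    return {
        "tables": sum(1 for _ in root.iter(TABLE)),
        "images": sum(1 for _ in root.iter(IMAGE)),
    }
```

If either count is above zero, expect to add those elements to the converted text by hand.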

If you want to publish the contents of a Writer document as a post on your blog, you can easily do so by using the functionality provided by Google Docs. Simply upload your Writer document to Google Docs, open it for editing, and press the Publish button. Besides Google's own Blogger service, you can publish the document in virtually any blog system, provided its API is supported by Google Docs (Google provides a complete list of supported APIs).


You can also create clean HTML documents in Writer by using markup directly in the document and then saving it as a plain text file. But instead of using HTML markup, you can opt for something better -- namely txt2tags, a combination of a lightweight markup language and a conversion tool that can convert the marked-up text file into a number of formats, including HTML, XHTML, MoinMoin wiki, and LaTeX. The main advantage of using the txt2tags markup instead of HTML is its simplicity and flexibility. txt2tags features an easy-to-learn syntax and allows you to do clever things such as creating a table of contents, specifying a filter, and generating an HTML file based on a custom CSS template (see this article for a more detailed description of txt2tags' features). txt2tags is released under GPL and is available at SourceForge.net.

Better yet, if you don't fancy learning txt2tags markup, I've created a simple extension for OpenOffice.org Writer that allows you to apply txt2tags formatting using menu commands. To install the extension, download it, choose Tools -> Extension Manager, select My Extensions, press the Add button, and point it to the downloaded .oxt file. If you are using OpenOffice.org 2.0 or earlier, you have to change the .oxt file extension to .zip before you can install the package. By default, the Save as t2t command is disabled, but you can enable it by uncommenting the following line in the macro:


Shell("txt2tags",1, "--target html -H --no-encoding " & Right(DocDir, Len(DocDir)-7) & "/" & FileName & ".t2t")

Once enabled, this command automatically saves the currently opened document as a plain text file with the .t2t extension and then converts it to HTML using txt2tags. You can, of course, set the command to save the file with an .html extension if you wish.
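
The Shell call in the macro is easier to follow when written out step by step. Here is a minimal Python sketch of the same command construction. It assumes that DocDir holds a file:// URL, which would explain why the macro drops its first seven characters; the function name and the sample directory are hypothetical.

```python
def txt2tags_command(doc_dir, file_name):
    """Rebuild the txt2tags call from the macro as an argument list."""
    # Mirrors Right(DocDir, Len(DocDir)-7): drop the first seven characters,
    # assumed here to be the "file://" prefix of the document URL.
    directory = doc_dir[7:]
    source = directory + "/" + file_name + ".t2t"
    # Same options as in the macro line, split into separate arguments.
    return ["txt2tags", "--target", "html", "-H", "--no-encoding", source]


# Sample values for illustration only.
print(" ".join(txt2tags_command("file:///home/user/docs", "Loremipsum")))
```

Passing the options as a list rather than as one string avoids quoting problems if you hand the command to a process launcher yourself.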

These approaches are not as straightforward as clicking an Export button, but if you want to generate tidy HTML files out of your Writer documents or publish them on your blog or wiki, you should give these techniques a try.

Dmitri Popov is a freelance writer whose articles have appeared in Russian, British, German, and Danish computer magazines.
