Linux.com

Feature

Publishing Writer documents on the Web

By Dmitri Popov on March 09, 2007 (8:00:00 AM)


Although OpenOffice.org has an HTML/XHTML export feature, it is not up to snuff when it comes to turning Writer documents into clean HTML files. Instead, this feature turns even the simplest Writer documents into HTML gobbledygook, and while it attempts to preserve the original formatting, the results are often far from perfect. Moreover, publishing static HTML pages is so '90s: today, blogs and wikis rule the Web. So what options do you have if you want to convert your Writer documents into tidy HTML pages or wiki-formatted text files? Quite a few, actually.

Let's start with the simplest scenario, where you need to convert a single Writer document into an HTML page. One way to do this is to use a pair of scripts: odt2txt.py and markdown.py. The first script converts the Writer document into a plain text file and turns the text formatting into markdown markup (you can read more about markdown here and here). You can then convert the resulting text file into HTML using the markdown.py script. To perform this transformation, simply download both scripts, unpack them, and use the terminal to run them as follows:

  python odt2txt.py Loremipsum.odt > Loremipsum.txt
  python markdown.py Loremipsum.txt > Loremipsum.html
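
If you convert documents regularly, the two commands can be wrapped in a small script. What follows is only a sketch and not part of either project: the function names are made up for illustration, and it assumes that odt2txt.py and markdown.py sit in the current directory and that the python command on your system can run them.

```python
import subprocess
from pathlib import Path


def conversion_steps(odt_path, python="python"):
    """Return (command, output file) pairs mirroring the two shell commands."""
    odt = Path(odt_path)
    txt = odt.with_suffix(".txt")
    html = odt.with_suffix(".html")
    return [
        ([python, "odt2txt.py", str(odt)], txt),
        ([python, "markdown.py", str(txt)], html),
    ]


def convert(odt_path, python="python"):
    """Run both steps, redirecting each script's output to a file."""
    for command, output in conversion_steps(odt_path, python):
        with open(output, "wb") as out:
            # Equivalent to "python script.py input > output" in the shell.
            subprocess.run(command, stdout=out, check=True)
    return output
```

Calling convert("Loremipsum.odt") should leave Loremipsum.txt and Loremipsum.html next to the original document, just as the two commands above do.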

Using the odt2txt script for intermediary conversion has another advantage. Many blog, wiki, and content management systems support the markdown syntax either directly or via optional plugins. This means that you can easily publish the marked down file on your wiki or blog. For example, if you are using DokuWiki, you can make it recognize markdown by installing the markdown plugin. By default, some assembly is required to make the plugin work, but if you don't feel like fiddling with it, yours truly has done the dirty work for you and created a ready-to-use package. Simply download the markdown.zip file, unzip it, and place the resulting folder into the /lib/plugins directory in your DokuWiki installation. Use the <markdown></markdown> tags to mark the designated content in the wiki page.
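
Wrapping the converted text in the plugin's tags is easy to automate. The helper below is a hypothetical convenience function, not something that ships with DokuWiki or the plugin; it simply puts the <markdown></markdown> tags around the text produced by odt2txt.py so that you can paste the result into a wiki page.

```python
def wrap_for_dokuwiki(markdown_text):
    """Surround markdown text with the tags the DokuWiki plugin looks for."""
    # Trim stray blank lines so the tags sit directly around the content.
    body = markdown_text.strip("\n")
    return "<markdown>\n" + body + "\n</markdown>\n"
```

For example, wrap_for_dokuwiki(open("Loremipsum.txt").read()) returns the text of the converted file ready for pasting.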

Speaking of wikis, you can also convert the HTML file into a "native" wiki page using the excellent HTML::WikiConverter service. It supports all major wiki formatting dialects, and it's available as a standalone Perl script, which you can install and use on your own machine.

It's not all sunshine and unicorns, though, and the odt2txt script does have its limitations. The current version of the script supports the following formatting: italics, bold italics, ordered and unordered lists, block quotes, code blocks, hyperlinks, and footnotes. The two major elements that are not recognized by the script are tables and images.
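
Because tables and images are not recognized, it can be useful to check whether a document contains any before you convert it. A Writer document is a zip archive that keeps its text in a file called content.xml, so a few lines of Python are enough for a rough check. This is a sketch under that assumption; the function name is invented for illustration, and it counts elements using the OpenDocument table and drawing namespaces.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument element names, written in ElementTree's {namespace}tag form.
TABLE_TAG = "{urn:oasis:names:tc:opendocument:xmlns:table:1.0}table"
IMAGE_TAG = "{urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}image"


def count_unsupported(odt_file):
    """Count tables and images in a Writer document (path or file object)."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return {
        "tables": sum(1 for _ in root.iter(TABLE_TAG)),
        "images": sum(1 for _ in root.iter(IMAGE_TAG)),
    }
```

If count_unsupported("Loremipsum.odt") reports anything other than zero tables and zero images, expect to add those elements back by hand after the conversion.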

If you want to publish the contents of a Writer document as a post on your blog, you can easily do so by using the functionality provided by Google Docs. Simply upload your Writer document to Google Docs, open it for editing, and press the Publish button. Besides Google's own Blogger service, you can publish the document in virtually any blog system, provided its API is supported by Google Docs (Google provides a complete list of supported APIs).

Figure: txt2tags menu
You can also create clean HTML documents in Writer by using markup directly in the document and then saving it as a plain text file. But instead of using HTML markup, you can opt for something better -- namely txt2tags, a combination of a lightweight markup language and a conversion tool that can convert the marked-up text file into a number of formats, including HTML, XHTML, MoinMoin wiki, and LaTeX. The main advantages of using the txt2tags markup instead of HTML are its simplicity and flexibility. txt2tags features an easy-to-learn syntax and allows you to do clever things such as creating a table of contents, specifying a filter, and generating an HTML file based on a custom CSS template (see this article for a more detailed description of txt2tags' features). txt2tags is released under GPL and is available at SourceForge.net.

Better yet, if you don't fancy learning txt2tags markup, I've created a simple extension for OpenOffice.org Writer that allows you to apply txt2tags formatting using menu commands. To install the extension, download it, choose Tools -> Extension Manager, select My Extensions, press the Add button, and point it to the downloaded .oxt file. If you are using OpenOffice.org 2.0 or lower, then you have to change the .oxt file extension to .zip before you can install the package. By default, the Save as t2t command is disabled, but you can enable it by uncommenting the following line in the macro:

Shell("txt2tags",1, "--target html -H --no-encoding " & Right(DocDir, Len(DocDir)-7) & "/" & FileName & ".t2t")

Once enabled, this command automatically saves the currently opened document as a plain text file with the .t2t extension and then converts it to HTML using txt2tags. You can, of course, set the command to save the file with an .html extension if you wish.
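
If the Basic one-liner looks cryptic, here is roughly what it assembles, written out in Python. Treat it as a sketch: the function name is hypothetical, and it assumes that DocDir holds a URL beginning with file://, which is why the macro drops the first seven characters. The txt2tags options are the ones that appear in the macro line.

```python
def txt2tags_command(doc_dir_url, file_name):
    """Build the txt2tags call that the macro's Shell() line performs."""
    # Assumption: doc_dir_url looks like "file:///home/user/docs"; dropping
    # the first 7 characters ("file://") leaves a plain directory path.
    doc_dir = doc_dir_url[7:]
    source = doc_dir + "/" + file_name + ".t2t"
    # Same options as in the macro, split into an argument list.
    return ["txt2tags", "--target", "html", "-H", "--no-encoding", source]
```

With txt2tags installed, the returned list can be handed to subprocess.run to perform the same conversion outside of Writer.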

These approaches are not as straightforward as clicking an Export button, but if you want to generate tidy HTML files out of your Writer documents or publish them on your blog or wiki, you should give these techniques a try.

Dmitri Popov is a freelance writer whose articles have appeared in Russian, British, US, German, and Danish computer magazines.


Comments

on Publishing Writer documents on the Web

Note: Comments are owned by the poster. We are not responsible for their content.

So Very Very Sad

Posted by: Anonymous Coward on March 10, 2007 03:05 AM
today, blogs and wikis rule the Web.

If this is correct, then it's time to declare the web "Dead". Rest In Peace World Wide Web. We shall miss you and your static pages with content that was of value.

#

Re:So Very Very Sad

Posted by: Administrator on March 10, 2007 11:19 PM
And why is that? Care to back up your claim? Or are you just trolling?

#

Re:Links to Markdown.py

Posted by: Anonymous Coward on March 10, 2007 04:17 AM
This URL works: http://sourceforge.net/projects/python-markdown/

#

txt2tag error

Posted by: Anonymous Coward on March 10, 2007 09:00 PM
In Sourceforge, there's an issue about an error in txt2tag script.

#

Re:txt2tag error

Posted by: Anonymous Coward on March 12, 2007 11:55 AM
I keep getting runtime errors when trying to use the extension.

#

Re:So Very Very Sad

Posted by: Anonymous Coward on March 11, 2007 07:07 AM
The writer was being sarcastic. If you do not learn to see sarcasm you will miss a lot of humor.


I agree with the point of the first poster of this thread. Static pages comprise most of the web and most of the value of the web is still in static pages. This is true without diminishing the contribution of blogs and wikis.

#

Re:Links to Markdown.py

Posted by: Anonymous Coward on March 11, 2007 11:19 AM
yeah
but i found the perl version of it..
http://daringfireball.net/projects/markdown/

-kevin lam

#

Smell the hypocrisy

Posted by: Anonymous Coward on March 12, 2007 06:49 PM
Instead, this feature turns even the simplest Writer documents into HTML gobbledygook

Huh. Sorta like MS Word did in its first HTML-saving incarnation? Where's all the righteous indignation and vitriol about the quality of the HTML dumped out by Writer? Oh, that's right, you don't bad-mouth open source crap, no matter how painful it is to work with, right? Just keep telling each other how wonderful it all is, and maybe someday the outside world will believe it, too.

#

Re:The difference is.....

Posted by: Anonymous Coward on March 12, 2007 09:11 PM
OpenOffice (including Writer) is under constant development to fix such issues -- and you don't have to wait 3 to 5 years and pay $350US for the updates.

There is no such thing as perfection, yet we strive for it anyway.

#

Re:The difference is.....

Posted by: Anonymous Coward on March 13, 2007 12:02 AM
Well, since that functionality was available in Word 10 years ago, I'd say you've got a long wait before you have to wait 3-5 years for OOO to support it properly. Personally, $350 is a small price to have perfection today. Of course, I'd rather just pay the $129 (http://www.amazon.com/Microsoft-Office-Home-Student-2007/dp/B000HCZ8EO/ref=pd_bbs_5/102-1773936-8896935?ie=UTF8&s=software&qid=1173718726&sr=8-5) like everyone else.

#

Re:The difference is.....

Posted by: Administrator on March 14, 2007 07:37 AM
Everyone, huh? I can get MS Office Home Student 2007 for $129? I'm a 44 year old retired guy who hasn't been a student in over 25 years.


As for $350 being "a small price to have perfection," Open Office might not be perfect but MS Office is just as far away from perfection as anything can be. OO does everything MS Office can do just as well as MS Office and I pay nothing. Let's see... On one hand we have $350. On the other we have $0.00. Hmmm.... Let's think about this.


My question is this: If you are not an OO user why are you here posting comments on an article specific to OO? Me thinks thou might be trolling, yes?

#

Re:The difference is.....

Posted by: Administrator on March 14, 2007 09:25 AM
I just checked Amazon. $128.99 is for the Student version which Microsoft's license specifically permits only for students and teachers. Why would you suggest that "everyone else" do something Microsoft considers illegal?



As of 13 March 2007, the prices for a new (non-upgrade) copy of Office 2007 on Amazon are (in USD) $351.99 for Standard, $393.99 for Business, $424.99 for Professional, $579 for Ultimate. OpenOffice is a bargain.



There are cheaper prices for "used" copies, but I wouldn't recommend anyone risk that. Knowing Microsoft's past EULAs, as soon as Microsoft recognized the copy of Office was registered elsewhere they would deny upgrades and patches, since from their point of view it would be a pirate copy.

#

Re:The difference is.....

Posted by: Anonymous Coward on April 06, 2007 04:31 AM
Not exactly true. MS Word's advanced search and replace and token searching are superior to Writer's. Even though Writer uses regular expressions, which should theoretically be stronger than Word's token searching, it can only use them in the search field, not the replace field (not to mention that regular expressions are a good deal more complicated than tokens).

There are several "small touch" things in Writer that drive me crazy occasionally.

Now back to the real issue. Writer's HTML code is not gobbledygook. It is quite readable and quite logical. However, it does use deprecated items such as font tags. So in that sense it is not "clean".

#

Links to Markdown.py

Posted by: Administrator on March 10, 2007 03:29 AM
The link http://www.freewisdom.org/projects/python-markdown/ is broken.

#

An alternative

Posted by: Administrator on March 14, 2007 07:20 PM
Thanks for the article - though I could only get the odt2txt part to work on Ubuntu.

There was a very long discussion about word-to-HTML conversion on Slashdot last year. One thing I got from the discussion was that KWord has a very decent HTML Export Filter:

File -> Export -> HTML document (Document Type HTML 4.01, Mode: Light: Convert to strict (X)HTML - you might want to try the other type/mode options)

This works from .doc and .odt. The output is not perfect (extra empty lines and so forth), but is very clean and the best I've come across.

#

Publishing Writer documents on the Web

Posted by: Anonymous [ip: 82.18.152.190] on February 04, 2008 02:31 AM
Someone always brings in the Word/Office Vs Writer/OpenOffice argument. Either way, good article, excellent... I've been wondering how to do this.

Personally I like Writer. No it's not Word. It doesn't do everything that Word does... but it does 'enough'. Enough is unfortunately subjective and for some people it doesn't meet their needs; so be it. Hopefully it will sooner or later.

#

This story has been archived. Comments can no longer be posted.



 