May 10, 2007

Extending OpenOffice.org: Checking grammar with LanguageTool

Author: Dmitri Popov

One of the features that many users dearly miss in OpenOffice.org is a grammar checker. Fortunately, LanguageTool fills the void, adding grammar-checking capabilities to OpenOffice.org.

Although LanguageTool is probably not on par with the grammar checker offered by Microsoft Office or other commercial closed-source office suites, it has one important advantage: you can easily define new grammar rules. The current version of LanguageTool supports several languages besides English, including German, Polish, French, and Dutch. The degree of support varies from language to language; right now the best-supported language is Polish (579 grammar rules), followed by English (221 rules) and Dutch (198 rules). You can see a full list of the supported languages on LanguageTool's Web site.

You can install LanguageTool like any other OpenOffice.org extension. In OpenOffice.org, choose Tools -> Extension Manager, select the My Extensions section, press the Add button, and select the LanguageTool.zip file. Restart OpenOffice.org, and you should see the LanguageTool menu in the Main toolbar.

To check grammar in the currently opened document, choose LanguageTool -> Check Text. The Configuration command allows you to set your native language as well as enable and disable specific grammar rules.

You can use LanguageTool not only as an OpenOffice.org extension, but also as a standalone GUI tool, a command-line tool, and even a service. To launch LanguageTool as a standalone GUI tool, unpack the LanguageTool-x.x.x.zip package and then the standalone-libs.zip archive inside it. Start LanguageTool by launching the LanguageToolGUI.jar file:

java -jar LanguageToolGUI.jar

LanguageTool's real forte, though, lies in its extensibility. It stores all grammar rules in the rules/xx/grammar.xml file (where xx is the language code, e.g. en, de, fr), and you can add your own rules to it. For example, let's say you have trouble with the phrase "monkey jacket": for some inexplicable reason, you tend to write "wonky jacket" instead. A spell checker can't catch this kind of mistake (there is nothing wrong with the words "wonky" and "jacket" on their own), but LanguageTool can. To make it flag this mistake, create a simple grammar rule:

  <rule id="WONKY_JACKET" name="Possible typo 'wonky jacket' (monkey jacket)">
    <pattern>
      <token>wonky</token>
      <token>jacket</token>
    </pattern>
    <message>Did you mean <suggestion>monkey jacket</suggestion>?</message>
    <example type="correct">The officer was wearing a monkey jacket.</example>
    <example type="incorrect">The officer was wearing a <marker>wonky jacket</marker>.</example>
  </rule>

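The <pattern> element specifies a sequence of words, each marked with a <token> tag. In this case, the pattern is the word "wonky" followed by the word "jacket." The <message> tag contains the error message, with the proposed replacement wrapped in a <suggestion> tag, while the <example> tags provide a correct and an incorrect sample sentence; in the incorrect one, the <marker> tag highlights the offending words.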
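To see how the pattern and the example sentences relate, here is a toy Python sketch that reads a copy of the WONKY_JACKET rule (with singular example sentences), turns its tokens into a regular expression, and runs it against the rule's own examples. It is not LanguageTool's engine, which is written in Java and has its own tokenizers; the helper names `compile_pattern`, `check_examples`, and `correct` are hypothetical, and treating each token as a case-insensitive whole word is a simplifying assumption of this sketch.

```python
import re
import xml.etree.ElementTree as ET

# A copy of the rule; the example sentences use the singular "jacket"
# so that they line up with the two-token pattern "wonky" + "jacket".
RULE_XML = """
<rule id="WONKY_JACKET" name="Possible typo 'wonky jacket' (monkey jacket)">
  <pattern>
    <token>wonky</token>
    <token>jacket</token>
  </pattern>
  <message>Did you mean <suggestion>monkey jacket</suggestion>?</message>
  <example type="correct">The officer was wearing a monkey jacket.</example>
  <example type="incorrect">The officer was wearing a <marker>wonky jacket</marker>.</example>
</rule>
"""


def compile_pattern(rule):
    """Hypothetical helper: turn the <token> sequence into a regex of whole
    words separated by whitespace, matched case-insensitively."""
    words = [t.text.strip() for t in rule.find("pattern").findall("token")]
    body = r"\s+".join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def check_examples(xml_text):
    """Return (example type, matched?) for each <example> in the rule."""
    rule = ET.fromstring(xml_text.strip())
    regex = compile_pattern(rule)
    results = []
    for ex in rule.findall("example"):
        # itertext() drops the <marker> tags but keeps the words inside them
        sentence = "".join(ex.itertext())
        results.append((ex.get("type"), regex.search(sentence) is not None))
    return results


def correct(xml_text, sentence):
    """Replace every match of the pattern with the rule's <suggestion> text."""
    rule = ET.fromstring(xml_text.strip())
    suggestion = rule.find("message").find("suggestion").text
    return compile_pattern(rule).sub(lambda m: suggestion, sentence)


if __name__ == "__main__":
    print(check_examples(RULE_XML))
    print(correct(RULE_XML, "He bought a wonky jacket yesterday."))
```

Running the script prints `[('correct', False), ('incorrect', True)]` followed by `He bought a monkey jacket yesterday.` Because each plain token stands for exactly one whole word in this sketch, a plural such as "wonky jackets" is left untouched; catching it would take a second rule or a more flexible token definition.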

You can find more information on defining grammar rules on LanguageTool's Web site, and you can learn a few tricks by taking a closer look at the predefined rules in the grammar.xml file.
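Before reading grammar.xml in full, it can help to get an overview of which rules a language already has. The short Python script below lists the id and name attributes of every <rule> element in the file, however deeply it is nested. The function name `list_rules` and the script name in the usage comment are hypothetical, and the script assumes that grammar.xml parses as ordinary XML with rules stored as <rule> elements, as in the example above; a rule without its own id is reported with "-".

```python
import sys
import xml.etree.ElementTree as ET


def list_rules(root):
    """Collect (id, name) for every <rule> element under root, at any depth.

    Attributes that are missing come back as None.
    """
    return [(r.get("id"), r.get("name")) for r in root.iter("rule")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python list_rules.py rules/en/grammar.xml
    tree = ET.parse(sys.argv[1])
    for rule_id, name in list_rules(tree.getroot()):
        print(f"{rule_id or '-'}\t{name or ''}")
```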

Dmitri Popov is a freelance writer whose articles have appeared in Russian, British, US, German, and Danish computer magazines.
