October 7, 2011

Weekend Project: Get Grammar Checking for Your Open Source Office Suite

Sepllchecking. I mean, spellchecking: it's such an integrated part of our word processing and email workflow these days that we feel ripped off when an application (or phone...) <em>doesn't</em> have it built in. Sadly, grammar checking lags a little behind. Checking words against a dictionary is trivial, but picking out parts of speech and sentence structure is trickier. Some proprietary office suites ship with grammar tools built in; the free software suites do not. But there are plug-ins available that bring grammar and stylistic help to the major open source word processors.
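To make the difference concrete, here is a minimal sketch in Python. It is a toy, not how any of the tools discussed here work: the word list, the function names, and the single article rule are all assumptions made up for illustration. The spelling check judges each word in isolation, while even the simplest grammar rule has to look at a word's neighbor.

```python
import re

# Toy word list (an assumption); a real spellchecker loads a full dictionary.
DICTIONARY = {"i", "made", "a", "an", "mistake", "error", "in", "the", "report"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def misspelled(text):
    """Spelling: each word is judged on its own, no context needed."""
    return [word for word in tokenize(text) if word not in DICTIONARY]


def article_errors(text):
    """One toy grammar rule: 'a' before a vowel letter, 'an' before a consonant.

    The rule needs the following word, and it is still wrong for cases such
    as 'an hour', because it looks at spelling rather than pronunciation.
    """
    words = tokenize(text)
    errors = []
    for current, following in zip(words, words[1:]):
        starts_with_vowel = following[0] in "aeiou"
        if current == "a" and starts_with_vowel:
            errors.append((current, following))
        elif current == "an" and not starts_with_vowel:
            errors.append((current, following))
    return errors
```

Every word in "I made an mistake" is in the dictionary, so misspelled() finds nothing; only article_errors(), which compares adjacent words, catches the problem. The exceptions that such a crude rule gets wrong hint at why real rule sets grow large.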

Naturally, grammar-checking is tightly bound to the language of the document. Thus each grammar tool must add explicit support for every language it handles. A few projects incorporate more than one language, but for non-English writing, it is advisable to look for a language-specific plug-in for your word processor. It is likely to be of higher quality and have a wider rule-set if it is maintained by native speakers, as opposed to a research project or one-size-fits-all generic grammar framework.

Another, somewhat related caveat with grammar-checking is that most of the time, grammatical rules cover only generic conversational language. If you are writing a technical document (particularly one about programming, with its host of reserved words), you are liable to get lots of "false positives," so to speak: points where the grammar checker thinks you have made a mistake, but in reality you were using a technical term that looks wrong if you don't know better.
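One common workaround is to keep a list of project-specific terms and discard any flag that matches it. The sketch below assumes a hypothetical checker whose output is a list of dictionaries with "text" and "message" keys; that shape, the function name, and the sample flags are invented for illustration and do not correspond to the output format of any particular tool.

```python
def drop_known_terms(flags, technical_terms):
    """Return only the flags whose flagged text is not a known technical term.

    `flags` is assumed to be a list of dicts with a "text" key (hypothetical
    format); matching is case-insensitive.
    """
    terms = {term.lower() for term in technical_terms}
    return [flag for flag in flags if flag["text"].lower() not in terms]


# Hypothetical checker output for a paragraph about programming.
sample_flags = [
    {"text": "elif", "message": "Possible spelling mistake"},
    {"text": "stdout", "message": "Possible spelling mistake"},
    {"text": "an mistake", "message": "Use 'a' before a consonant sound"},
]

remaining = drop_known_terms(sample_flags, ["elif", "stdout", "malloc"])
```

Here only the genuine article error is left in remaining. The trade-off is that the term list has to be maintained by hand, and a real typo that happens to match a listed term will slip through.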

A separate bit of fall-out from this language dependency is that I, as a moderately-fluent English speaker, do not find it easy to assess the quality of grammatical tools designed for other languages. I have found a few, but if you know of others or have a strong opinion, please feel free to share in the comments. Please also share any tools or scripts you find for Calligra, the KDE office suite: LibreOffice, OpenOffice, and AbiWord have plenty of options, but I did not have any luck locating one for Calligra.

Grammar Hammers

LibreOffice, OpenOffice, and AbiWord all have extension mechanisms, so even though none of them has a built-in grammar checker, you can find plug-ins that add the functionality. As of right now, the Big Two in this arena are LanguageTool and After The Deadline. They offer grammar engines suitable for use in the major office suites but also in other environments.

LanguageTool can run as a standalone Java application, or be installed as an extension for LibreOffice, OpenOffice.org, or (with a little sleight-of-hand) other applications. Each language pack is maintained separately as a collection of rules and exceptions. The current version has robust support for English, German, French, Polish, Dutch, Romanian, and Russian, and supplementary support for about 19 other languages. You can even browse the language rules data online; if you want to contribute to growing support for your language, that is a good place to start. There is also a new rule-conversion tool added as part of 2011's Summer Of Code to help grow the language support.

After The Deadline uses a client-server model. The makers of ATD run the primary server, to which the various extensions and plug-ins connect. But you can also download the source code and run your own server if you so desire. Extensions are provided for LibreOffice, OpenOffice.org, and various other applications (such as Firefox and Chrome). App developers can even incorporate it directly into other applications through the API. ATD only supports English at the moment, but there is work underway to extend it. It sports some features that LanguageTool does not, including differentiating between grammar and "style" problems, and an integrated spell-checker.

Between the two, LanguageTool offers broader language support, but ATD offers hooks into more applications (you can even integrate it into your WordPress blog). A long time ago, ATD started out as a fork of LanguageTool, and although they have different areas of emphasis now, there is no reason you cannot install both and adapt to whichever one gives you the best results.

Elsewhere in Morphology-ology

AbiWord has its own grammar engine project, although depending on your platform and distribution, it may not be installed by default. If not, you can download it — or grab recent updates — from the AbiWord site. It is based on open source work done at Carnegie Mellon University and at Open Cognition. The focus of the plug-in is on English, but there is support for German, French, and Lithuanian as well, plus special work to enable support for scientific and medical text.

Beyond these three, most of the extensions available are single-language tools. LibreOffice has separate extensions available for Russian, Portuguese (Brazilian), French, and Irish. OpenOffice users can find extensions for Portuguese, French, Esperanto, Russian, and Hungarian.

Grammar checking is a subject of ongoing academic research, and there is no one ideal method. So if you don't find the perfect grammar tool for your word processor of choice and language, it is a good idea to keep checking back periodically. You'll find other tools out there in various states of readiness; the LanguageTool site has a good list, as does the LinguComponent section on the OpenOffice site.

Who knows? You may even get motivated to reinvigorate an existing project, like Graviax, or to help port one of the older OpenOffice extensions over to LibreOffice and debug it. Or maybe you can just contribute back by strengthening the support in one of the existing projects for the languages you know.
