Author: Koen Vervloesem
The OpenTaal project (Dutch for “OpenLanguage”) has published the first open source word list to be certified by the Dutch Language Union as conforming to the official spelling. Simon Brouwer, project leader of OpenTaal, says, “This is a milestone. Users of open source software can now trust their Dutch spell checker. They have the guarantee that their word list is consistent with the official spelling.”
Other products that have received certification include Microsoft’s spell checker and the dictionaries of publishers Sdu, Spectrum, Van Dale, and Wolters Plantyn.
The first open Dutch word list was created in 1996 by a workgroup of the Dutch TeX user group NTG. In 2001, Brouwer discovered that he could easily adapt the word list for use in OpenOffice.org. “Indirectly, this was the motivation to start with nl.openoffice.org,” Brouwer says.
A new spelling, a new word list
In 2005, the Dutch language area got a new spelling, consisting mainly of corrections to the 1995 spelling. Starting in August 2006, the new spelling was to become mandatory for the government and schools. This revived the project of creating an open source word list. At the end of 2005, the Dutch government program Open Standards and Open Source Software (OSOSS) initiated the OpenTaal project to coordinate the various Dutch open source projects that had an interest in the new spelling, with the aim of developing a Dutch word list conforming to it. This would give users of open source software such as OpenOffice.org, TeX, Thunderbird, and Firefox an up-to-date spell checker. OSOSS contacted the Dutch Language Union, which agreed to assist the project.
Brouwer says, “The originators come from OSOSS, the Dutch TeX user group NTG, and OpenOffice.org. Our Web site is running on an NTG server and we are actively using the collaborative software development platform Uitwisselplatform, which is based on the open source project GForge. We also get support from the foundation NLNet, which stimulates network research and development in the domain of Internet technology.”
According to Brouwer, more than 10 Dutch and Belgian volunteers, including linguists, harvested words from various sources to create the list. “These are people who like to be engaged in language and are supporters of the open source software model.” In January, the project published a beta version of its language files, which got a lot of attention in the Dutch language area. The current word list includes more than 140,000 entries.
A spell checker everywhere
The OpenTaal Web site offers language files for OpenOffice.org, Firefox, and Thunderbird. Users can also download the source files: the plain word lists, which can be converted to other file formats for integration with tools such as Aspell and Ispell. “This is not a difficult task for someone who knows the relevant software,” Brouwer says. The OpenTaal project uses the LGPL license for its language files, which allows the word list to be used with both open and closed source software.
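As a rough illustration of such a conversion, the Python sketch below cleans up a plain word list so it can be fed to Aspell. It assumes, without confirmation from OpenTaal, that the source file is UTF-8 text with one word per line; the function name `normalize_word_list` and the file names are hypothetical. Aspell's `create master` command reads one word per line on standard input, and it needs Aspell's Dutch language data to be installed.

```python
"""Hedged sketch: prepare a plain word list for Aspell.

Assumption (not from OpenTaal): the input is UTF-8 text, one word per line.
The cleaned output could then be compiled with something like:
    aspell --lang=nl create master ./nl-custom.rws < words-clean.txt
which requires Aspell's Dutch language data to be installed.
"""
import sys


def normalize_word_list(lines):
    """Strip surrounding whitespace, drop blank lines, dedupe, and sort."""
    words = set()
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip empty lines
        words.add(word)
    return sorted(words)


def main(src_path, dst_path):
    # Read the raw list and write one unique word per line.
    with open(src_path, encoding="utf-8") as src:
        words = normalize_word_list(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for word in words:
            dst.write(word + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python clean.py words.txt words-clean.txt
    main(sys.argv[1], sys.argv[2])
```

Deduplicating and sorting keeps the input predictable. Converting to Ispell or to other formats would need a different output step, which this sketch does not attempt.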
The OpenTaal site lets visitors help with the development of the language files. For example, visitors can help check the validity of a harvested word, decide on the word type, or suggest synonyms.
In addition to extending and improving the word list, the OpenTaal project is now working on a grammar checker for Dutch, a synonym list, and hyphenation patterns. The project is actively seeking volunteers who share its aims, because these spin-off projects are still in their infancy.