June 29, 2006

King of the Linux reference desk

Author: Michael Stutz

There are plenty of reference applications available for Linux, but the ultimate Linux lexicon has to be WordNet, a powerful desktop dictionary.

WordNet is literally million-dollar software -- over the 20 years it was under development at Princeton University, the project received more than $3 million in grants. Even so, WordNet lacks a few essential features that every good print dictionary has. For instance, there are no etymologic or orthographic aids, and prepositions, pronouns and conjunctions are omitted entirely. But in other ways WordNet is more powerful than any printed counterpart. Words are ordered hyponymically; that is, they're grouped and sorted in a hierarchy based on their meanings. The WordNet lexical database contains about 150,000 words and their semantic relations, which gives it unique advantages.

WordNet tools

You can try WordNet out on the Web, but you'll probably want to install it so that you have it right on your system.

The standard WordNet distribution comes with two tools for interfacing with the WordNet database. First is a graphical word browser, wnb, which you run in X. It's nice, but most people don't need to click through the dictionary all day, and it's hardly worth the bother to start it up when you just want to look up a single definition. That's where the command-line tool, wn, comes in.

To get an overview of what information is in the WordNet database for a given word, run wn with the word as an argument. As output you'll get a list, separated by part of speech, containing everything available for that word and the options to use with the command to get that information. Let's look at an example:

$ wn home

Information available for noun home
        -hypen          Hypernyms
        -hypon, -treen  Hyponyms & Hyponym Tree
        -synsn          Synonyms (ordered by frequency)
        -partn          Has Part Meronyms
        -meron          All Meronyms
        -famln          Familiarity & Polysemy Count
        -coorn          Coordinate Terms (sisters)
        -hholn          Hierarchical Holonyms
        -grepn          List of Compound Words
        -over           Overview of Senses

Information available for verb home
        -hypev          Hypernyms
        -synsv          Synonyms (ordered by frequency)
        -famlv          Familiarity & Polysemy Count
        -framv          Verb Frames
        -coorv          Coordinate Terms (sisters)
        -simsv          Synonyms (grouped by similarity of meaning)
        -grepv          List of Compound Words
        -over           Overview of Senses

Information available for adj home
        -antsa          Antonyms
        -synsa          Synonyms (ordered by frequency)
        -perta          Pertainyms
        -famla          Familiarity & Polysemy Count
        -grepa          List of Compound Words
        -over           Overview of Senses

Information available for adv home
        -synsr          Synonyms (ordered by frequency)
        -famlr          Familiarity & Polysemy Count
        -grepr          List of Compound Words
        -over           Overview of Senses

That's a lot of information! We'll go over the major options below.
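If you just want a quick list of the options that apply to a word, you can pull them out of that overview with awk. This is a minimal sketch: wnopts is a name made up for this example, and it assumes the layout shown above, where each option line is indented and the option names are the only fields that start with a hyphen.

```shell
# wnopts: list each search option wn reports for a word, once, in sorted order.
# "wnopts" is a hypothetical helper name; it assumes the indented overview
# layout printed by "wn WORD", as in the example above.
wnopts () {
    wn "$1" | awk '
        /^[ \t]+-/ {                          # indented option lines only
            for (i = 1; i <= NF; i++)
                if ($i ~ /^-[a-z]+,?$/) {     # an option, maybe with a trailing comma
                    sub(/,$/, "", $i)         # drop the comma from "-hypon,"
                    print $i
                }
        }' | sort -u
}
```

Running wnopts home should print each option a single time, so -over shows up once even though wn lists it under every part of speech.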

Definitions and thesaurus

To output an overview of the definitions for a word, give the word as an argument to wn and follow the argument with the -over option. Yes, wn is one of those odd tools whose options must come after the argument.

To output all of the definitions of the word "home," type:

wn home -over
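Since wn looks up one word per command, a short Bash loop helps when you want definitions for several words at once. Here's a sketch; defs is a hypothetical function name, not part of WordNet.

```shell
# defs: print the overview of senses (-over) for every word given.
# "defs" is a made-up name for this example.
defs () {
    local word
    for word in "$@"; do
        wn "$word" -over
    done
}
```

With that defined, defs home house dwelling prints the overview for each word in turn.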

You can also use WordNet as a thesaurus. Do that by giving the word as an argument, followed by one of the following options for outputting synonyms, depending on the part of speech:

-synsn    nouns
-synsv    verbs
-synsa    adjectives
-synsr    adverbs

For example, to get synonyms for the word "home" when it's used as an adverb, type wn home -synsr, but to get the synonyms for "home" as a noun, type wn home -synsn.

A good thesaurus will also provide you with antonyms, which are words that have the opposite meaning of a given word. To output antonyms of a word, give the word as an argument followed by one of the following options:

-antsn    nouns
-antsv    verbs
-antsa    adjectives
-antsr    adverbs

So for instance the command wn home -antsa outputs antonyms for the adjective "home."
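Because wn accepts more than one search option per command (as with the -grep options covered later), you can get synonyms and antonyms for a part of speech in a single call. The thes function below is a hypothetical helper, not part of the WordNet distribution; it takes the part-of-speech letter first and builds the option names from it.

```shell
# thes: synonyms and antonyms for WORD in one wn call.
# Usage: thes POS WORD, where POS is n, v, a, or r.
# "thes" is a hypothetical name; it relies on wn taking several options at once.
thes () {
    local pos="$1" word="$2"
    case "$pos" in
        n|v|a|r) ;;  # valid part-of-speech letter
        *) echo "usage: thes n|v|a|r WORD" >&2; return 1 ;;
    esac
    [ -n "$word" ] || { echo "usage: thes n|v|a|r WORD" >&2; return 1; }
    # Builds, for example, -synsa and -antsa for adjectives.
    wn "$word" "-syns$pos" "-ants$pos"
}
```

For example, thes a home runs wn home -synsa -antsa.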

Word hierarchies

WordNet's database arrangement really shows its power when you want to explore word hierarchies.

For a given word, WordNet can tell you more than just the synonym and antonym relationships that any thesaurus can -- it can give hypernyms, hyponyms, holonyms, and meronyms, as described below.

A hypernym of a word is a related term whose meaning is more general than the given word. For example, the words "address" and "location" are hypernyms of the word "home."

Conversely, a hyponym of a word is a term whose meaning is more specific than the given word. Words such as "condominium" and "lodge" are hyponyms of the word "home."

A holonym of a word is a term for the whole of which the given word names a part. For example, "home" is a holonym for "parlor."

Conversely, a meronym of a word is a related term whose meaning makes up a part of what is described by the word -- so words such as "bath," "kitchen," and "parlor" are all meronyms for "home."

Here are the valid options and their meanings:

-hypen    noun hypernyms
-hypev    verb hypernyms
-hypon    noun hyponyms
-hypov    verb hyponyms
-holon    all noun holonyms
-meron    all noun meronyms
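The hypernym and hyponym options follow the same naming pattern, so one function can walk a word's hierarchy up and down at once. Again, hier is a made-up name, and this sketch handles only nouns and verbs, the two parts of speech that have -hype and -hypo options in the list above.

```shell
# hier: hypernyms (more general terms) and hyponyms (more specific terms)
# for a noun (n) or verb (v). "hier" is a hypothetical helper name.
hier () {
    local pos="$1" word="$2"
    case "$pos" in
        n|v) ;;  # only nouns and verbs have -hype/-hypo options
        *) echo "usage: hier n|v WORD" >&2; return 1 ;;
    esac
    [ -n "$word" ] || { echo "usage: hier n|v WORD" >&2; return 1; }
    # Builds -hypen/-hypon for nouns or -hypev/-hypov for verbs.
    wn "$word" "-hype$pos" "-hypo$pos"
}
```

So hier n home asks wn for both the more general and the more specific terms for the noun "home."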

Searching words

WordNet's "-grep" options let you search through the words in the database based on their parts of speech. It's not really a full "grep," because you can search only for strings and not the full range of regular expressions, but it's still a handy feature.

These options are built just like the others:

-grepn    search nouns
-grepv    search verbs
-grepa    search adjectives
-grepr    search adverbs

To list all adverbs containing the string "home," type:

wn home -grepr

You can combine options to search several parts of speech at once; the results for each are listed separately. To list all verbs and adverbs containing "ing," type:

wn ing -grepv -grepr
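Since the -grep options match only plain strings, one way to get regular expressions back is to let wn do the substring search and then filter its output with grep -E. This is a sketch under assumptions: wngrep is a hypothetical name, and it assumes wn prints one compound word per line, which may differ between WordNet versions.

```shell
# wngrep: search all four parts of speech for words containing STRING,
# optionally keeping only lines that match the extended regex REGEX.
# Usage: wngrep STRING [REGEX]   ("wngrep" is a made-up helper name)
wngrep () {
    local str="$1" re="${2-}"
    if [ -z "$re" ]; then
        wn "$str" -grepn -grepv -grepa -grepr
    else
        # -e lets the pattern start with a hyphen without being read as an option.
        wn "$str" -grepn -grepv -grepa -grepr | grep -E -e "$re"
    fi
}
```

For instance, wngrep home '^home' would keep only the matches that begin with "home"; any section headings wn prints drop out too unless they happen to match the pattern.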

Command-line speed

Instead of typing the awkward trailing options every time you use the wn tool, you can increase your productivity by building a new front end for the options you use most often. You could write a fancy script to do it, but something this simple is better served by shell functions.

If you use the Bash shell, try entering the following functions at the shell prompt:

def () { wn "$1" -over; }
synn () { wn "$1" -synsn; }
synv () { wn "$1" -synsv; }
syna () { wn "$1" -synsa; }
synr () { wn "$1" -synsr; }

These are functions to run wn to obtain word definitions as well as synonyms for any part of speech. For example, to get a definition for the word "home," type:

def home

And to get a list of synonyms for the verb "home":

synv home

If you put these functions in your .bashrc file, they'll be available in every new interactive Bash shell you start.
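If you share one .bashrc across several machines, some of which may not have WordNet installed, you can wrap the definitions in a check so they're created only when wn is on your PATH. This is a sketch of one way to do it; it also quotes "$1" so that a phrase passed as a single quoted argument reaches wn intact instead of being split into separate words.

```shell
# ~/.bashrc fragment: define the WordNet shortcuts only if wn is installed,
# so shells on machines without WordNet don't get broken commands.
if command -v wn >/dev/null 2>&1; then
    def ()  { wn "$1" -over; }    # overview of senses
    synn () { wn "$1" -synsn; }   # noun synonyms
    synv () { wn "$1" -synsv; }   # verb synonyms
    syna () { wn "$1" -synsa; }   # adjective synonyms
    synr () { wn "$1" -synsr; }   # adverb synonyms
fi
```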

Learning more

There's a lot more to WordNet. To learn more, the WordNet Reference Manual is probably the best place to start. It's broken up into many man pages; to get a list of them, first consult the man page for wnintro as it appears in sections 1, 5, and 7 of the manual pages:

man 1 wnintro
man 5 wnintro
man 7 wnintro

These pages list the names of other WordNet man pages which you can read, from a glossary of WordNet terms (man wngloss) to the format of the WordNet database files (man wndb).
