Author: Michael Stutz
The diction and style tools put a GNU face on an old Unix feature. These tools read text input, either from a file or the standard input. diction checks the input at the sentence level and marks wordy and trite phrases, cliches, and the like, while style works on the overall document, giving a summary of the writing style with a number of readability tests.
Years ago these tools came with AT&T Unix, packaged in a utility set that included similar tools and was called the Writers’ Workbench (WWB). They fell by the wayside and were generally forgotten, but in recent years the tools were rewritten for Linux by Michael Haardt, and eventually became part of the GNU Project.
The GNU versions of these tools are not clones of the old AT&T originals, but they are very similar, and they keep getting better. The GNU versions work with English and German text, and new features in the 1.10 series include support for British English and recognition of nested sentences inside quotations.
Checking your choice of words
The diction tool analyzes its input and displays any doubled words, cliches, and potentially incorrect wording or phrases, enclosed in brackets like [this]. Not everything that diction marks is actually wrong. For example, since writers sometimes confuse the words “desert” and “dessert,” these words are always marked. Even so, you can use its output as a guide to double-check your writing for common errors.
diction expects its input to be in American English; to specify a different language, give its code as the argument to the -L option, according to this table:
en	American English
en_GB	British English
de	German
Among the tool’s newer features is the -b (or --beginner) option, which looks for errors commonly made by inexperienced writers, such as confusing “will” and “shall”:
$ echo "How will the company best utilize it's resources?" | ./diction -b
(stdin):1: How [will] the company best [utilize] [it's] resources?
3 phrases in 1 sentence found.
$
diction only encloses suspect words and phrases in brackets, and it’s up to you to figure out what might (or might not) be wrong with the marked text. To output the marked text with diction’s suggestions for improvement, use the -s option:
$ echo "How will the company best utilize it's resources?" | ./diction -s
(stdin):1: How will the company best [utilize -> use] [it's -> = "it is" or "its"?] resources?
2 phrases in 1 sentence found.
$
To search for only a particular sort of error, use the -s option and then use grep to keep just the lines that match the suggestion text you’re looking for. To get a list of sentences containing doubled words, for instance, use the -s option and select the lines containing “Doubled word”:
$ diction -s termpaper.txt | grep "Doubled word"
The doubled word search is better than a plain grep solution because diction works on sentences, not lines: it catches doubled words even if there’s a newline character between them. What it won’t match are doubles whose case is different, or a double where the first word is the end of one sentence and the next word is the beginning of the following sentence.
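To see why looking past line breaks matters, here is a minimal awk sketch of the same idea. It is not how diction is implemented; find_doubles is a hypothetical helper that remembers the previous word across lines, compares case-sensitively, and resets at sentence-ending punctuation, which mirrors the limits just described.

```shell
# Hypothetical helper (not part of diction): report doubled words,
# including pairs that are split across a line break.
find_doubles() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        word = $i
        bare = word
        # Strip leading and trailing punctuation before comparing.
        gsub(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/, "", bare)
        # Case-sensitive comparison against the previous word.
        if (bare != "" && bare == prev)
          printf "%d: %s %s\n", NR, prev, bare
        # A word ending in . ? or ! closes the sentence, so forget it.
        prev = (word ~ /[.?!]$/) ? "" : bare
      }
    }
  ' "$@"
}
```

Called as find_doubles termpaper.txt, or with text piped in, it prints the line number where each repeated pair was completed, followed by the pair itself.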
If you give the -d option, diction will do every check except for the doubled-word check.
With a little customization, diction is an excellent tool for checking documents against local style guides. You can create your own style guide by using the default phrase database file as a model; it is normally stored in either the /usr/share/diction/ or the /usr/local/share/diction/ directory. The file is simply a table where each line contains a target word or phrase, followed by a tab character and the suggestion, warning, comment, or replacement text to display for that target. To reuse the suggestion already given for another entry, begin the suggestion with an equals sign followed by that entry’s word or phrase. Here are a few example lines from the American English database:
a majority of	most
accomplished	did
desert	"Desert" and "dessert" are sometimes confused, to the delight of the masses.
dessert	= desert
easier said than done	(cliche, avoid)
it is apparent that	apparently
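To make the format concrete, the following sketch writes a small house-style file and checks that every line is a phrase, a tab, and a suggestion. The file location and the three entries are hypothetical examples, not part of the stock database.

```shell
# Hypothetical house-style entries: phrase, a tab, then the comment
# or replacement text. The path is an assumption for this example.
style_file="${TMPDIR:-/tmp}/house.style"
printf '%s\t%s\n' \
  'utilize' 'use' \
  'in order to' 'to' \
  'going forward' '(jargon, avoid)' > "$style_file"

# Sanity check: count lines that do not have exactly two
# tab-separated fields.
bad_lines=$(awk -F '\t' 'NF != 2 { n++ } END { print n + 0 }' "$style_file")
echo "malformed lines: $bad_lines"
```

A count of zero means the file at least has the right shape to hand to diction.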
Use your custom style file by giving its name with the -f option. diction will use your file in conjunction with the default file, unless you turn the latter off with the -n option:
$ diction -n -f /usr/local/share/diction/house.style submission.txt
The Kincaid Formula is particularly good for technical writing — it was originally developed for use on Navy training manuals, and like many readability indicators it outputs a US grade level.
The Automated Readability Index (ARI) uses character and sentence counts to determine an estimated grade level; it was developed by the US Air Force.
The Coleman-Liau Formula also outputs a US grade level, and bases its estimate on a character count.
The Flesch Reading Ease score is a readability index, used by the US government, where lower scores indicate higher difficulty.
Robert Gunning’s Fog Index is roughly based on sentence length and the number of syllables per word, and its output is the approximate US grade level required to immediately comprehend the text.
The Lix formula tests for long words (with more than 7 characters) and outputs a number from very easy (0-24) to difficult (over 54).
Another grade-output test is the easy-to-compute SMOG-Grading, which is a test based on word “complexity.”
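As a rough illustration of how one of these tests works, the sketch below estimates the ARI from character, word, and sentence counts using the commonly published formula. The ari function name and its naive counting (whitespace-separated words, sentences ending in a period, question mark, or exclamation point) are assumptions for this example; style analyzes text more carefully, so its figures will differ.

```shell
# Rough ARI estimate (a sketch, not the algorithm style uses):
#   4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
ari() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        w = $i
        # Naive sentence detection: a word ending in . ? or !
        if (w ~ /[.?!]$/) sentences++
        # Count only letters and digits as characters.
        gsub(/[^A-Za-z0-9]/, "", w)
        if (w != "") { words++; chars += length(w) }
      }
    }
    END {
      if (words == 0 || sentences == 0) { print "n/a"; exit }
      printf "%.1f\n", 4.71 * chars / words + 0.5 * words / sentences - 21.43
    }
  ' "$@"
}
```

Run it as ari london.report or pipe text into it. Very short, simple sentences can produce a negative number, which just means the text sits below the first grade level on this scale.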
At this time, diction comes only with stock phrase files for its three supported languages, but it would be an interesting free software project to build up style files for checking text against the most popular and respected style guides. The Chicago and AP style manuals and Fowler’s Modern English Usage would be great places to start.
Checking your overall document style
The style tool analyzes all of the sentences in a given document and outputs some facts about its overall readability: the document’s score for a number of readability tests (many developed by the US military), plus sentence counts and word usage information.
The sentence count is like a super wc: it shows the number of characters, sentences, and paragraphs; the average word and sentence lengths; the number of short sentences (9 words or fewer), long sentences (24 words or more), questions, and passive sentences; and which two sentences were the longest and shortest.
The word usage summary reports the number of verbs, with a breakdown by type, and a breakdown of the types of sentence beginnings: pronoun, interrogative pronoun, article, subordinating conjunction, conjunction, and preposition.
Regular usage is straightforward: pipe some text to style or give a file name as an argument. As with diction, you can change the language with the -L option (it currently supports the same three languages), and there are a few other options you can use to get extra output that will display before the report summary (see sidebar).
For example, here’s the command to output sentences with an ARI of 25 or higher and get a style summary on a document written in British English:
$ style -r 25 -L en_GB london.report
Check more than just text
diction and style won’t work on rough notes or other unpolished material that isn’t properly capitalized. But while they only take plain text input, they come in handy for other kinds of documents, too: just convert the document in question to text, and send the output over to the tool. For instance, you can check the style of a Web page by dumping its text with lynx and piping the output to style:
$ lynx -dump -nolist http://localhost/mypage.html | style