July 12, 2006

New tools for the toolbox

Author: Michael Stutz

Linux has the benefit of a steady barrage of new applications, utilities, software suites and tools -- as any casual perusal of freshmeat and SourceForge shows. One new bundle of software, the moreutils package by Debian developer Joey Hess, stands out from the rest. Moreutils is a collection of small, general-purpose tools that Hess says "nobody thought to write 30 years ago."

The tools philosophy

One of the endearing qualities of Unix is its "tools" philosophy. Like the hand tools and power tools in a workshop, each of these tools is designed to perform a specific task, and do it well. For example, cat concatenates its input and spills it to the standard output, tr is a filter that translates characters of its input, grep searches for a regular expression in its input and outputs the lines that match, and who gives a listing of all the users that are on the system.

Then came pipes, a way to pass the output of one tool to the input of another. By combining individual tools this way, you can build powerful "strings" of commands that do things none of the single tools was designed to do.

Every Linux user who tries out a famous pipeline for the first time gets a live demonstration of the power of this philosophy, as with the pipeline for showing the number of users logged in to the system:

who | wc -l

Stringing tools together yields powerful results fast. In this example, just a few tools take the lines of output from some command and produce a list sorted by frequency, with each line prefixed by its number of occurrences:

some-command-with-lots-of-output | sort | uniq -c | sort -n -r
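
As a quick illustration with made-up input, the sketch below feeds six fruit names through the same pipeline. The final awk step is an addition that only trims the padding uniq -c puts in front of each count:

```shell
# Made-up sample input: "pear" appears three times, "apple" twice, "fig" once.
freq=$(printf 'pear\napple\npear\nfig\npear\napple\n' |
    sort |                     # group identical lines together
    uniq -c |                  # collapse each group, prefixed by its count
    sort -n -r |               # highest count first
    awk '{ print $1, $2 }')    # trim the leading padding from uniq -c

printf '%s\n' "$freq"
```

The most frequent line comes out first, with its count in front of it.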

The new tools

All of the software assembled in moreutils is GPLed, and was written by Hess and others, including longtime Linux hacker Lars Wirzenius. There are nine tools in the package right now, and together they total about 1,329 lines of source code, or just under 150 lines per utility. Let's take a look at them.

Check for valid Unicode

The isutf8 tool checks whether its input is valid UTF-8, the most popular Unicode encoding on Unix systems:

isutf8 somefile

If the input contains invalid UTF-8, isutf8 says so. If the input is valid UTF-8 (which includes plain ASCII text), it outputs nothing.
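
On a machine without moreutils, a similar check can be roughed out with iconv. This is only a sketch: it assumes iconv is installed and exits with a non-zero status on invalid input, and the is_valid_utf8 function is a hypothetical helper, not part of isutf8:

```shell
# Hypothetical helper: succeeds only if iconv can read the file as UTF-8.
# Assumption: iconv is installed and fails on invalid byte sequences.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

workdir=$(mktemp -d)
printf 'caf\303\251\n' > "$workdir/good"   # "cafe" with an accented e, encoded as UTF-8
printf 'caf\351\n' > "$workdir/bad"        # the same word in Latin-1: not valid UTF-8

if is_valid_utf8 "$workdir/good"; then good=valid; else good=invalid; fi
if is_valid_utf8 "$workdir/bad"; then bad=valid; else bad=invalid; fi
printf 'good: %s\nbad: %s\n' "$good" "$bad"

rm -rf "$workdir"
```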

Pick up standard input with sponge

The sponge tool just takes its standard input and writes it to a given file, but before it writes anything, it "soaks up" all of the standard input first -- so you can write to the same file without clobbering it.

This comes in handy in those cases when you want to extract certain lines from a file and then write the lines back to the file itself. The command below is an example of what not to do. The shell sets up the redirection first, truncating myfile before grep ever reads it, so grep finds nothing and the file ends up empty:

grep foo myfile > myfile

To do what you intend -- take all the lines of myfile containing "foo" and write those lines back to myfile, so that the file only contains those lines -- use sponge in this way:

grep foo myfile | sponge myfile
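
To see why soaking up the input first matters, here is a minimal sketch that mimics the idea with a temporary file. The soak function is a hypothetical stand-in written for this illustration, not part of moreutils, and the real sponge may work differently inside:

```shell
# soak: hypothetical stand-in for sponge, for illustration only.
soak() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"        # read ALL of standard input before touching the target
    cat "$tmp" > "$1"   # only now is the target opened (and truncated)
    rm -f "$tmp"
}

workdir=$(mktemp -d)
printf 'foo one\nbar two\nfoo three\n' > "$workdir/myfile"

# grep has finished reading myfile by the time soak rewrites it.
grep foo "$workdir/myfile" | soak "$workdir/myfile"

result=$(cat "$workdir/myfile")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The file ends up holding only the two lines that contain "foo", rather than nothing at all.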

Add timestamps with ts

The ts filter is a Perl script that prefaces its standard input with a timestamp and a space character. This is good for logging program output:

some-program | ts > logfile

The default timestamp format is "%H:%M:%S"; you can, however, specify another format as an argument (see the man page for strftime to see what's possible), or use it to prefix lines with any arbitrary text:

some-program | ts `hostname` > logfile
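
To make the behavior concrete, here is a rough plain-shell approximation. The stamp function is a hypothetical stand-in, not the real ts, and it calls date once per line, which is slow for heavy output:

```shell
# stamp: hypothetical stand-in for ts, for illustration only.
# Prefixes every line of standard input with the current time and a space.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
    done
}

stamped=$(printf 'starting\ndone\n' | stamp)
printf '%s\n' "$stamped"
```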

Edit directories with vidir

The vidir tool lets you edit directory and file names in your favorite text editor. It uses whatever editor you have set in the EDITOR or VISUAL environment variables; otherwise, as you'd expect, it uses vi.

With no arguments, vidir opens a listing of the current directory in the editor: remove a line to delete that file, or change a name to rename it. As arguments you can give a directory name, a list of files, or a hyphen character, which makes vidir read the names of files from standard input.

Edit pipelines with vipe

The vipe tool lets you insert an interactive text editor into a pipeline. Which editor it uses is determined the same way as with vidir.

vipe can be helpful when you want to edit some command output, but also want to pass it on to some other commands -- just be sure to write your changes before exiting the editor, or the changes won't get passed on to the pipeline.

For example, if you want to mail the output of commands to your friend but want to preface the output with a comment, you can do it like so:

some-commands | vipe | mail pal@example.net

Merge files with combine

The combine tool is a Perl script that combines the lines from two files (or standard input), using Boolean operators, according to this table:

and	outputs lines contained in both files
or	outputs lines contained in either file
not	outputs lines contained in the first file but not the second
xor	outputs lines that are in either file but not in both

Give as arguments the first file, the operator, and the second file.

For example, here's how you'd use it to output all of the lines that are in /tmp/passwd but not in /etc/passwd:

combine /tmp/passwd not /etc/passwd
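
Where combine isn't installed, grep can approximate some of the operators. This is a sketch with made-up file contents, not a description of how combine works internally; grep's -F, -x, and -f options match whole lines literally against a list read from a file:

```shell
workdir=$(mktemp -d)
printf 'alice\nbob\ncarol\n' > "$workdir/first"    # made-up sample data
printf 'bob\ndave\n' > "$workdir/second"

# "and": lines of the first file that also appear in the second
both=$(grep -Fxf "$workdir/second" "$workdir/first")

# "not": lines of the first file that do not appear in the second
only_first=$(grep -Fxvf "$workdir/second" "$workdir/first")

# "xor": lines found in exactly one of the two files
only_second=$(grep -Fxvf "$workdir/first" "$workdir/second")
either_not_both=$(printf '%s\n' "$only_first" "$only_second")

printf 'and: %s\n' "$both"
printf 'not: %s\n' "$only_first"
printf 'xor: %s\n' "$either_not_both"
rm -rf "$workdir"
```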

Gather network interface information with ifdata

The ifdata tool gives parse-friendly output of all kinds of network interface data for a given interface, according to the following list of options:

  -e   Reports interface existence via return code
  -p   Print out the whole config of iface
 -pe   Print out yes or no according to existence
 -ph   Print out the hardware address
 -pa   Print out the address
 -pn   Print netmask
 -pN   Print network address
 -pb   Print broadcast
 -pm   Print mtu
 -pf   Print flags
 -si   Print all statistics on input
-sip   Print # of in packets
-sib   Print # of in bytes
-sie   Print # of in errors
-sid   Print # of in drops
-sif   Print # of in fifo overruns
-sic   Print # of in compress
-sim   Print # of in multicast
 -so   Print all statistics on output
-sop   Print # of out packets
-sob   Print # of out bytes
-soe   Print # of out errors
-sod   Print # of out drops
-sof   Print # of out fifo overruns
-sox   Print # of out collisions
-soc   Print # of out carrier loss
-som   Print # of out multicast

For instance, if you have a PPP connection, this command prints the network address for that interface:

ifdata -pN ppp0
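
The network address is, by the usual definition, the interface address ANDed bit by bit with the netmask. The sketch below works one example through in shell arithmetic; the network_address function and the sample numbers are made up for illustration, and it is an assumption that ifdata derives its -pN value the same way:

```shell
# network_address: hypothetical helper, for illustration only.
# $1 = dotted-quad address, $2 = dotted-quad netmask
network_address() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2        # split both arguments on dots into eight octets
    IFS=$old_ifs
    # AND each address octet ($1..$4) with the matching netmask octet ($5..$8)
    printf '%d.%d.%d.%d\n' \
        "$(( $1 & $5 ))" "$(( $2 & $6 ))" "$(( $3 & $7 ))" "$(( $4 & $8 ))"
}

net=$(network_address 10.20.30.77 255.255.255.192)
printf '%s\n' "$net"
```

Here 77 AND 192 leaves 64, so the sample address falls in the network 10.20.30.64.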

The -pf option prints the flags for an interface, showing whether each one is on or off, in a format that is much easier to parse than ifconfig's. For example, this displays the flags for the loopback interface:

ifdata -pf lo
On  Up
Off Broadcast
Off Debugging
On  Loopback
Off Ppp
Off No-trailers
On  Running
Off No-arp
Off Promiscuous
Off All-multicast
Off Load-master
Off Load-slave
Off Multicast
Off Port-select
Off Auto-detect
Off Dynaddr
Off Unknown-flags
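
Output in this shape is easy to pick apart with standard tools. As a small sketch, the awk one-liner below lists only the flags that are on; the sample text is copied from the first few lines of the output above so the snippet runs even where ifdata isn't installed:

```shell
# Sample copied from the start of the "ifdata -pf lo" output shown above.
sample='On  Up
Off Broadcast
Off Debugging
On  Loopback
Off Ppp
Off No-trailers
On  Running'

# Keep the flag name (second field) wherever the state (first field) is "On".
flags_on=$(printf '%s\n' "$sample" | awk '$1 == "On" { print $2 }')
printf '%s\n' "$flags_on"
```

On a live system, you would pipe the output of ifdata -pf lo into the same awk command instead of using the sample text.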

Tee to pipes with pee

The unfortunately named pee is a tee for pipes: where tee passes its standard input to both standard output and any given filenames, pee sends its standard input to any given commands:

who | pee "wc -l > lines" "wc -w > words" "wc -c > chars"

Unlike tee, pee doesn't copy its input to standard output itself, but you can get the same effect by giving cat as one of the commands.
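
The fan-out idea can be sketched in plain shell. The fanout function below is a hypothetical stand-in for illustration, not the real pee: it saves its input to a temporary file and runs each command in turn, whereas the real tool may well feed all the commands at once:

```shell
# fanout: hypothetical stand-in for pee, for illustration only.
# Gives every command its own full copy of standard input.
fanout() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                  # save standard input once
    for cmd in "$@"; do
        sh -c "$cmd" < "$tmp"     # replay it to each command in turn
    done
    rm -f "$tmp"
}

# Two lines and three words go in; tr strips the padding some wc versions add.
counts=$(printf 'one two\nthree\n' | fanout "wc -l" "wc -w" | tr -d ' \t')
printf '%s\n' "$counts"
```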

Run commands on compressed files with zrun

Finally, zrun takes a command line as an argument and uncompresses any compressed files (.gz or .bz2) named in it, which is useful when you want to run a command on a compressed file without uncompressing it yourself first. For example, to view a compressed image file with feh, you can try:

zrun feh image.bz2
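
Done by hand, the same job looks roughly like the sketch below, which assumes gzip is installed and uses wc in place of feh so that it runs anywhere. Whether zrun itself goes through a temporary file in this way is an assumption here:

```shell
workdir=$(mktemp -d)
printf 'hello\nworld\n' > "$workdir/notes.txt"   # made-up sample file
gzip "$workdir/notes.txt"                        # leaves only notes.txt.gz

# Manual equivalent: uncompress to a temporary copy, run the command, clean up.
gzip -dc "$workdir/notes.txt.gz" > "$workdir/notes.tmp"
lines=$(wc -l < "$workdir/notes.tmp" | tr -d ' \t')
rm -f "$workdir/notes.tmp"

printf '%s\n' "$lines"
rm -rf "$workdir"
```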

Future plans

There's work to be done: some of the tools in the moreutils package don't support the standard options, and the documentation is still a little spotty. But it's a start. Hess has pointed out that small tools tend to be forgotten, and that banding them together in a collection with a purpose is a good way to get them noticed and to help improve them.

He's also soliciting feedback on which tools might be added to the collection. Among those being considered is tmp, which would put standard input in a temporary file and pass that file to the given command -- which you'd use to send standard input to a tool that can only take a file as an argument.

Meanwhile, you might want to add these tools to your toolbox -- they don't take up much room, and you just never know when they might come in handy.
