October 9, 2006

CLI Magic: Running multiple jobs with xjobs

Author: Joe 'Zonker' Brockmeier

Ever feel like you're not getting the most out of your multiprocessor machine? The xjobs utility allows you to schedule several processes to run simultaneously to make the most of your system's resources.

Xjobs either reads a list of arguments from standard input and passes them to a utility, or reads a list of commands from a script, and then runs the jobs in parallel. On a multiprocessor machine, xjobs runs one job per processor by default. For instance, on a dual-CPU machine, if you run ls -1 *gz | xjobs gunzip, xjobs will gunzip two files at a time. On a quad-CPU machine, the same command gunzips four files at a time until it runs out of files to process.
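To see how many processors your system reports, and so roughly how many jobs xjobs would start at once by default, you can ask getconf. This is a sketch that assumes your getconf supports _NPROCESSORS_ONLN (Linux with glibc and the BSDs do, but POSIX does not require it); the exact method xjobs uses to count processors may differ.

```shell
# Ask the system how many processors are currently online.
# _NPROCESSORS_ONLN is a widely supported extension, not a POSIX requirement.
ncpu=$(getconf _NPROCESSORS_ONLN)
echo "xjobs would likely default to $ncpu parallel job(s) here"
```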

Getting xjobs

Xjobs isn't in any of the major distros, so you'll need to compile it from source. This shouldn't be difficult; just make sure that you have Flex installed, which should be available in any major Linux distro.

Grab the most recent release of xjobs from the utility's homepage. Uncompress the tarball using tar -zxvf xjobs-xxxxxxxx.tgz, replacing xxxxxxxx with the current version of xjobs, and then use cd to move to the new xjobs-xxxxxxxx directory.

Next, run make install to compile and install xjobs (you may need to do this as root if it installs into a system directory). As long as you have GNU Make, Flex, and a C compiler installed, everything should build fine. So far, I've compiled xjobs on CentOS 4.3, Ubuntu Dapper on AMD64, and Nexenta with no problems.
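Before building, a quick way to confirm the prerequisites is to check that each tool is on your PATH. missing_tools is a hypothetical helper written for this sketch, and it assumes your C compiler is reachable as cc; substitute gcc or another name if yours differs.

```shell
# missing_tools: hypothetical helper that prints each named command
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check for the build prerequisites: make, flex, and a C compiler (assumed: cc).
missing=$(missing_tools make flex cc)
if [ -n "$missing" ]; then
    echo "Install these before building xjobs:" $missing
else
    echo "make, flex, and cc all found"
fi
```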

Using xjobs

The syntax for xjobs is simple. Let's say you have a directory with a bunch of WAV files that you'd like to encode as Ogg Vorbis files. You'd use the following:

find -name '*wav' | xjobs oggenc

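This finds every file whose name ends in wav in the current directory (find also descends into any subdirectories) and sends the names to xjobs, which then runs oggenc filename.wav for each one. GNU find assumes the current directory when no path is given; other versions of find may need an explicit path, as in find . -name '*wav'.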

Without any options, xjobs runs one job per processor on the system. You might be wondering what difference it makes whether you run one job at a time, two at a time, or more. In practice, it can make a difference in processing time, sometimes a significant one.
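To make the idea of "N jobs at a time" concrete, here is a rough approximation in plain POSIX sh. run_batched is a hypothetical helper, not part of xjobs, and it is deliberately simpler: it starts a batch of up to N jobs and waits for the whole batch to finish before starting more, whereas xjobs starts a new job as soon as any running job finishes.

```shell
# run_batched N UTILITY [ARG...]: hypothetical helper, not part of xjobs.
# Reads one argument per line from standard input and runs
# "UTILITY ARG... line" in the background, at most N at a time.
# Unlike xjobs, it waits for a whole batch to finish before starting the next.
run_batched() {
    max=$1
    shift
    running=0
    while IFS= read -r line; do
        "$@" "$line" </dev/null &   # keep the job from reading our input list
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait                    # block until every job in this batch exits
            running=0
        fi
    done
    wait                            # wait for the final, possibly partial, batch
}

# Example (requires oggenc): encode WAV files two at a time.
#   find -name '*wav' | run_batched 2 oggenc
```

The batch approach leaves a CPU idle whenever one job in a batch finishes early, which is exactly the waste that xjobs's refill-as-you-go scheduling avoids.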

On a single-CPU Athlon 64 3000+ with 2GB of RAM, I took a directory containing about 250 small WAV files and ran find -name "*wav" -exec oggenc {} \;, which encoded the WAV files into Ogg one after the other. This took about 16 seconds each time I ran it. Next, I tried find -name '*wav' | xjobs oggenc, which also took about 16 seconds each time. Then I used the -j 2 option to tell xjobs to run two processes at once. With two jobs running at a time, the single-CPU system actually took slightly longer (about one second longer on average) to encode the same set of files.

However, when I ran the same jobs on a dual-CPU Pentium III 1.0GHz machine with 2GB of RAM, the benefits of xjobs kicked in. Running find -name "*wav" -exec oggenc {} \; took 36 to 38 seconds per run, while find -name '*wav' | xjobs oggenc took 17 to 19 seconds.

In the grand scheme of things, saving an average of 19 seconds on a directory of small WAV files may not mean much, but if you're running more resource-intensive jobs, it just might make a difference.
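The 19-second figure is the difference between the midpoints of the two dual-CPU ranges, and the same numbers work out to a speedup of a little over 2x. A quick check, taking 37 and 18 seconds as those midpoints:

```shell
# Midpoints of the measured ranges on the dual-CPU Pentium III.
serial=37     # find ... -exec oggenc: 36 to 38 seconds
parallel=18   # find ... | xjobs oggenc: 17 to 19 seconds

saved=$((serial - parallel))
# Shell arithmetic is integer-only, so use awk for the ratio.
speedup=$(awk -v s="$serial" -v p="$parallel" 'BEGIN { printf "%.2f", s / p }')
echo "saved ${saved}s, speedup ${speedup}x"   # prints: saved 19s, speedup 2.06x
```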

Note that you can do many of the same things with the xargs command that you can do with xjobs, but xargs has to be told explicitly how many jobs to run in parallel (with the -P option in GNU xargs); it won't autodetect the number of CPUs.
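For comparison, here is roughly how the same job looks with xargs. The -0 and -P options are extensions found in GNU and BSD xargs rather than POSIX, and parallel_each is a hypothetical wrapper name used only for this sketch; it passes the processor count from getconf to -P, since xargs will not work that out for you.

```shell
# parallel_each UTILITY [ARG...]: hypothetical wrapper around xargs.
# Reads NUL-separated arguments on stdin and runs UTILITY once per argument,
# with as many jobs in parallel as there are online processors.
# Relies on the non-POSIX -0 and -P options of GNU/BSD xargs.
parallel_each() {
    xargs -0 -n 1 -P "$(getconf _NPROCESSORS_ONLN)" "$@"
}

# Real use (requires oggenc); -print0 keeps unusual file names intact:
#   find . -name '*wav' -print0 | parallel_each oggenc
```

Because xargs appends each file name after whatever arguments it was given, options such as -b 128 pass straight through to the utility.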

One of the downsides to xjobs is that it can pass a utility's options along as if they were input files. For example, you might expect find -name '*wav' | xjobs oggenc -b 128 to work. However, when filtered through xjobs, oggenc receives the options as files to encode: it encodes the WAV files with no options, then complains that it is unable to open input files "b" and "128", which isn't really desirable.

To get around this, you can create a small shell script with the options you'll use often. For instance, I created a small script called oenc:

#!/bin/sh
oggenc -b 128 --downmix "$1"

Make the script executable (chmod +x oenc) and put it somewhere in your PATH, then run find -name '*wav' | xjobs oenc and everything works just fine.
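As a sketch, here is one way to create oenc and make it executable in a single step. It assumes a personal bin directory at $HOME/bin (or whatever BINDIR is set to) that is already on your PATH; adjust the location to suit your system.

```shell
# Install the oenc wrapper into a personal bin directory.
# Assumption: $HOME/bin (or $BINDIR, if set) is already on your PATH.
bindir=${BINDIR:-$HOME/bin}
mkdir -p "$bindir"

# The quoted 'EOF' stops the shell from expanding "$1" while writing the file.
cat > "$bindir/oenc" <<'EOF'
#!/bin/sh
# Encode one WAV file at 128 kbps, downmixed to mono.
oggenc -b 128 --downmix "$1"
EOF

chmod +x "$bindir/oenc"
```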

One of the things I like about xjobs is that I can feed it a list of commands, then go away. However, if you actually want xjobs to work interactively, you can use the -p option to make xjobs prompt you before each job. If you answer "y", it starts the job; if you answer "n", it skips that job, moves on to the next queued job, and prompts again.

Running jobs from a script

Xjobs will also run a set of commands from a script, so you don't have to worry about entering a long list of commands at the shell prompt. Let's say that you have a bunch of jobs to run in a batch. Enter them into a text file with one command per line, like so:

oggenc -b 128 --downmix bigaudio.wav
lame -b 128 -m mo bigaudio.wav
oggenc -b 128 --downmix bigaudio2.wav
lame -b 128 -m mo bigaudio2.wav
gzip bigaudio.wav
gzip bigaudio2.wav
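If the list is long, you can have the shell write the file for you. This sketch reproduces the six lines above; it assumes file names without spaces (how xjobs treats quoting inside a script isn't covered here), and jobs.txt is a hypothetical name for the script file.

```shell
# Write one command per line, in the same order as the example:
# the encodes for each file first, then the gzip commands.
out=jobs.txt
: > "$out"                      # start with an empty file
for f in bigaudio.wav bigaudio2.wav; do
    printf 'oggenc -b 128 --downmix %s\n' "$f" >> "$out"
    printf 'lame -b 128 -m mo %s\n' "$f" >> "$out"
done
for f in bigaudio.wav bigaudio2.wav; do
    printf 'gzip %s\n' "$f" >> "$out"
done
```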

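Once you've saved the script, you can run xjobs -s scriptname, and xjobs will walk through the commands and start them in order, running as many at once as its job limit allows. On a dual-CPU machine that means two commands can be running at the same time, so make sure the script doesn't contain a sequence of commands in which one job must finish before the next one starts. In the example above, for instance, the gzip commands are meant to run only after the encoders are done with each file, yet gzip bigaudio.wav could start while lame is still encoding bigaudio.wav.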
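One way to respect that ordering, in the same spirit as the oenc wrapper, is to put each file's dependent steps into one small script and list that script once per file. convert_one is a hypothetical name; the sketch is written as a shell function so it can be tried directly, but for use with xjobs -s you would save the body as an executable script on your PATH.

```shell
# convert_one FILE: hypothetical helper. Runs one file's steps in sequence,
# so gzip never starts until both encoders have finished with FILE.
# && stops the chain if a step fails, leaving the WAV file uncompressed.
convert_one() {
    oggenc -b 128 --downmix "$1" &&
    lame -b 128 -m mo "$1" &&
    gzip "$1"
}

# The xjobs script then shrinks to one line per file:
#   convert_one bigaudio.wav
#   convert_one bigaudio2.wav
```

The trade-off is that oggenc and lame no longer run side by side on the same file; with several files in the list, xjobs still keeps each CPU busy by running different files' chains in parallel.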

Note that commands in a script can carry their own command-line arguments, as the oggenc and lame lines above do.

Xjobs is a handy tool if you happen to have a dual-CPU machine and a stack of jobs to run. Be sure to read the xjobs man page for additional examples and options.
