March 22, 2006

Command line Perl for sysadmins

Author: Hubert Chen

Using perl -e allows you to specify a script right on the command line. It's a powerful, underused feature even for people who use Perl regularly. Perl's powerful command line options make it a more flexible replacement for sed, awk, and even vi. Combine perl -e with the command line editing capability of modern shells and you can write, test, and debug a script in record time.

The -p option tells Perl to act as a stream editor similar to sed and awk. Perl with the -e and -p options can make a nice addition to your Unix toolbox. Perl has a more familiar syntax than sed and awk for C programmers, and much more powerful programming constructs.

For editing tasks on large files it's frequently faster to use perl -pe rather than invoking an editor that has to load a whole file into memory at startup. Editing multi-gigabyte log files with a traditional editor is out of the question, but a stream editor like perl -p will surprise you with its speed and ease.

The -p option takes each line of the standard input (STDIN) and assigns that line to the variable $_, then executes the script on that line. The $_ variable is a special variable that is set anew on each iteration of the loop. After your script has run on a line, Perl prints the value of $_ to standard output (STDOUT). If there are command line arguments, -p treats them as filenames and performs the same steps on each line of those files. The -n option is similar to -p except that it does not print $_ automatically; the script has to print whatever it wants to output.

So the command perl -pe '' file1 file2 specifies an empty Perl script, but perl -p will still read in every line from those two files and print it to STDOUT. That line of code will perform the same function as the command cat file1 file2.

To make changes to big files I frequently use commands like this:

perl -pe 's/here is a typi/here is a typo/g' < inputfile > outputfile

In this command, 's/here is a typi/here is a typo/g' tells Perl to search for "here is a typi" and replace it with "here is a typo." Make sure to give the string match enough context to avoid accidentally substituting more times than you want. For example, if we limited the search string to "typi" we might end up generating "typong" from "typing" or "typocal" from "typical."

The g modifier tells Perl to replace the string globally. Without it, substitution occurs only once per line.

Notice that the whole script is enclosed in single quotes. The shell expands variables that begin with $, wildcards such as *, and other special characters unless they are inside single quotes, so you will usually want to enclose a script in single quotes. Perl treats single and double quotes much as the shell does: variables are interpolated in double-quoted strings, while single-quoted strings are taken literally. If you need a quoted string inside the script itself, Perl lets you write q/Hello World/ in place of single quotes, or qq/Hello World/ in place of double quotes.

Compare the results of these two commands:

perl -e "$a = qq/Hello World/; print $a" # error
perl -e '$a = qq/Hello World/; print $a' # prints "Hello World"

In the first case, the shell tries to substitute the shell variable $a because the entire script is in double quotes. The example fails because $a is probably not set in your shell. With bash or ksh, the shell sends Perl the script " = qq/Hello World/; print ", which causes a syntax error. With csh or tcsh, the shell itself reports an error before Perl ever runs, because you are using an undefined variable. In the second example, the shell does not expand $a but correctly sends the literal script '$a = qq/Hello World/; print $a' to Perl. This has the desired effect of setting the $a variable to "Hello World" and then printing it.

A stream operation like perl -pe 's/here is a typi/here is a typo/g' < inputfile > outputfile writes its result to a second file. Perl can instead edit the file in place and keep a backup of the old version as well.

The -i option of Perl means that you want to edit the file in place and overwrite the original version. Use this option with caution. A safer solution is to use an argument to the -i option to store a backup copy of the original file. For example, if you used the option -i.bak on a file named foo, the new edited version of the file would be foo and the original would be saved as foo.bak.

In effect, you can shorten a command like mv file file.old; perl -pe 's/oldstring/newstring/g' file.old > file to perl -p -i.old -e 's/oldstring/newstring/g' file. Even better, you can do it as a bulk operation on a set of files, like so:

perl -p -i.old -e 's/oldstring/newstring/g' file*

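Here's an easy way to change a string in every file under the current directory, recursively. The -type f test keeps directories out of the file list, and -print0 with xargs -0 copes with file names that contain spaces:

find . -type f -print0 | xargs -0 perl -p -i.old -e 's/oldstring/newstring/g'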

Perl's -a and -F options help parse a file while you are reading it. -a turns on autosplit mode. When autosplit is enabled, after reading each line of input, Perl will automatically do a split on the line and assign the resulting array to the @F variable. Each line is split up by whitespace unless the -F parameter is used to specify a new field delimiter. These two features simplify parsing when the file is a simple record-oriented format.

Here's a simple one-line script that will print out the fourth word of every line, but also skip any line beginning with a # because it's a comment line.

perl -nae 'next if /^#/; print "$F[3]\n"'

This second one-line script will extract all usernames from /etc/passwd.

perl -na -F: -e 'print "$F[0]\n"' < /etc/passwd

You can also write the delimiter as a pattern between slashes, such as -F/:/. Be careful: the shell may interpret a backslash or other special character in the pattern unless you enclose it in single quotes. To split on whitespace explicitly, use -F'/\s+/'.

Of course, command line Perl can be used for more general sysadmin tasks in addition to file editing. Let's say you have some HTTP log files named with a date timestamp like access_log.2005-1-1 .. access_log.2005-12-31 and you want to copy them to access_log.old.2005-1-1 .. access_log.old.2005-12-31. Copying the files by hand would be a slow and error-prone operation. You could create a script to do this, but it is a simple enough operation that you can do it with a quick line of Perl.

perl -e 'for(<access_log.*>){$a = $_; s/log/log.old/; `cp $a $_`}'

In this script, <access_log.*> uses Perl's file globbing to build the list of files for the for loop. Each file name is assigned in turn to the variable $_, and the loop body runs once per name. The loop first saves the old file name in the $a variable, then rewrites $_ by substituting log.old for log. Finally it calls an external cp command to copy the file from the old name to the new one. An external cp might be slower than Perl's File::Copy module, but brevity prevails in these short code bits when you have no concerns about the script's performance.

In Perl, backticks call an external command and then return the output of that command. This can make your scripts much simpler. Sometimes it is the only reasonable way to accomplish a task. As an example, it is possible to write a Perl script to read the /proc file system to find out which processes are running and what they are doing, but it's easier to capture the output of a ps command and then parse that.

If you know a faster way to get your task done with another program, don't feel obligated to write a pure Perl solution. In many cases an external program is the only way to accomplish a task, and wrapping command line Perl around it is a good way to automate the work.

The following example copies a script named to machines B, C, and D, then executes it and prints out the results. Using scp and ssh, you can do this with a one-liner like this:

perl -e 'for("B","C","D"){`scp $_:`; `ssh $_`; }'

This works even better if you have public key authentication set up properly.

One of Perl's great strengths is working with other external programs, so use that to your advantage. The goal of using command line Perl is to improve your productivity, not to find the fastest run time or the most elegant solution. Use whatever tools you feel comfortable with.

Command line Perl is useful for tasks that you're going to do only once, so you should choose tools, syntax, and algorithms that are the simplest for you to understand. Sometimes you'll want the full power of a true file editor to help you write a script, but sometimes it is easier to do things with a Perl one-liner.

