July 29, 2011

Weekend Project: Get to Know GNU Sed

If you've ever needed to edit one or more files to make quick changes, you've no doubt found that doing it using a text editor can be a slow, slogging process. Linux, thankfully, has a number of tools that make it easy to do this non-interactively. One of the best is sed, a "stream editor" that can help you make quick work of filtering and transforming text. This weekend, take a few minutes to introduce yourself to sed.

Like many of the utilities you'd use on Linux, sed originated on UNIX for processing text files on the command line or by shell scripts. The implementation of sed that you're using on Linux is likely GNU sed, but that's far from the only implementation. If you're using one of the BSDs or Mac OS X, you'll run into different versions of sed. If you use a system that uses BusyBox, you may be using yet another version of sed. For the most part, sed should act the same across platforms, but the GNU version does have options that may be missing from other implementations or may behave differently there.

So what is sed good for, and what do we mean by "stream editor"? Standard text editors open a file in a buffer and you edit the file as one big blob of text. (I'm generalizing a wee bit, of course.) When you're using Vim, Gedit, Kate, or whatever your favorite text editor happens to be, you're using it interactively. You move around the file a bit, make edits, run commands, etc. Sed, on the other hand, is non-interactive. You pass sed the editing commands and it works on files or a stream of text that is output by other programs.

You give sed a range of addresses and set of commands, and it goes to town on the input — whether that's files or text coming from another process. When I say "addresses," that's text editor jargon for "position in the file." For instance, lines 10 through 101 in a file. If you don't give sed specific addresses, then it just assumes you want it to work on the entire file or output.

What are commands? You'll find that sed has a number of commands, but the most common ones that we'll focus on here are substitution, deletion, and printing.

Sed in Practice

That might be a bit abstract, so let's look at using sed in some simple but real scenarios where it might be beneficial. First, let's look at the basic syntax for sed:

sed [options] 'command; command2' filename

Like most utilities, sed takes one or more options, then commands, and then the filename that you're working on. As you can see, you can use multiple commands, and we'll look at that as well.
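As a rough sketch of what multiple commands look like in practice, the snippet below runs two substitutions on one made-up sample line, first as a single script separated by a semicolon and then as two separate -e options. The sample text and variable names are only for illustration.

```shell
# Made-up sample line for illustration.
line='the quick brown fox'

# Two commands in one script, separated by a semicolon.
with_semicolon=$(printf '%s\n' "$line" | sed 's/quick/slow/; s/brown/red/')

# The same two commands, each passed with its own -e option.
with_e=$(printf '%s\n' "$line" | sed -e 's/quick/slow/' -e 's/brown/red/')

printf '%s\n%s\n' "$with_semicolon" "$with_e"
```

Both forms should give the same result; which one you use is mostly a matter of readability.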

Say you want to quickly skim through an Apache logfile and see all of the instances where an Atom feed was requested. Looking through my site logs, I see that there are a lot of lines with GET /?feed=atom, and I'd like to see those and work with them a bit. Let's use sed's print command:

sed -n '/atom/ p'

Here you're giving sed the "quiet" option (-n) and telling it not to print anything unless specifically asked. Then you'll do a search for the string "atom" (/atom/) and then tell sed to print (p) lines that match. You might wonder, why the single quotes? That's to prevent the shell from interpreting characters like &. We want to pass those on to sed, not have them interpreted by the shell, because we'd get rather unexpected behavior then.
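To see what -n actually changes, here is a small sketch that pipes three made-up request lines (standing in for a real Apache log) through the same print command with and without the option.

```shell
# Three made-up request lines standing in for an Apache log.
log=$(printf '%s\n' 'GET /?feed=atom' 'GET /index.html' 'GET /?feed=rss')

# With -n, only the lines matching /atom/ are printed.
quiet=$(printf '%s\n' "$log" | sed -n '/atom/ p')

# Without -n, sed prints every line once on its own, and p prints
# each matching line a second time.
noisy=$(printf '%s\n' "$log" | sed '/atom/ p')

printf 'with -n:\n%s\n' "$quiet"
printf 'without -n:\n%s\n' "$noisy"
```

Without -n the matching line shows up twice and the non-matching lines come along for the ride, which is rarely what you want when filtering.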

Let me note up-front that sed isn't the only utility that can do this. You could easily use grep for this, too. But does grep have magical search and replace powers? It does not, but sed does.

What if you want to do a little quick and dirty text replacement without having to open up Vim? Here's a simple example. I often clean up HTML before posting stories on Linux.com. One thing I usually do is to substitute the em dash entity (&mdash;) for two dashes (--). To do this, run:

sed 's/ -- / \&mdash; /g' filename.html > newfile.html

If you've been using Vim and think that looks really familiar, you're right. Here we use the substitute command (s) to search for two dashes surrounded by spaces, and then replace them with the &mdash; entity. The backslash before the & is there because sed treats a bare & in the replacement as "the text that was matched." As with Vim substitutions, the g tells sed that the search is global. Without that, sed would only attack the first instance of the search pattern per line, and ignore other instances.
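Here's a minimal sketch of the difference the g flag makes, using a made-up line with two " -- " separators and the HTML entity &mdash; as the replacement. The last command is an extra illustration of what an unescaped & does in the replacement.

```shell
# Made-up sample line with two " -- " separators.
line='one -- two -- three'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/ -- / \&mdash; /')

# With g: every match on the line is replaced.
global=$(printf '%s\n' "$line" | sed 's/ -- / \&mdash; /g')

# An unescaped & in the replacement stands for the matched text itself.
unescaped=$(printf '%s\n' "$line" | sed 's/ -- /[&]/g')

printf '%s\n%s\n%s\n' "$first_only" "$global" "$unescaped"
```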

The last argument for sed is to tell it which file to parse, and then we redirect the output to newfile.html. What happens if we don't redirect it? Then sed just spits it out to the standard output, so you'll see a bunch of text spit out to the terminal.

You can modify a file in place with sed, if you're sure of what you're doing. Here's how it works: the -i option tells sed to edit the file in place, and if you give it a suffix, sed first saves a backup copy of the original under that suffix. Making the backup is the smart habit, because it keeps you from making an edit that horks the file and is unrecoverable. Remember sed is non-interactive, therefore it has no undo. None. But at times you might want to edit a file in place without a backup anyway, so here's how:

sed -i'' -e's/foo/bar/g' filename

That gives sed an empty backup suffix, so no backup is made. (With GNU sed, the shell strips the empty quotes and this is the same as plain -i. BSD sed, including the one on Mac OS X, expects the empty suffix as a separate argument: -i ''.) Normally you'd use something like -i'.bak'. Note that with GNU sed you do not want a space between -i and the suffix.

The -e option tells sed that what follows next is a script or expression to evaluate. Unlike -i, -e doesn't mind a space before the expression, so -e 's/foo/bar/g' works just as well. The option matters most when you pass more than one expression; with only one, you can leave -e off altogether.
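If you'd rather try in-place editing somewhere safe first, the sketch below works on a throwaway file in a temporary directory and keeps a backup with a .bak suffix. The file name demo.txt and its contents are made up for the example.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$workdir/demo.txt"

# Edit in place, keeping the original as demo.txt.bak.
# The suffix is attached directly to -i, with no space.
sed -i.bak -e 's/foo/bar/g' "$workdir/demo.txt"

edited=$(cat "$workdir/demo.txt")
backup=$(cat "$workdir/demo.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the throwaway directory.
rm -r "$workdir"
```

The edited file should contain the replaced text, while the .bak copy still holds the original lines.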

Let's take a look at the d (delete) command and address ranges. Let's say that you want to delete lines 10 through 100 in a file:

sed -i'' -e'10,100d' filename

That tells sed to edit the file in place, then to delete the range 10 through 100. Again, if this range looks familiar, it's because it's the same syntax used with Vim.

What if you wanted to print the range instead? Remember, we only want the range specified, so we want the -n option, like so:

sed -n '10,100 p' filename
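The delete and print examples above are two sides of the same coin, which is easier to see on a small input. This sketch uses seq to generate the numbers 1 through 8, one per line, as stand-in data, and a shorter range (3 through 6) so the output stays readable.

```shell
# seq prints 1 through 8, one per line, as stand-in input.

# d with a range: everything except lines 3 through 6 survives.
deleted=$(seq 1 8 | sed '3,6d')

# -n plus p with the same range: only lines 3 through 6 are shown.
printed=$(seq 1 8 | sed -n '3,6 p')

printf 'after delete:\n%s\n' "$deleted"
printf 'after print:\n%s\n' "$printed"
```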

Just the Beginning

Make sense? What we've learned about sed so far: how to substitute text, print text, and delete text from a file.

This is just the tip of the iceberg, not only of what you can do with sed but also of the text processing powers of utilities on Linux. There's more ground to cover with sed, Gawk, and more. We'll be covering more of sed soon, but in the meantime be sure to check out the man page and look over this massive sed tutorial. Next up? We'll look at sed regular expressions and how to use them.

You might also want to check out the tutorial on GNU utilities and the learning GNU text utilities tutorial. Both of these will help you with some of the basic GNU utilities you might use to process text.
