June 10, 2010

Getting a Grip on GNU grep

If you've been using Linux for any amount of time, you've probably heard about grep, though maybe you're not familiar with using it. GNU grep is a tool that searches one or more files, or standard input, for lines that match a pattern. It's simple, effective, and absolutely necessary for anyone managing Linux and UNIX-type systems. Want to get a grip on grep? We'll get you started in no time.

The basics of grep are simple: to search for a given pattern, run grep pattern file. This will look through one or more files and return any lines that match the search pattern. Note that when you search more than one file, grep prefixes each matching line with the name of the file it came from. When you search a single file, standard input, or the output of another command, you get just the matching lines.
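Here's a minimal sketch of those basics. The file names and their contents are made up for the demonstration, and everything happens in a scratch directory that is removed at the end.

```shell
# Create a scratch directory with two small sample files (hypothetical content).
demo=$(mktemp -d)
printf 'alpha\nbeta error\n' > "$demo/one.log"
printf 'gamma error\ndelta\n' > "$demo/two.log"

# One file: grep prints only the matching lines.
single=$(grep error "$demo/one.log")
echo "$single"

# Several files: each matching line is prefixed with its filename.
multi=$(cd "$demo" && grep error one.log two.log)
echo "$multi"

# Standard input: no filename is shown.
piped=$(printf 'no match\nan error here\n' | grep error)
echo "$piped"

rm -r "$demo"
```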

Why We grep

The grep utility comes in handy in all sorts of ways. The most obvious is searching through one or more files for a term. You can quickly and easily dig through logs or configuration files with grep.

But it's also a very useful tool for filtering the output of other utilities or programs. For instance, let's say you want to find all the files installed in /usr/bin by a package that installs a lot of files. On a Debian-based system, you can use this:


dpkg-query -L packagename | grep "/usr/bin"

Or if you have an RPM-based system:


rpm -q --filesbypkg packagename | grep "/usr/bin"

Instead of seeing every file that is installed with the package, you'll only see the files that live in /usr/bin. You can also pipe the initial results through grep again to narrow results further.
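To see the chaining idea without depending on any particular package manager, here's a small sketch. The listing function and its paths are invented stand-ins for the output of a real package query.

```shell
# Stand-in for a package file listing; these paths are made up for the demo.
listing() {
    printf '%s\n' \
        /usr/bin/foo \
        /usr/bin/foo-config \
        /usr/share/doc/foo/README \
        /usr/share/foo/config.sample
}

# First pass: keep only paths under /usr/bin.
in_bin=$(listing | grep '/usr/bin')
echo "$in_bin"

# Second pass: narrow those results to names containing "config".
narrowed=$(listing | grep '/usr/bin' | grep 'config')
echo "$narrowed"
```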

As with most things Unixy, grep is case-sensitive by default. To search for a pattern and ignore case, use the -i option. This will treat "search" and "SEARCH" the same.
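A quick way to see the difference is to feed grep a few lines on standard input. The sample words below are only for illustration.

```shell
# Case-sensitive by default: only the exact-case line matches.
sensitive=$(printf 'search\nSEARCH\nSearch\nfind\n' | grep search)
echo "$sensitive"

# With -i, all three spellings of "search" match.
insensitive=$(printf 'search\nSEARCH\nSearch\nfind\n' | grep -i search)
echo "$insensitive"
```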

To search through subdirectories using grep, you'll want to use the -r (recursive) option. By default, grep only searches the files you name on the command line. So if you run something like grep pattern *.html, the shell expands *.html to the matching files in the present working directory, and grep never looks in any directories below it. Using grep -r pattern . will search the current directory and any subdirectories. If you only want HTML files in a recursive search, GNU grep's --include option filters by filename, as in grep -r --include='*.html' pattern .
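The sketch below contrasts a flat search with a recursive one. The directory layout is hypothetical, and --include is a GNU grep option, so this assumes GNU grep rather than another implementation.

```shell
# Build a small hypothetical site tree in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/site/docs"
printf '<p>pattern here</p>\n' > "$demo/site/index.html"
printf '<p>pattern there</p>\n' > "$demo/site/docs/guide.html"
printf 'pattern in notes\n' > "$demo/site/docs/notes.txt"

# Without -r, the shell glob only names files in the current directory.
flat=$(cd "$demo/site" && grep -l pattern *.html)
echo "$flat"

# With -r and --include, grep walks subdirectories and filters by filename.
deep=$(cd "$demo/site" && grep -rl --include='*.html' pattern . | LC_ALL=C sort)
echo "$deep"

rm -r "$demo"
```

The -l option is used here only to keep the output down to filenames.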

Ignore This

What if you want to find everything but a specific term? You can do that with grep, too. Use the -v option to invert the match, so grep prints only the lines that don't contain the pattern.

If you want to know only which files don't have the pattern, but don't need to see the non-matching lines, you can use the -L option instead. This will print a list of files that don't have the matching pattern to standard out.

Conversely, if you want to know which files match, but don't want to see the entire line, use the -l option, which shows only the names of files that match the pattern. Be careful when combining options, though: grep -v -l pattern * lists every file with at least one line that doesn't match, which is not the same as listing files with no match at all. For that, stick with -L.

Don't care about the filename and only want to see the lines that match? Use -h to omit filenames in output.
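Here's a short sketch of -v, -l, -L, and -h side by side. The two log files and their contents are invented for the example.

```shell
# Two hypothetical log files: only a.log mentions an error.
demo=$(mktemp -d)
printf 'error: disk full\nall good\n' > "$demo/a.log"
printf 'all good\nstill good\n' > "$demo/b.log"

# -v: print the lines that do NOT match.
inverted=$(grep -v error "$demo/a.log")
echo "$inverted"

# -l: list files with at least one match; -L: list files with none.
with=$(cd "$demo" && grep -l error a.log b.log)
without=$(cd "$demo" && grep -L error a.log b.log)
echo "$with"
echo "$without"

# -h: matching lines only, no filename prefix even with several files.
bare=$(cd "$demo" && grep -h good a.log b.log)
echo "$bare"

rm -r "$demo"
```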

By default, grep is generous about what counts as a match. By that I mean grep will match the pattern wherever it appears in a line, including inside longer strings of characters. So if you grep for "Beat," grep will match "Beat" and also "Beatles," "Beaten," "Beats," and any other string containing "Beat." That's less than useful if you only want to see the exact match.

To rein grep in, use the -w option. This limits grep to matches that form a whole word.
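A small sketch of the difference, using a handful of sample lines fed in on standard input:

```shell
# Plain search: every line containing "Beat" anywhere is returned.
plain=$(printf 'Beat\nBeatles\nBeaten\nOff Beat rhythm\n' | grep Beat)
echo "$plain"

# -w: only lines where "Beat" stands alone as a whole word.
whole=$(printf 'Beat\nBeatles\nBeaten\nOff Beat rhythm\n' | grep -w Beat)
echo "$whole"
```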


When grepping through log files you might want to see some of the context for lines that match a pattern. Normally, grep will only display the matching line, but it has three options for displaying lines in context so you can see more of the matching files. The options are -A (after context), -B (before context), and -C (context).

To use one of the context options, you specify the option and then the number of lines. For instance, using grep -C 2 pattern files will return the matching lines (if any) plus two lines of context before and after. The -A and -B options will return the given number of lines after or before the matched line, respectively.
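The following sketch runs all three context options against a five-line sample log. The file name and contents are hypothetical.

```shell
# A tiny hypothetical log with one interesting line in the middle.
demo=$(mktemp -d)
printf 'line1\nline2\nERROR here\nline4\nline5\n' > "$demo/app.log"

# -A 1: the match plus one line after it.
after=$(grep -A 1 ERROR "$demo/app.log")
echo "$after"

# -B 1: the match plus one line before it.
before=$(grep -B 1 ERROR "$demo/app.log")
echo "$before"

# -C 2: the match plus two lines on either side.
around=$(grep -C 2 ERROR "$demo/app.log")
echo "$around"

rm -r "$demo"
```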

Regular Expressions

The grep utility would be pretty useless if you could only use literal strings to search through files and output. What if you want to see if output contains any digit, search for a range of characters, or use a wildcard in search? No problem, you can use regular expressions with grep to tackle just about any situation you'd want to.

Let's look at some of the regular expressions you will want to use. At the shell, * is a wildcard that matches any run of characters in a filename. It's treated differently in grep's regular expressions: * means "zero or more of the preceding item," so you pair it with another term. To get the same effect as * at the shell, use .* (a dot, which matches any single character, followed by *), as in grep 'foo.*bar' files.

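If you want to match an item one or more times, follow it with +. In grep's default (basic) syntax that's written \+, as in grep 'ab\+c' files. With the -E option (extended regular expressions) a plain + works, as in grep -E 'ab+c' files.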
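To make the repetition operators concrete, here's a short sketch on made-up input. It assumes GNU grep: in the default basic syntax the "one or more" operator is written \+, while -E lets you write it as a plain +.

```shell
# "b*" means zero or more b's, so "ac" matches too.
star=$(printf 'ac\nabc\nabbc\n' | grep 'ab*c')
echo "$star"

# "b+" (with -E) means one or more b's, so "ac" drops out.
plus=$(printf 'ac\nabc\nabbc\n' | grep -E 'ab+c')
echo "$plus"

# ".*" matches any run of characters, like * does in a shell glob.
dotstar=$(printf 'foobar\nfoo and bar\nbar foo\n' | grep 'foo.*bar')
echo "$dotstar"
```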

The ^ character matches the beginning of a line, and $ matches the end of a line. Let's say you wanted to match all files with an html extension in a directory, but no files with the .bak extension:


ls | grep '\.html$'

That will omit any files with .html.bak, but match all files that end with .html.
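You can try the anchors without touching a real directory by feeding grep a list of names. The filenames below are invented for the sketch, and the pattern is quoted so the shell passes it to grep unchanged.

```shell
# $ anchors the match to the end of the line: .html.bak is excluded.
ends=$(printf 'index.html\nindex.html.bak\nhtml-notes.txt\nabout.html\n' | grep '\.html$')
echo "$ends"

# ^ anchors the match to the start of the line.
starts=$(printf 'index.html\nindex.html.bak\nhtml-notes.txt\nabout.html\n' | grep '^index')
echo "$starts"
```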

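If you want to match a special character literally, you'll need to quote it so the shell leaves it alone, and precede it with a backslash if it's special to grep as well. For instance, grep '\*' filename will find every line containing the * character. Run grep * filename instead and the shell expands * into the filenames in the current directory before grep ever sees it, so grep ends up searching for the first filename rather than an asterisk.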
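Here's a sketch of what quoting changes, run in a scratch directory that holds a single hypothetical file, notes.txt. With an unquoted *, the shell turns the command into grep notes.txt notes.txt, which finds nothing.

```shell
# A scratch directory holding one hypothetical file.
demo=$(mktemp -d)
printf 'a * b\nplain\n' > "$demo/notes.txt"

# Quoted and escaped: grep receives \* and looks for a literal asterisk.
escaped=$(cd "$demo" && grep '\*' notes.txt)
echo "$escaped"

# Unquoted: the shell expands * to "notes.txt" before grep runs, so grep
# searches for the text "notes.txt" and finds nothing (exit status 1).
unquoted=$(cd "$demo" && grep * notes.txt) || true
echo "unquoted result: [$unquoted]"

rm -r "$demo"
```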

There are also bracket expressions that match any one character from a set. For instance, you can search for A through Z using [A-Z] or any digit using [0-9]. In addition, you'll find named character classes that cover whole categories of characters, like [:punct:] to match any punctuation. You'd use something like this to match any punctuation at the end of a line:


grep '[[:punct:]]$' files

It is necessary to use double brackets here: the outer pair forms the bracket expression, and the inner [:punct:] names the class. See the grep man page or manual for more on use of regular expressions. It can be a bit befuddling at first, but after you start getting the hang of regular expressions they're extremely useful.
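Here's a brief sketch of bracket expressions on sample input. One assumption worth flagging: how a range like [A-Z] behaves can depend on your locale, so the example sets LC_ALL=C for that one command to get plain ASCII ordering.

```shell
# [[:punct:]]$ : lines that end in a punctuation character.
punct=$(printf 'Hello, world!\nNo punctuation\nRoom 42\nEND.\n' | grep '[[:punct:]]$')
echo "$punct"

# [0-9] : lines that contain at least one digit.
digits=$(printf 'Hello, world!\nNo punctuation\nRoom 42\nEND.\n' | grep '[0-9]')
echo "$digits"

# ^[A-Z][A-Z] : lines that start with two uppercase ASCII letters.
upper=$(printf 'Hello, world!\nNo punctuation\nRoom 42\nEND.\n' | LC_ALL=C grep '^[A-Z][A-Z]')
echo "$upper"
```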

Any Color You Like

You've grepped and grepped until your eyes are blurry, but you're having trouble seeing the string you're looking for. One way to solve this, aside from another cup of coffee and walk 'round the block, is to put a bit of color into the mix.

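The --color (or --colour) option will give you this by highlighting the matched text. Add --color=auto to color matches only when the output goes to a terminal, or --color=always to color them even when the output is piped or redirected.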

Note that some distributions ship with an alias to enable color support by default. For instance, Linux Mint 9 (which I'm using to write this piece) makes grep an alias for grep --colour=auto. So if you don't want color, you'd need to use --color=never on some distributions.
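If you're curious what the color settings actually do to the output, this sketch captures both variants. It assumes GNU grep. The exact escape sequences depend on your configuration, so the example only checks whether an escape character is present.

```shell
# --color=never: plain text, regardless of aliases or defaults.
plain=$(printf 'find the needle here\n' | grep --color=never needle)
printf '%s\n' "$plain"

# --color=always: escape sequences are added even though output is captured.
forced=$(printf 'find the needle here\n' | grep --color=always needle)
printf '%s\n' "$forced"

# Report whether the forced output contains an ESC character.
esc=$(printf '\033')
case "$forced" in
    *"$esc"*) has_escape=yes ;;
    *)        has_escape=no ;;
esac
echo "escape sequences present: $has_escape"
```

This is also why --color=always can cause surprises in pipelines: the next command sees the escape sequences as part of the text.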

Color or not, grep is a tool you can't live without if you're a "power user" or system administrator. It's not necessary for day-to-day use of Linux as a desktop system, though it certainly doesn't hurt to know it. But if you're working with Linux professionally or want to, grep is one of the first tools you should master.
