Linux.com

More Great Linux Awk, Sed, and Bash Tips and Tricks

Awk and Sed are powerful text processors that run circles around bloaty word processors. We're going to use them to customize the Bash prompt, add and remove line numbers, insert commas in long numbers, and perform all manner of experiments without endangering our source files.

Awk and Sed are brilliant text processors, and the more ways you learn to use them, the less you'll find yourself using a word processor. Word processors, in my sometimes-humble opinion, are great lumbering things all full of buttons and menus, and good luck finding what you want, or whether it even exists. They're much too "helpful", to the point that I yell mean things at them and order them to get out of my way. As they don't understand voice commands it's not very effective.

The simplest use of Awk is splitting up words and numbers that are separated in some way: by spaces, line breaks, commas, or other punctuation, anything that can be used as a delimiter. Awk's boon companion, Sed (the stream editor), edits text as it streams past, one line at a time, and can work right down to individual characters. Sed even has a sense of humor; this example changes your good Linux prompt to a DOS prompt:

$ export PS1="C:\$( pwd | sed 's:/:\\\\\\:g' )\\> "
C:\home\carla\> 
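
To see what the Sed part does by itself, here is a minimal sketch that runs the same substitution on a fixed path instead of the output of pwd. The path /home/carla comes from the example prompt; the variable name dos_path is only illustrative. Using : as the delimiter of the s command means the slash being replaced needs no escaping.

```shell
# Replace every forward slash with a backslash.
# The colon is the s command delimiter, so "/" needs no escaping;
# "\\" in the replacement is one literal backslash.
dos_path=$(printf '%s\n' /home/carla | sed 's:/:\\:g')

# Print it the way the DOS-style prompt would show it.
printf 'C:%s\\>\n' "$dos_path"
```

This prints C:\home\carla\>. The export line needs more backslashes because the string passes through double-quote processing and Bash prompt expansion before Sed sees it. The prompt wraps this substitution in $( ), so it is rebuilt from the current directory every time it is drawn.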

This is not permanent, and will go away when you close your terminal. While we're messing with Bash prompts, let's fix it so that when we log into a remote PC the prompt turns red and says "ssh-session" so we know for sure it's a remote session. Add these lines to your ~/.bashrc on the remote machine:

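# Set a label only when this shell was started over SSH
if [ -n "$SSH_CLIENT" ]; then
    text=" ssh-session"
fi
# Red prompt: user@host:directory, plus the label when it is set
export PS1='\[\e[0;31m\]\u@\h:\w${text}$\[\e[m\] '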

Log in from a remote machine to test it, and you'll see something like figure 1. (The Bash prompt is extremely customizable with all kinds of colors and information; see the Bash Prompt HOWTO to learn all the color and customization codes.)

Figure 1: A red Bash prompt marked "ssh-session" on a remote login.
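
You can also check the conditional without logging in remotely. The sketch below is a test harness of my own, not part of the original setup: it wraps the same test in a hypothetical function, ssh_label, and calls it once with SSH_CLIENT unset and once with a made-up value (an address from the documentation range).

```shell
# Hypothetical helper: prints the label only when SSH_CLIENT is non-empty,
# mirroring the test in the ~/.bashrc lines.
ssh_label() {
    if [ -n "$SSH_CLIENT" ]; then
        printf ' ssh-session'
    fi
}

# Each check runs in a subshell, so the current environment is untouched.
local_label=$(unset SSH_CLIENT; ssh_label)
remote_label=$(SSH_CLIENT='192.0.2.10 50000 22'; ssh_label)

printf 'local:  [%s]\n' "$local_label"
printf 'remote: [%s]\n' "$remote_label"
```

The first line should show empty brackets and the second should show the label, which is the same decision the prompt makes on a real SSH login.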

Suppose you're writing a code example and you want to insert printable line numbers. Some editors do this, some don't. This is how Awk does it:

$ awk '{ print FNR " : " $0 }' /bin/cgroups-mount
[...]
7 : # For simplicity this script provides no flexibility
8 :
9 : # If /sys/fs/cgroup is mounted, we don't run again
10 : if [ -n "`grep /sys/fs/cgroup /proc/mounts`" ]; then
11 :     exit 0

I threw in some nice spaces and colons for prettiness. You can also display line numbers with the less command:

$ less -N /etc/ardour/ardour.menu
[...]
      7  <menuitem action='New'/>
      8  <menuitem action='Open'/>
      9  <menuitem action='Recent'/>
     10  <menuitem action='Close'/>
     11    <separator/>

I use this when I have to wade through XML files, which to my eyes are giant indigestible snarls, even with color syntax highlighting.

Now suppose you have some example code copied from somewhere with line numbers, and you need to get rid of the line numbers; Sed is perfect for this task:

$ sed 's/^ *[0-9][0-9]* //' filename
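
As a quick check that numbering and stripping undo each other, this sketch numbers three sample lines with Awk and strips the numbers again with Sed. The sample text and the variable names are made up for illustration. The Sed expression here requires at least one digit, so lines that carry no number are left alone.

```shell
# Sample input; the content is arbitrary.
original=$(printf '%s\n' 'first line' 'second line' 'third line')

# Put the line number and a space in front of each line.
numbered=$(printf '%s\n' "$original" | awk '{ print FNR " " $0 }')

# Strip optional leading spaces, one or more digits, and one space.
stripped=$(printf '%s\n' "$numbered" | sed 's/^ *[0-9][0-9]* //')

printf '%s\n' "$numbered"
[ "$stripped" = "$original" ] && echo "round trip OK"
```

It prints the three numbered lines followed by "round trip OK", and like the commands in the article it only writes to stdout.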

These examples print the results to stdout and do not change the source files, which is a nice safety mechanism. You can create a new file containing your changes with a simple redirect, like this:

$ awk '{ print FNR " " $0 }' /bin/cgroups-mount > newfile
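
One caution with redirects: the shell truncates the output file before the command starts, so redirecting to the same file you are reading would leave it empty. A minimal sketch of a safer pattern follows, writing to a temporary file first and renaming it only if the command succeeds. The file names are placeholders in a scratch directory created by mktemp, not files from the article.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
src="$workdir/script.txt"
printf '%s\n' 'echo one' 'echo two' > "$src"

# Write to a temporary file, then replace the source only on success.
tmp="$src.tmp"
awk '{ print FNR " " $0 }' "$src" > "$tmp" && mv "$tmp" "$src"

# Show the result and clean up.
result=$(cat "$src")
printf '%s\n' "$result"
rm -r "$workdir"
```

The source file ends up holding the numbered lines, and a failed Awk run would leave the original untouched.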

Sed can edit files in place with the -i option, so if you're really, really sure, you can edit your source file directly. This example inserts commas into a file full of columns of long numbers:

$ sed -i ':a;s/\B[0-9]\{3\}\>/,&/;ta' numbers.txt

So this:

20130607215015
607220701
992171

Becomes this:

20,130,607,215,015
607,220,701
992,171
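
To see how the comma command works without touching a file, this sketch feeds the same three numbers through the expression on stdin. It relies on the \B and \> operators, which are GNU sed extensions, so treat it as GNU-specific. The variable name with_commas is only illustrative.

```shell
# GNU sed assumed: \B (not at a word boundary) and \> (end of word) are extensions.
#   :a    label marking the top of the loop
#   s///  put a comma before three digits that end a word and follow another digit
#   ta    branch back to :a if the substitution changed something
with_commas=$(printf '%s\n' 20130607215015 607220701 992171 |
    sed ':a;s/\B[0-9]\{3\}\>/,&/;ta')

printf '%s\n' "$with_commas"
```

Each pass adds one comma, working from the right. The comma just inserted ends the "word" for the next pass, which is why the loop moves leftward three digits at a time and stops when fewer than four digits remain at the front.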

A good learning tool is to look up the command options in the man pages. To learn more about these wonderful commands try the "Definitive Guide to sed: Tutorial and Reference" by Daniel A. Goldman, which is the first new Sed & Awk book in years, and it's very good. A good companion book is "Introducing Regular Expressions" by Michael Fitzgerald, because regular expressions are essential to pretty much everything in scripting, programming, and many Linux commands.

Comments

  • thomas connelly Said:

    UNIX, having the idea that no news is good news, does allow a person to shoot themselves in the foot! So I particularly liked setting the prompt to let the user know they are in an SSH session; it might also be useful to let a person know when they have changed user. So many prompts, so little time...

  • Kevin Havens Said:

    I did that, make my bash prompt look like an old MS-DOS command.com C:\> prompt. Posted it to my Facebook page. Dunno how I can post it here, though... brings back memories...

  • David Dreggors Said:

    I see people give examples all the time like:

        cat myfile.txt | grep "some text" | awk '{print $2}'

    This is bad because it uses extra commands unnecessarily. cat pipes the contents of the file to grep, grep passes only the matching lines to awk, and awk then prints just the second field (fields are separated by spaces). We can remove the cat command altogether:

        grep "some text" myfile.txt | awk '{print $2}'

    and still it is too much, because there is an unneeded grep command! You can remove both cat and grep as follows:

        awk '/some text/{print $2}' myfile.txt

    To explain: awk reads the file (myfile.txt) and, only for the lines matching "some text" (/some text/ is a regex match), prints the second field ({print $2}). Please google for the "useless uses of cat" to see more of these great examples some time. :-)

  • Mat Enders Said:

    Your first example does not work as is. The close quote is missing.

  • Matt Said:

    The problem with "useless uses of cat" is that they completely ignore the way people naturally work their way through command pipelines. awk scripts do not spring fully formed from people's heads. They are an exploratory process. For 99.9% of all commands run with a "useless cat" the cat isn't getting in the way, and the time saved editing command lines from priors that have been used to build up the command far exceeds the CPU time spent executing an extra process rather than jamming it all into awk. There is more than one way to skin a cat. Don't admonish people for doing things differently than you if the result is just as correct. If you're into that sort of thing, then you would be better off spending your time arguing that Perl shouldn't exist.

  • Mixed hamtrope Said:

    While cat and grep make the command more accessible for those learning to script, placing the search term within awk does seem good practice IMHO. BTW is it possible to put a $variable in that place in an awk expression?

  • Daniel Goldman Said:

    It's an honor and very kind of Carla Schroder to mention the book "Definitive Guide to sed". The website is http://www.sed-book.com/ for anyone interested. Carla has written THREE books: "The Book of Audacity", "Linux Networking Cookbook", and "Linux Cookbook". That is a huge amount of work, and all three books have very high reviews.

    Yes, awk and sed are brilliant text processors. People often seek something new. But for what they do, I don't think there is anything better. They both have learning curves, and the existing learning resources seemed pretty outdated and not good enough; that's why I wrote the book.

    I think one key to learning sed is to concentrate on the simpler sed usages, avoiding complex, obscure scripts. I think it's best to have many tools in your belt, and use special purpose unix commands (eg, grep, tr, head, tail), with greater power for their particular task, whenever possible. The original UNIX philosophy was to chain together small utilities (including sed), and that still works best, IMO.

    Related to avoiding complex scripts, the example to insert commas is a great teaching example for many aspects of sed, and quite interesting. But for practical use, I would use the following within a shell script, which at least works with bash:

        $ printf "%'d\n" 12345678
        12,345,678

    The sed version is much shorter than the total shell script, and not dependent on the locale. But the bash version is easier to understand. I'm actually surprised there is not some little UNIX utility that just inserts commas, no matter what the number. It could be called "commas".

    I strongly agree about the importance of regular expressions. A solid understanding of regular expressions is really required to master sed, IMO. And the regular expression knowledge applies to other areas, eg vi and grep. "Definitive Guide to sed" emphasizes regular expressions. Thanks again, and I hope these comments are helpful.

    Daniel Goldman
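
On the question raised in the comments about using a shell variable as the Awk pattern: one common approach, sketched here assuming a POSIX Awk, is to pass the value in with -v and match it with the ~ operator instead of writing it between slashes. The variable names and sample data are invented for the example.

```shell
# Shell variable holding the text to search for (hypothetical value).
term='some text'

# Sample input standing in for myfile.txt.
sample=$(printf '%s\n' 'alpha 10 some text' 'beta 20 other' 'gamma 30 some text')

# -v copies the shell value into the Awk variable pat;
# $0 ~ pat selects the lines that match it as a regular expression.
matches=$(printf '%s\n' "$sample" | awk -v pat="$term" '$0 ~ pat { print $2 }')

printf '%s\n' "$matches"
```

This prints the second field of the two matching lines, 10 and 30. Note that -v interprets backslash escape sequences in the value, so patterns containing backslashes need extra care.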
