June 1, 2004

CLI magic: sort of bragging

Author: Joe Barr

The Linux GUIs are, in my opinion, on par with those of Windows or the Mac for ease of use, though I don't think any of them is as pretty as those of the Mac, or even Windows. Since pretty is not my top priority when choosing an operating system, I can live with that. But when you compare the command-line environment of DOS/Windows with that of Linux, it's like comparing a wheelbarrow with an 18-wheeler. This week we'll look at the sort command and a couple of options you may not be familiar with.

Assume that we start with a small file containing names, phone numbers, and email addresses. Let's call it names.txt and enter the following lines in it.

Jim Clark 210 555-1212 jclark@sanantonio.com
Brenda Sweet 512 555-1212 bsweet@austin.net
Bobby Jones 214 555-1212 bjones@dallas.org
Tubby Ames 713 555-1212 tames@houston.com
Mac Davis 800 555-1212 mdavis@frisco.com
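If you'd rather not open an editor, a here-document can create the file in one step. This is a minimal sketch assuming a Bourne-compatible shell such as bash; EOF is just a conventional end marker, and quoting it keeps the shell from expanding anything in the lines.

```shell
# Write the five sample records to names.txt.
# Quoting 'EOF' makes the shell copy the lines literally,
# with no variable or command expansion.
cat > names.txt <<'EOF'
Jim Clark 210 555-1212 jclark@sanantonio.com
Brenda Sweet 512 555-1212 bsweet@austin.net
Bobby Jones 214 555-1212 bjones@dallas.org
Tubby Ames 713 555-1212 tames@houston.com
Mac Davis 800 555-1212 mdavis@frisco.com
EOF

# Show the file to confirm it was written as expected.
cat names.txt
```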

If we have 50 or 100 names or more in our names.txt file, it would be very helpful to have it sorted by name when we want to call or email someone. No problem. Enter the following at the command line:

sort names.txt

The lines are sorted on their entire contents, starting with the first character of the first name and continuing through the end of the email address. The output on the terminal screen (that's where sort writes its output unless you tell it otherwise) would look like this:

Bobby Jones 214 555-1212 bjones@dallas.org
Brenda Sweet 512 555-1212 bsweet@austin.net
Jim Clark 210 555-1212 jclark@sanantonio.com
Mac Davis 800 555-1212 mdavis@frisco.com
Tubby Ames 713 555-1212 tames@houston.com

If you want to keep the sorted version of the file, you can redirect the output of the command to a new file (in this case named names.srt) like this:

sort names.txt > names.srt
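sort compares lines using the collation rules of your current locale, so on some systems the order of mixed-case or punctuated text can differ from plain byte order. The sketch below sets LC_ALL=C for the one command to get byte-order comparison, which for this data gives the same order shown above. It recreates names.txt first so it can run on its own.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > names.txt <<'EOF'
Jim Clark 210 555-1212 jclark@sanantonio.com
Brenda Sweet 512 555-1212 bsweet@austin.net
Bobby Jones 214 555-1212 bjones@dallas.org
Tubby Ames 713 555-1212 tames@houston.com
Mac Davis 800 555-1212 mdavis@frisco.com
EOF

# Sort on the whole line in byte order and save the result.
# names.txt itself is not modified; only names.srt holds the sorted copy.
LC_ALL=C sort names.txt > names.srt

# Display the saved, sorted copy.
cat names.srt
```

One caution: don't redirect the output back onto the input (sort names.txt > names.txt). The shell empties names.txt before sort ever reads it, so you end up with an empty file. sort's own -o option, as in sort -o names.txt names.txt, is documented as safe when the output file is also an input file.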

But what if you want the list sorted by last name instead of first? Sort can do that too, because it allows you to define the position of the sort key. According to the sort man page, we can specify the field and the character position within the field at which the sort key starts and ends. If no key position is specified, the entire line is used, as noted above.

To sort the file by last name, enter:

sort -k2.1 names.txt
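Because no ending position is given, -k2.1 starts the key at the beginning of field 2 and runs it to the end of the line. The key can also be ended explicitly: -k2,2 stops it at the end of field 2, so only the last name is compared. A sketch comparing the two, with LC_ALL=C pinning byte-order comparison; for this data both produce the same order.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > names.txt <<'EOF'
Jim Clark 210 555-1212 jclark@sanantonio.com
Brenda Sweet 512 555-1212 bsweet@austin.net
Bobby Jones 214 555-1212 bjones@dallas.org
Tubby Ames 713 555-1212 tames@houston.com
Mac Davis 800 555-1212 mdavis@frisco.com
EOF

# Key starts at field 2 (last name) and runs to the end of the line.
LC_ALL=C sort -k2.1 names.txt

echo ---

# Key starts and ends at field 2: only the last name is compared.
LC_ALL=C sort -k2,2 names.txt
```

The bounded form matters once two lines share a last name: with -k2,2 the rest of the line no longer takes part in the key comparison.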

To sort by email address, the command would be:

sort -k5 names.txt
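Counting blank-separated fields in each line gives first name (1), last name (2), area code (3), number (4), and email address (5). A sketch of the email sort, again with LC_ALL=C for byte-order comparison. In this sample every username is the first initial plus the last name, so the result happens to match the plain whole-line sort.

```shell
# Recreate the sample file so this snippet is self-contained.
cat > names.txt <<'EOF'
Jim Clark 210 555-1212 jclark@sanantonio.com
Brenda Sweet 512 555-1212 bsweet@austin.net
Bobby Jones 214 555-1212 bjones@dallas.org
Tubby Ames 713 555-1212 tames@houston.com
Mac Davis 800 555-1212 mdavis@frisco.com
EOF

# Field 5 is the email address, the last field on each line.
LC_ALL=C sort -k5 names.txt
```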

Note that in this context a "field" is text delimited by blank characters. Logically, you might consider "Joe Blow" to be the name field, but as far as sort is concerned, it's two fields.

But guess what? You can change that too. If you want the plus sign (+) instead of a blank to separate fields, just use the -t option to make it the field separator (-t+). Let's make our original file look like this by separating the name, phone number, and email address data with plus signs:

Jim Clark+210 555-1212+jclark@sanantonio.com
Brenda Sweet+512 555-1212+bsweet@austin.net
Bobby Jones+214 555-1212+bjones@dallas.org
Tubby Ames+713 555-1212+tames@houston.com
Mac Davis+800 555-1212+mdavis@frisco.com
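Rather than retyping the file, sed can insert the plus signs for you. This sketch assumes every line has exactly the shape of the sample records: one blank between words, the first digit after a blank starts the phone number, and the email address is the last word on the line. It writes to a temporary file (names.tmp, an arbitrary name chosen for this example) and then replaces names.txt.

```shell
# Recreate the blank-separated sample file so this snippet is self-contained.
cat > names.txt <<'EOF'
Jim Clark 210 555-1212 jclark@sanantonio.com
Brenda Sweet 512 555-1212 bsweet@austin.net
Bobby Jones 214 555-1212 bjones@dallas.org
Tubby Ames 713 555-1212 tames@houston.com
Mac Davis 800 555-1212 mdavis@frisco.com
EOF

# First expression: turn the blank before the area code into '+'.
# Second expression: turn the blank before the last word (the email) into '+'.
sed -e 's/ \([0-9]\)/+\1/' \
    -e 's/ \([^ ]*\)$/+\1/' names.txt > names.tmp &&
mv names.tmp names.txt

cat names.txt
```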

That done, let's sort on the email address again, which in our redefined field structure is now field 3 instead of field 5, by entering the following at the command line:

sort -t+ -k3 names.txt

The output looks like this:

Bobby Jones+214 555-1212+bjones@dallas.org
Brenda Sweet+512 555-1212+bsweet@austin.net
Jim Clark+210 555-1212+jclark@sanantonio.com
Mac Davis+800 555-1212+mdavis@frisco.com
Tubby Ames+713 555-1212+tames@houston.com
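Here is the plus-separated sort as a self-contained sketch, with LC_ALL=C for byte-order comparison. With -t+, a blank is just an ordinary character, so "Bobby Jones" is all of field 1 and "214 555-1212" is all of field 2. Sorting on field 2 therefore orders the list by phone number, area code first.

```shell
# Recreate the plus-separated file, in the original unsorted order.
cat > names.txt <<'EOF'
Jim Clark+210 555-1212+jclark@sanantonio.com
Brenda Sweet+512 555-1212+bsweet@austin.net
Bobby Jones+214 555-1212+bjones@dallas.org
Tubby Ames+713 555-1212+tames@houston.com
Mac Davis+800 555-1212+mdavis@frisco.com
EOF

# '+' is the field separator; field 3 is the email address.
LC_ALL=C sort -t+ -k3 names.txt

echo ---

# Field 2 is the whole phone number, so this orders by area code.
LC_ALL=C sort -t+ -k2,2 names.txt
```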

As always, go over the man page carefully. There is more magic to the sort command than we've covered here, such as reversing the collating sequence, merging files, and more.
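As a taste of two of those features: -r reverses the order of the comparison, and -m merges files that are each already sorted instead of re-sorting everything. A minimal sketch; the file names first.srt and second.srt are hypothetical, and LC_ALL=C keeps the comparison in byte order so the presorted inputs and the merge agree.

```shell
# Two small files, each already sorted in byte order (hypothetical names).
cat > first.srt <<'EOF'
Bobby Jones 214 555-1212 bjones@dallas.org
Jim Clark 210 555-1212 jclark@sanantonio.com
Tubby Ames 713 555-1212 tames@houston.com
EOF
cat > second.srt <<'EOF'
Brenda Sweet 512 555-1212 bsweet@austin.net
Mac Davis 800 555-1212 mdavis@frisco.com
EOF

# -m interleaves the presorted inputs into one sorted stream.
LC_ALL=C sort -m first.srt second.srt

echo ---

# -r sorts both files together in descending order.
LC_ALL=C sort -r first.srt second.srt
```

Note that -m does not check its input: if a file isn't already sorted, the merged output won't be either.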
