February 24, 2010

Getting to Gno GNU Utilities


The GNU Project has provided dozens of useful utilities that you can find on almost every major Linux system, but many new Linux users have no idea where to start to learn these handy utilities. In this tutorial, I'll cover a few of the utilities that you can use to measure file system usage, verify the size of files, and take a peek into larger text files like Apache logs.

Virtually every major Linux distribution comes with these utilities installed. Some distros designed for resource-constrained systems use BusyBox instead, which provides replacements for most of the GNU utilities. In that case, you should have the same commands, but they may lack features found in the GNU versions or have slightly different options. However, if you're using a mainstream distro like Fedora, Debian, Ubuntu, openSUSE, Mandriva, or Slackware, you should have the standard utilities from the start.

Understanding GNU Utils

The GNU utilities provide the basic tools for working with files, text, and the shell that one would expect on a standard Unix-like system. They range from file management tools (ls, cp, dd, and so on) to text manipulation (sort, tail, head, uniq, and the like) and shell utilities that provide much of the functionality needed to keep a system happy and healthy.

Historically, these were split into three collections: textutils, fileutils, and shellutils. Now they're distributed together as the "coreutils" package. Each utility has its own set of options and documentation, but the suite shares a few common options and fairly consistent usage that every user should know.

If you need to know which version of a program you're working with, run it with the --version option. It prints the version and licensing information, along with the program's authors. (Hint: if you see "GPLv2+", you've got an older version.) For instance, here's what you'll see if you run ls --version:

ls (GNU coreutils) 7.4
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard M. Stallman and David MacKenzie.

More importantly, each utility has a --help option. If you're unsure how to use a given utility, ask it for help: it will print a terse list of its options and some general information, though that's far from a complete manual. For a much more detailed guide, read the man page or the info page by running man ls or info coreutils 'ls invocation'. That's a bit of a mouthful for info, but info pages are the GNU Project's preferred documentation format.
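Because every coreutils program accepts the same --version and --help options, a quick loop can confirm where each command comes from. This is a minimal sketch: the list of commands is simply the ones covered in this tutorial, and on a BusyBox system the output will look different (or the options may be missing).

```shell
# Print the first line of --version for each utility covered in this tutorial.
# On a GNU system, each line should mention "GNU coreutils" and its version.
for cmd in ls md5sum sha1sum head tail df du; do
    "$cmd" --version | head -n 1
done

# The built-in help is terse; its first line is the usage summary.
ls --help | head -n 3

# For the full manual, use the pagers (interactive, so commented out here):
#   man ls
#   info coreutils 'ls invocation'
```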

The GNU folks have created too many useful and essential utilities for me to cover in one tutorial, so I'll tackle some of the must-haves here, and we'll revisit the topic in coming weeks. Rather than starting with absolute basics like ls and cp, which you may have seen elsewhere, I'm going to show how to use a few utilities that newcomers to Linux might not have encountered right away.

Creating and Checking File Checksums

The first utilities we'll look at are md5sum and sha1sum, which compute and verify checksums for files. If you've ever downloaded a Linux ISO, packages, or source code, you've probably noticed that most projects link to MD5 and SHA1 files alongside the downloads. These files contain a checksum for each file, which lets you verify that your copy is identical to the one on the download server. It's a good idea to check this, for two reasons:

  • Make sure that the file download is uncorrupted, so you won't waste a blank CD or DVD (in the case of ISOs), or run into problems installing a package.
  • Ensure that the package or ISO is what it's supposed to be: A compromised file should have a different checksum than the original.

One caveat: if a site is compromised, an attacker can post a checksum that matches the tampered file, so the checksum won't protect you in that case. Most of the time, though, you'll be downloading the file from a mirror, so a checksum taken from the project's own site should still catch a file that's been tampered with on the mirror.

Most projects offer MD5 sums, but some offer SHA1 sums, or both. Without going deeply into the differences, MD5 is the older algorithm and still does a reasonable job of catching accidental corruption, but it has been demonstrated that "collisions" can be deliberately created for MD5. A collision occurs when two different files have the same hash. SHA1 is considered more secure.

To check a file's sum, simply run md5sum filename or sha1sum filename at the command line and you'll see the file's sum printed next to the filename, like so:

jzb@neelix ~ $ md5sum filename.img
b4a6bc833c6f719c1980bbe6f3f152d6  filename.img
jzb@neelix ~ $ sha1sum filename.img
9f644a2b9fb10f16515e7597418da3c321453268  filename.img

The larger the file, the longer this takes to run; large files like DVD ISOs may take several minutes, especially on slower systems. If you have a file containing the MD5 or SHA1 sum and the filename in the same format as shown above, you can use the -c option to check the file against it. Run sha1sum -c filename.txt in the directory where the file is located, and you'll get either an OK message (filename.img: OK) or something like this:

filename.img: FAILED
sha1sum: WARNING: 1 of 1 computed checksum did NOT match
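Here's a minimal sketch of the whole round trip, run in a throwaway directory so nothing real is touched. The file name filename.img and its contents are hypothetical stand-ins for a real download, and the .sha1 file is created locally here, whereas normally you'd download it from the project's site.

```shell
# Work in a temporary directory that is removed when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical stand-in for a downloaded image.
printf 'pretend this is an ISO image\n' > filename.img

# Record the checksum in the "sum  filename" format that sha1sum prints.
sha1sum filename.img > filename.img.sha1

# Verify it: prints "filename.img: OK" and exits with status 0.
sha1sum -c filename.img.sha1

# Simulate a corrupted download by appending data, then check again.
printf 'extra bytes' >> filename.img
if ! sha1sum -c filename.img.sha1; then
    echo "Checksum mismatch: fetch the file again."
fi
```

The same steps work with md5sum in place of sha1sum.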

As mentioned, it's a good idea to verify any files you download before burning them to CD or DVD, installing a package, and so on. It takes a little extra time, but it's worth it. Some Linux CD-burning programs, such as K3b, will compute the MD5 sum before burning an ISO to disc; in that case, all you have to do is compare it against the MD5 sum you already have.

Quick Peek into Text Files

The next two utilities are really useful when working with logfiles or any long text file that you don't want to open in full. For example, I may want to see the last few entries in an Apache log without opening the whole file, because it's hundreds of megabytes in size (or larger). In that case, turn to head and tail. As the names imply, head displays the beginning of a file, and tail displays the end.

Here's how they work: by default, running tail filename will display the last 10 lines of a file, and head filename will display the first 10 lines of a file. Pretty straightforward, right? What if you want to see more than 10 lines? Simple! Use the -n option to tell the utility how many lines you want to see. For instance, if you want 20 lines, use tail -n 20 filename.
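To try these options without risking a real log, you can generate a hypothetical 100-line file with seq (also part of coreutils). Each line's content is its own line number, so it's easy to see exactly which lines each command prints.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# sample.log is a made-up file whose lines read 1, 2, ... 100.
seq 1 100 > sample.log

head sample.log          # first 10 lines: 1 through 10
tail sample.log          # last 10 lines: 91 through 100
head -n 3 sample.log     # first 3 lines: 1 through 3
tail -n 20 sample.log    # last 20 lines: 81 through 100
```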

But what if you want to see something in the middle? You can do that by combining head and tail. To display lines 51 through 60 of a file, run head -n 60 filename | tail: tail shows the last 10 of the 60 lines that head outputs, which is exactly the range you wanted.
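Using the same kind of hypothetical numbered file, here's a sketch of the head-and-tail trick. Watch out for the off-by-one trap: lines 50 through 60 inclusive is 11 lines, so that range needs tail -n 11. tail also accepts -n +K, meaning "start at line K", which gives a second way to get the same range.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
seq 1 100 > sample.log   # hypothetical file: line N contains N

# Last 10 of the first 60 lines: prints 51 through 60.
head -n 60 sample.log | tail

# Lines 50 through 60 inclusive are 11 lines, so ask tail for 11.
head -n 60 sample.log | tail -n 11

# Same range, starting at line 50 with "tail -n +50" and keeping 11 lines.
tail -n +50 sample.log | head -n 11
```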

Another handy trick, particularly with logfiles, is the -f (follow) option. Instead of exiting after printing the end of the file, tail keeps running and prints new lines as other processes append them, so you can watch a log grow without opening it in an editor. Using the Apache example, run tail -f access.log to see new lines as they're added to access.log, and press Ctrl+C when you're done. This can be very useful when troubleshooting.
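Interactively, you'd just run tail -f and stop it with Ctrl+C. The sketch below instead simulates a busy log so it can run unattended: a hypothetical background writer appends three fake request lines to access.log, and GNU tail's --pid option tells tail -f to exit once that writer process has finished.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
: > access.log           # start with an empty log

# Hypothetical writer: appends one fake request per second, then exits.
(
    for i in 1 2 3; do
        echo "GET /page$i HTTP/1.1" >> access.log
        sleep 1
    done
) &
writer=$!

# Follow the log as it grows; stop once the writer process is gone.
tail -f --pid="$writer" access.log
```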

Measuring Disk Usage

Two more handy utilities are df and du, short for "disk free" and "disk usage" (or at least that's how I remember them). They tell you how much space is used or available on a disk.

To see how much space is free on a system, use df -h. The -h option tells it to provide information in "human readable" format — namely, in MB or GB instead of 1K blocks. This will give a report for all mounted file systems. You'll also see the approximate amount of space used, the percentage of space used and where the file systems are mounted, like so:

jzb@neelix ~ $ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              19G  5.7G   13G  32% /
udev                  1.5G  220K  1.5G   1% /dev

As you can see, I have one partition that's mounted as / and only 32% is in use. Note that df doesn't report the use of the swap partition.

Running du -h reports the space used by each subdirectory of the current directory, finishing with the total for the current directory itself; add -a to list individual files as well. If you just want the total for a directory, use du -sh.
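The sketch below builds a small hypothetical directory tree (the names demo, photos, and isos and the file sizes are made up) and runs the du variants side by side. The files are filled with random data so that file systems which compress runs of zeros still report realistic sizes.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/demo/photos" "$tmp/demo/isos"
head -c 204800 /dev/urandom > "$tmp/demo/photos/a.jpg"    # 200 KiB
head -c 512000 /dev/urandom > "$tmp/demo/isos/disc.iso"   # 500 KiB
cd "$tmp/demo" || exit 1

du -h      # one line per directory, ending with "." for the whole tree
du -ah     # the same, plus a line for every individual file
du -sh     # just the summary line for the current directory
du -ch     # like du -h, plus a final "total" line
```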

Now, there's a common question about these tools: why do I get two different numbers when I check how much space is used versus how much is free? That is, why don't df and du agree? The du utility adds up the space used by the files it can find, while df asks the file system itself how much space is in use and how much is available, so it sees things du doesn't. When a file system is created, some space is reserved for the system administrator (5% by default on ext2, ext3, and ext4), some goes to the file system's own overhead, such as the journal on a journaling file system, and so on. df also counts space held by files that have been deleted but are still open in a running process; that space isn't freed until the process closes the file, but du can no longer "see" those files.

The df utility will give you a more accurate view into how much space you have available, and du will let you know space consumed by files.
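You can watch the deleted-but-open case happen with a short experiment. This is a Linux-specific sketch (it reads /proc), using a hypothetical 5 MB file named big.dat in a temporary directory; the shell's own file descriptor 3 stands in for a long-running process, such as a web server holding its log open.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# 5 MiB of random data, so a compressing file system can't shrink it.
head -c 5242880 /dev/urandom > big.dat
du -sh .                   # about 5.0M

# Hold the file open on descriptor 3, then delete it.
exec 3< big.dat
rm big.dat
du -sh .                   # du no longer counts it...
ls -l "/proc/$$/fd/3"      # ...but the kernel still lists it, marked "(deleted)"

# Only closing the last open descriptor actually frees the space df reports.
exec 3<&-
```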

Note that you can run du on a single file as well, if you'd like to know how much space a specific file takes up. You can also run du on multiple files, so if you want to see how much space the JPEGs or ISOs in a directory take up, run du -ch *.iso or du -ch *.jpg to get a grand total (-c) of the disk usage of those files.
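For example, with a couple of hypothetical ISO files in a scratch directory, du -ch lists each file that matches the pattern and then a grand total. The names and sizes here are made up, and the JPEG is there only to show that the pattern skips it.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Two made-up "ISOs" (random data) and a JPEG that the pattern should skip.
head -c 307200 /dev/urandom > one.iso
head -c 409600 /dev/urandom > two.iso
head -c 102400 /dev/urandom > photo.jpg

# One line per matching file, then a "total" line for all of them.
du -ch *.iso
```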

These are just a few of the utilities in the GNU coreutils package. Next time, we'll cover several of the text utilities you can use to work with and manipulate text files. Enjoy!
