August 23, 2007

Comprehensive integrity verification with md5deep

Author: Mayank Sharma

Most of the ISO images and other software you grab off the Internet come with a message digest -- a cryptographic hash value that you can use to verify their integrity. While almost all Linux distributions come with utilities to read and generate digests using the MD5 and SHA-1 hash functions, the md5deep utilities can do that and more.

md5deep computes MD5, SHA-1, SHA-256, Tiger, and Whirlpool digests on Linux, Windows, Mac OS X, *BSD, Solaris, and other operating systems. It can recursively traverse directories, computing sums for files under subdirectories as well. When traversing directories, you can tell md5deep to process only certain types of files, such as regular files, and ignore others, such as block devices and symbolic links.

Installing md5deep isn't much of a hassle despite the fact that Linux users have to compile it from source. Grab the latest release (v1.12 at the time of writing), untar it, and as a normal user compile it with make linux. When it's done compiling, switch to the root user and type make install to install the utilities. The program also packs an uninstallation script, and running make uninstall as root will remove md5deep from your computer. For other installation options refer to the README file bundled in the source tarball.

Getting started

Once you've installed md5deep, instead of a single utility you'll get one each for the five digests -- md5deep, sha1deep, sha256deep, tigerdeep, and whirlpooldeep. All the utilities use the same options and behave the same way, so for the examples in this article I'll use the md5deep utility.

In addition to generating hashes for a specified file, if you issue any of these commands without any arguments, they'll read from standard input (the keyboard, by default) and generate a sum for that:

$ md5deep
Generate md5sum for this line of text.

The tools can also generate sums for the output of another command sent through a pipe:

$ cat /proc/cpuinfo | md5deep
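
To get a feel for what happens to piped input, here is a minimal Python sketch that hashes a stream in fixed-size chunks. It uses Python's hashlib rather than md5deep itself, and the function name md5_of_stream and the 64KB chunk size are assumptions made for illustration:

```python
import hashlib


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a binary stream (hypothetical helper)."""
    digest = hashlib.md5()
    # Read fixed-size chunks so a large input never sits in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

To hash piped input the same way, you could pass sys.stdin.buffer as the stream.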

The md5deep utilities also have a time estimation mode, enabled with the -e option. In this mode, each utility prints an estimate of the time left before the hash is complete. This is useful when generating sums for a big file or a directory with many files:

$ md5deep -e some-linux-distro.iso
some-linux-distro.iso: 1142MB of 1316MB done, 00:00:05 left
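
How md5deep arrives at its estimate isn't covered here, but one plausible approach is to divide the bytes still to be read by the average throughput so far. The sketch below assumes exactly that; the function names seconds_left and format_hms are hypothetical and not part of md5deep:

```python
def seconds_left(bytes_done, bytes_total, seconds_elapsed):
    """Estimate remaining seconds from the average throughput so far."""
    if bytes_done <= 0 or seconds_elapsed <= 0:
        return None  # Not enough data to estimate a rate yet.
    rate = bytes_done / seconds_elapsed  # average bytes per second
    remaining = max(bytes_total - bytes_done, 0)
    return remaining / rate


def format_hms(seconds):
    """Format a number of seconds as HH:MM:SS."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

An average-rate estimate like this gets steadier the longer the run lasts, which is why the figure tends to jump around at the start of a large file.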

Verifying file integrity

Since md5deep can traverse directories, it's popularly used for verifying the integrity of key files. All you need to do is generate hashes for the key files and keep them in a safe, tamper-proof location. Once you have the hashes, a single md5deep command will compare them to the contents of the directory and report files that have changed.

Generating hashes recursively isn't different from generating hashes for a single file, except that you have to tell md5deep to descend into directories with the -r option:

$ md5deep -r critical/
c068cc5c6cd4dcb9850261b12de86b86  /home/bodhi/critical/dns-settings
f8966d4413877c07e745b9bb71ad5ce8  /home/bodhi/critical/network-map
2ce4b2aca3d1c74bc41bf18a6ef97409  /home/bodhi/critical/account-info
b9f5405001058039c6ed8acb86b8f0c3  /home/bodhi/critical/lan-config
afdfe519b268b124d423d61f12b990f5  /home/bodhi/critical/failover
d30e5772415619a84a599fcf767cc819  /home/bodhi/critical/raid-setup
c6cc9827ba76acdfcd081e308f56a76e  /home/bodhi/critical/list of ips
80af606c90958628c8c6dda72424f5c9  /home/bodhi/critical/hashes
7bf5a70281d396ba44d211adabb99e9f  /home/bodhi/critical/network-mount-points
3addab70ec87b6f19a3b2dd06a440483  /home/bodhi/critical/proxy-server-details
d6e73d5b7bbc5a832c2853abaaab362c  /home/bodhi/critical/network-partitions
373edcb9b05a72e714ff53b00da50db9  /home/bodhi/critical/bookmarks

By default md5deep prints the hashes on the screen, so if you want to keep them, redirect the output to a file, ideally one on a remote box:

$ md5deep -r critical/  >  /mnt/remote-desktop/backup-folder/hashes
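
If you're curious what a recursive run boils down to, the following Python sketch walks a directory tree and writes one "hash, two spaces, absolute path" line per regular file, mirroring the listing above. It relies on hashlib and os.walk, not on md5deep, and the names md5_of_file, hash_tree, and write_hash_list are hypothetical:

```python
import hashlib
import os


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Yield (hex_digest, absolute_path) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.abspath(os.path.join(dirpath, name))
            if os.path.isfile(path):  # skip sockets, pipes, broken links
                yield md5_of_file(path), path


def write_hash_list(root, out_path):
    """Save the hashes of a tree to a file, one per line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for digest, path in hash_tree(root):
            out.write(f"{digest}  {path}\n")
```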

Now to check these files against the hashes, run md5deep in its matching mode. In this mode, md5deep computes hashes for a list of files and compares them against a previously saved list of hashes. It can then report either the files that match the list of known hashes (positive matching) or the files that do not (negative matching).
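
The idea behind matching mode can be sketched in a few lines of Python. This is only an illustration built on hashlib, not md5deep's own code; it assumes the saved list has the hash as the first field on each line, and the names load_known_hashes and match_files are hypothetical:

```python
import hashlib


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_known_hashes(list_path):
    """Collect the first whitespace-separated field of each line as a hash."""
    known = set()
    with open(list_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                known.add(fields[0].lower())
    return known


def match_files(paths, known, negative=False):
    """Return (digest, path) pairs found in known, or missing from it."""
    results = []
    for path in paths:
        digest = md5_of_file(path)
        # Positive matching keeps hits; negative matching keeps misses.
        if (digest in known) != negative:
            results.append((digest, path))
    return results
```

Because only the hash is compared, a file that has merely been renamed or moved still counts as a positive match.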

For example, let's say the raid-setup file has been modified. To run a negative match against the saved hashes and list the files that have been modified, use the -X option:

$ md5deep -X /mnt/remote-desktop/backup-folder/hashes -r critical/*
1589ac0c9575b2948f3b2f8bdfee24b2  /home/bodhi/critical/raid-setup

As you can see when you compare this output to the value we got earlier, the hash of the modified raid-setup file is different from that of the original file. Now what can you do? Many editors keep a backup copy of a file when they modify it. You can take the original hash of the modified file from the list of saved hashes and, with the -a option, check whether any file on the system matches it.

$ md5deep -a d30e5772415619a84a599fcf767cc819 /home/bodhi/*

Voilà! The file with the trailing tilde character (~) is a backup of the original file and it matches the hash.
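
The same search can be approximated in Python. The sketch below is an assumption-laden illustration rather than md5deep's behavior: it uses hashlib, it walks the whole tree under a starting directory (the command above looks only at the paths it is given), and the name find_by_hash is hypothetical:

```python
import hashlib
import os


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_hash(root, wanted_hash):
    """Return sorted paths under root whose MD5 equals wanted_hash."""
    wanted = wanted_hash.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # ignore pipes, sockets, and broken links
            try:
                if md5_of_file(path) == wanted:
                    matches.append(path)
            except OSError:
                continue  # unreadable file, move on
    return sorted(matches)
```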

But remember that both MD5 and SHA-1 have known collision weaknesses -- two different files can be crafted to have the same hash. That means that a cracker could conceivably disguise a malicious file or application as a backup of the modified file. For critical files, you should use the SHA-256, Tiger, or Whirlpool functions, for which no practical collision attacks are known.
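
Switching to a stronger function costs little. As a rough illustration with Python's hashlib (not the sha256deep tool), the hypothetical helper below takes the algorithm name as a parameter and defaults to SHA-256; Tiger and Whirlpool are not among the algorithms hashlib guarantees, so they are left out of this sketch:

```python
import hashlib


def digest_of_file(path, algorithm="sha256", chunk_size=64 * 1024):
    """Return the hex digest of a file using the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A SHA-256 digest is 64 hexadecimal characters long, twice the length of an MD5 digest, so the two kinds of hash list are easy to tell apart.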

One last useful feature of the md5deep tools is their ability to distinguish files based on their type. In expert mode, all the md5deep utilities can isolate and process particular types of files. They recognize seven different types: regular files, such as text, graphics, and executables; block devices, such as hard drives and CD-ROMs; character devices, such as /dev/tty; symbolic links; sockets; named pipes; and Solaris doors.

Enable expert mode with the -o option, followed by the letter for each type of file you want to process. The command md5deep -o f /dev -r will process only regular files under the /dev directory, and ignore everything else, including symbolic links and character devices. You can also combine letters to select several types. For example, md5deep -o fl /dev -r will process regular files and symbolic links.
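
To see how a tool can tell these types apart, here is a small Python sketch that classifies a path using the standard stat module. The letters f and l come from the examples above; the remaining letters in the table are my assumption about how the other types would be labeled, and the function names are hypothetical:

```python
import os
import stat

# Letter-to-test table; only "f" and "l" are taken from the article's examples.
TYPE_CHECKS = {
    "f": stat.S_ISREG,   # regular file
    "l": stat.S_ISLNK,   # symbolic link
    "b": stat.S_ISBLK,   # block device
    "c": stat.S_ISCHR,   # character device
    "p": stat.S_ISFIFO,  # named pipe
    "s": stat.S_ISSOCK,  # socket
    "d": stat.S_ISDOOR,  # Solaris door (always False elsewhere)
}


def file_type_letter(path):
    """Return the type letter for path, or None (for example, a directory)."""
    mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
    for letter, check in TYPE_CHECKS.items():
        if check(mode):
            return letter
    return None


def select_by_type(paths, letters):
    """Keep only the paths whose type letter appears in letters."""
    wanted = set(letters)
    return [path for path in paths if file_type_letter(path) in wanted]
```

Calling select_by_type(paths, "fl") would keep regular files and symbolic links and drop everything else, much like the second command above.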

The md5deep set of utilities is a nice addition to your security toolkit. It can generate and verify hashes using five popular hash functions. The ability to traverse directories and compare files against a previously stored set of "good" hashes makes it all the more useful.

