June 13, 2005

CLI Magic: Hexdump unlocks mysteries

Author: Joe Barr

The hexdump utility may not have been designed as an educational or exploratory tool, but it does a great job at both. If you have ever been curious about what mysteries lurk within an executable binary, a word processing document, or even an image, hexdump can help you scratch that itch. As a side benefit, you will gain familiarity with the hexadecimal, or base 16, numbering system, usually called simply hex. Let's pop the hood on the CLI and take a look.

To get started, try generating a hex dump of the file you want to examine without any options, using a command like hexdump /usr/bin/hexdump.

The first few lines of the output should look something like this:

0000000 457f 464c 0101 0001 0000 0000 0000 0000
0000010 0002 0003 0001 0000 8b30 0804 0034 0000
0000020 3ce8 0000 0000 0000 0034 0020 0008 0028
0000030 001c 001b 0006 0000 0034 0000 8034 0804
0000040 8034 0804 0100 0000 0100 0000 0005 0000
0000050 0004 0000 0003 0000 0134 0000 8134 0804
0000060 8134 0804 0013 0000 0013 0000 0004 0000
0000070 0001 0000 0001 0000 0000 0000 8000 0804
0000080 8000 0804 3738 0000 3738 0000 0005 0000

Notice that the left column is wider than the others. It reports the offset, in bytes, of the data to its right; both the offset and the data are expressed in hexadecimal. In other words, the data represented by "457f 464c 0101 0001" on the first line is offset zero bytes from the start of the file, which in this case is the hexdump executable itself.

Moving down to the third line, you can see that its offset is 0000020, or 32 bytes, because 20 in hexadecimal (2 × 16 + 0) is the equivalent of decimal 32.
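If you want to check the offset arithmetic yourself, a few lines of Python will do it. This is only an illustration, not anything hexdump runs; the helper name line_offset is made up for the example. It relies on the fact that each line of the default display covers 16 bytes, so line n (counting from zero) starts at offset n × 16.

```python
BYTES_PER_LINE = 16  # the default display shows 16 bytes per line

def line_offset(line_number: int) -> str:
    """Offset of a 0-based line, as 7 hex digits like the default display."""
    return f"{line_number * BYTES_PER_LINE:07x}"

print(int("20", 16))   # hex 20 = 2 * 16 + 0 = 32
print(line_offset(2))  # third line -> 0000020
```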

Note that each line shows 32 hex digits, and that these 32 digits represent 16 bytes of data. This is because you are looking at the hex value of each byte. A byte is 8 bits long, so there are 256 possible values for any given byte, ranging from decimal 0 to 255. There aren't 256 printable characters available to represent every value with a single character, so hexdump converts each byte to hex, where the entire range fits in two digits, from 00 to FF. Each hex digit stands for a 4-bit value, or half a byte. Hence, 32 digits represent 16 bytes of data.
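Here is the same idea as a short Python sketch (the function names are invented for illustration): every byte, whatever its value, becomes exactly two hex digits, and each digit is one 4-bit half of the byte.

```python
def to_hex_pairs(data: bytes) -> list[str]:
    """Render each byte as exactly two lowercase hex digits, 00 through ff."""
    return [f"{b:02x}" for b in data]

def nibbles(byte: int) -> tuple[int, int]:
    """Split a byte into its high and low 4-bit halves (one hex digit each)."""
    return byte >> 4, byte & 0x0F

print(to_hex_pairs(bytes([0, 69, 127, 255])))   # ['00', '45', '7f', 'ff']
print(nibbles(0x7F))                            # (7, 15)
print(len("".join(to_hex_pairs(bytes(16)))))    # 32 digits for 16 bytes
```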

If you are looking for human-readable data in the file, converting everything to hex is just going to make the task more difficult. One solution is the -C (canonical) option, which adds a text column to the display, as shown in the following example:

linux:~> hexdump -C /usr/bin/hexdump
00000000  7f 45 4c 46 01 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  30 8b 04 08 34 00 00 00  |........0...4...|
00000020  e8 3c 00 00 00 00 00 00  34 00 20 00 08 00 28 00  |.<......4. ...(.|

Each line still holds 16 bytes of data, and the offsets are unchanged apart from being padded to eight digits instead of seven, but the display has changed considerably. For one thing, the data is grouped two characters (one byte) at a time instead of four. More importantly, an entirely new field has been added.
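Compare the first line of the two displays and you will notice that the bytes within each group also trade places: the default display begins 457f 464c, while -C begins 7f 45 4c 46. Without -C, hexdump reads the data as two-byte words and prints each word as a number in the machine's native byte order; on a little-endian PC, that puts the second byte of each pair first. The Python sketch below rebuilds both views from the same eight bytes. It assumes little-endian words to match the output shown earlier; the variable names are only illustrative.

```python
import struct

# First eight bytes of /usr/bin/hexdump, as shown by hexdump -C.
data = bytes.fromhex("7f454c4601010100")

# -C style: one byte at a time, in file order.
canonical_view = " ".join(f"{b:02x}" for b in data)

# Default style: two-byte words. "<" asks struct for little-endian order
# (an assumption matching the x86 output above); "H" is an unsigned 16-bit int.
words = struct.unpack(f"<{len(data) // 2}H", data)
default_view = " ".join(f"{w:04x}" for w in words)

print(canonical_view)  # 7f 45 4c 46 01 01 01 00
print(default_view)    # 457f 464c 0101 0001
```

Changing "<" to ">" (big-endian) would print 7f45 4c46 0101 0100, which is how the default display would look on a big-endian machine.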

Look at the field delimited by the two | characters. This is where the printable ASCII characters in the data are shown. Each unprintable byte is represented by a period as a placeholder.
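To see how one line of -C output is put together, here is a rough Python model. It is not hexdump's own code, and the helper names are hypothetical. It treats bytes 0x20 (space) through 0x7e (tilde) as printable ASCII and substitutes a period for everything else, then lays out the offset, two groups of eight hex bytes, and the text column.

```python
def printable(b: int) -> str:
    """Printable ASCII (space through tilde) is shown as-is; anything else as '.'."""
    return chr(b) if 0x20 <= b <= 0x7E else "."

def canonical_line(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes in the style of one hexdump -C line."""
    # Pad missing byte positions with blanks so a short final line
    # keeps the text column aligned (a choice made for this sketch).
    cells = [f"{b:02x}" for b in chunk] + ["  "] * (16 - len(chunk))
    left = " ".join(cells[:8])
    right = " ".join(cells[8:])
    text = "".join(printable(b) for b in chunk)
    return f"{offset:08x}  {left}  {right}  |{text}|"

# Bytes copied from the first and third lines of the -C example above.
line0 = bytes.fromhex("7f454c46010101000000000000000000")
line2 = bytes.fromhex("e83c0000000000003400200008002800")
print(canonical_line(0x00, line0))
print(canonical_line(0x20, line2))
```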

Note the "ELF" near the beginning of the ASCII display. Together with the 7f byte in front of it, it forms the magic number that identifies the file as an Executable and Linkable Format (ELF) object file.
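Because the magic number always sits in the first four bytes, checking for it takes very little code. A minimal Python sketch follows; the function names are made up for illustration, and file_is_elf assumes the path you pass is readable.

```python
ELF_MAGIC = b"\x7fELF"  # the byte 0x7f followed by the ASCII letters E, L, F

def looks_like_elf(header: bytes) -> bool:
    """True if the bytes begin with the four-byte ELF magic number."""
    return header[:4] == ELF_MAGIC

def file_is_elf(path: str) -> bool:
    """Read just the first four bytes of a file and check the magic number."""
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))

# Example (on a Linux system): file_is_elf("/usr/bin/hexdump")
```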

If you want to browse through the binary file one screen at a time, add | more to the end of the command shown above. If you already know that there is interesting data at a particular offset into the file, you can tell hexdump to skip everything before that point with the -s offset option. Hexdump interprets the offset as decimal by default, so if you want to give it in hex, precede it with 0x.
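Here is a rough Python model of what -s does: work out the number of bytes to skip, then seek past them before reading. parse_offset and read_from are hypothetical helpers, not part of any library. hexdump's manual page also documents that an offset with a leading 0 is read as octal, so the sketch handles that case as well; size suffixes that some versions accept are left out.

```python
def parse_offset(text: str) -> int:
    """Interpret an offset string: 0x or 0X means hex, a leading 0 means
    octal, and anything else is decimal."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text[1:], 8)
    return int(text, 10)

def read_from(path: str, offset: str, length: int = 64) -> bytes:
    """Skip to the given offset, then read up to `length` bytes."""
    with open(path, "rb") as f:
        f.seek(parse_offset(offset))
        return f.read(length)

print(parse_offset("0x33f0"))  # 13296
print(parse_offset("13296"))   # 13296
```

With this sketch, read_from("/usr/bin/hexdump", "0x33f0") would begin reading at the same place as the hexdump -s 0x33f0 command in the next example.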

For example, assume you know that hexdump error messages begin to appear in the file at the hex offset of 33f0. To start the display at that point and set it up so you can page through the output a screen at a time, enter the command:

linux:~> hexdump -C -s 0x33f0 /usr/bin/hexdump | more

Here is what the first screen should look like:

000033f0  33 34 35 36 37 38 39 00  68 65 78 64 75 6d 70 3a  |3456789.hexdump:|
00003400  20 62 61 64 20 66 6f 72  6d 61 74 20 7b 25 73 7d  | bad format {%s}|
00003410  0a 00 68 65 78 64 75 6d  70 3a 20 6c 69 6e 65 20  |..hexdump: line |
00003420  74 6f 6f 20 6c 6f 6e 67  2e 0a 00 68 65 78 64 75  |too long...hexdu|
00003430  6d 70 3a 20 63 61 6e 27  74 20 72 65 61 64 20 25  |mp: can't read %|
00003440  73 2e 0a 00 68 65 78 64  75 6d 70 3a 20 62 61 64  |s...hexdump: bad|
00003450  20 62 79 74 65 20 63 6f  75 6e 74 20 66 6f 72 20  | byte count for |
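The periods separating the messages in the text column are 00 bytes: C stores each string literal followed by a NUL terminator. The Python sketch below takes the bytes copied from the screen above and splits them at those terminators to recover the individual messages, roughly what the strings utility does. The name c_strings is made up for illustration. The first piece is presumably the tail end of a string that begins before offset 33f0, and the last is cut off where the screen ends.

```python
# Bytes copied from the screen above, starting at offset 0x33f0.
region = bytes.fromhex(
    "33 34 35 36 37 38 39 00 68 65 78 64 75 6d 70 3a"
    " 20 62 61 64 20 66 6f 72 6d 61 74 20 7b 25 73 7d"
    " 0a 00 68 65 78 64 75 6d 70 3a 20 6c 69 6e 65 20"
    " 74 6f 6f 20 6c 6f 6e 67 2e 0a 00 68 65 78 64 75"
    " 6d 70 3a 20 63 61 6e 27 74 20 72 65 61 64 20 25"
    " 73 2e 0a 00 68 65 78 64 75 6d 70 3a 20 62 61 64"
    " 20 62 79 74 65 20 63 6f 75 6e 74 20 66 6f 72 20"
)

def c_strings(data: bytes) -> list[str]:
    """Split on NUL (00) bytes, the terminator C places after each string."""
    return [piece.decode("ascii") for piece in data.split(b"\x00")]

for s in c_strings(region):
    print(repr(s))
```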

OK, that is more than enough to get you started exploring with hexdump. You may have come in here straight from the GUI, not knowing a byte from an Apple, but look at you now. Here you are strutting through binaries, talking in hex, noting the format of an executable, and, if you were really paying attention, learning what all that Wrong Endian talk is about.
