October 31, 2008

Transparent compression of files on optical media

Author: Ben Martin

Support for transparent decompression of files on optical media has been part of the Linux kernel since version 2.4.14. Here's how you can take advantage of this support when you burn your own optical media by using the mkzftree tool and the -z option to genisoimage. These commands compress files using zlib, which uses the same algorithm as gzip. Using the transparent compression Rock Ridge extension can allow you to fit much more data onto a DVD.

Transparent compression will not gain you anything if the information you want to burn is already well compressed. For example, a disc containing JPEG images, Ogg audio, or bzip2-compressed tarballs will probably produce a slightly larger ISO image when you use transparent compression.
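Before burning, you can get a rough idea of whether a file is worth compressing by running it through gzip, which uses the same algorithm as zlib. The sketch below is only an estimate: the compressed_percent function name and the sample data are made up for illustration, and mkzftree's per-block overhead is ignored.

```shell
#!/bin/sh
# compressed_percent FILE: print the gzip-compressed size of FILE as a
# percentage of its original size (hypothetical helper, not part of any tool).
compressed_percent() {
    orig=$(( $(wc -c < "$1") ))
    comp=$(( $(gzip -c "$1" | wc -c) ))
    # Avoid dividing by zero for empty files.
    if [ "$orig" -eq 0 ]; then
        echo 100
        return
    fi
    echo $(( comp * 100 / orig ))
}

# Sample data: repetitive text compresses well, random bytes do not.
tmpdir=$(mktemp -d)
yes "The quick brown fox jumps over the lazy dog" | head -n 2000 > "$tmpdir/text.txt"
head -c 90000 /dev/urandom > "$tmpdir/random.bin"

text_pct=$(compressed_percent "$tmpdir/text.txt")
random_pct=$(compressed_percent "$tmpdir/random.bin")
echo "text: ${text_pct}%  random: ${random_pct}%"

rm -rf "$tmpdir"
```

A result close to 100% for most of your files suggests that transparent compression is not worth the trouble for that disc.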

Transparent decompression shines when you are burning a disc containing HTML or plain text files. Human language stored in UTF-8 format compresses efficiently, and the Linux kernel, using the transparent decompression Rock Ridge extension, hides the fact that you are using compression from applications reading the disc.

On a Fedora 9 system, you have to install genisoimage and zisofs-tools to get the genisoimage and mkzftree tools respectively. On openSUSE 11 the packages are available as 1-Click installs: genisoimage and zisofs-tools. On Ubuntu Hardy all the needed tools are in the genisoimage package.

I'll use the HTML Linux HOWTO files to demonstrate how the transparent compression Rock Ridge extension works and what can be gained. The tar.bz2 for the HTML HOWTO files is 16MB, and it takes up 72MB when expanded onto an XFS filesystem. When you turn the HTML HOWTO directory into an ISO image with K3b using the Filesystem type "Linux/Unix + Windows," you should get an image file that is 66MB. We can do better than that by using transparent decompression.

With the commands below I generated two normal ISO images using the HTML HOWTO file tree as input.

$ genisoimage -o HTML-normal.iso HTML
$ genisoimage -o HTML-rockridge.iso -r HTML

The genisoimage tool does not actually perform compression itself -- for that you have to use the mkzftree tool. Once you have a tree with compressed files you have to use the -z option to genisoimage to turn on the transparent compression Rock Ridge extension so that the operating system reading the disc knows that it should decompress some of the files on the disc. You must also enable Rock Ridge for genisoimage using an option like -r.

$ mkzftree -p 4 --one-filesystem HTML HTML-compressed
$ genisoimage -o HTML-rockridge-compress.iso -r -z HTML-compressed
Warning: using transparent compression. This is a nonstandard Rock Ridge
extension. The resulting filesystem can only be transparently
read on Linux. On other operating systems you need to call
mkzftree by hand to decompress the files.
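Because the kernel decompresses the files on the fly, a loop-mounted copy of the compressed image should look identical to the original tree. Here is a minimal sketch of such a check. The same_tree helper and the /mnt/iso mount point are assumptions for illustration, and mounting normally requires root.

```shell
#!/bin/sh
# same_tree A B: succeed only if the two directory trees have identical
# file contents (hypothetical helper built on diff -r).
same_tree() {
    diff -r "$1" "$2" >/dev/null
}

# Assumed usage, as root, with /mnt/iso as an empty mount point:
#   mount -o loop,ro HTML-rockridge-compress.iso /mnt/iso
#   same_tree HTML /mnt/iso && echo "files read back uncompressed"
#   umount /mnt/iso
```

If the comparison fails on a Linux system, check that the image was built with both -r and -z.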

mkzftree can take other arguments. With the --parallelism (-p) option you can set how many compression tasks run at the same time. The utility also supports options such as --one-filesystem for dealing with filesystem boundaries. The last two arguments are the input directory and the output directory that mkzftree creates to store the compressed files.

The size of the ISO files created by the above commands is interesting.

$ ls -lh *iso
-rw-rw---- 1 ben ben 64M 2008-10-21 11:45 HTML-normal.iso
-rw-rw---- 1 ben ben 33M 2008-10-21 11:56 HTML-rockridge-compress.iso
-rw-rw---- 1 ben ben 65M 2008-10-21 11:46 HTML-rockridge.iso

The two ISO images generated first (HTML-normal.iso and HTML-rockridge.iso) do not use compression and are roughly the same size as the source files, but the ISO containing compressed files is about half the size of the standard ISO, though still twice as big as the original compressed tarball. The ISO is larger because while the tarball is compressed as a single archive file, each file in the ISO is split into 32KB blocks that are individually compressed. In general, the larger the blocks of data that you are compressing, the better the compression ratio you can achieve. mkzftree has to compress files in many separate blocks so that the Linux kernel can still allow random access to any part of a compressed file.

You can use the --crib-tree option to mkzftree if you want to add a few files to an existing compressed tree. Some versions of the manual page for mkzftree list --crib-tree as --crib-path, but the latter option does not work. Adding a file is fast. Creating the HTML-compressed directory as we did above took about 2.5 seconds. Adding a single simple file to the tree and using --crib-tree to generate a new tree with the commands below took only 0.4 seconds.

$ date > HTML/df1.txt
$ time mkzftree -p 4 --one-filesystem \
--crib-tree HTML-compressed \
HTML HTML-compressed_t5 2>/dev/null

One word of caution: transparent compression is a nonstandard Rock Ridge extension. You'll have wide support for reading your media on almost all Linux systems, but if you want to read a compressed file on another platform you might have to use the mkzftree tool with the -u option to decompress the files. Of course, you might have to compile mkzftree for your other platform first.
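On a system that cannot read the compressed files transparently, the decompression step might look like the sketch below. The restore_tree wrapper, its return codes, and the /mnt/cdrom path are assumptions for illustration. Only the -u option comes from mkzftree itself.

```shell
#!/bin/sh
# restore_tree SRC DST: decompress a zisofs tree with "mkzftree -u".
# Returns 2 if SRC is not a directory, 127 if mkzftree is not installed.
# (Hypothetical wrapper; the return codes are this sketch's own convention.)
restore_tree() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "restore_tree: $src is not a directory" >&2
        return 2
    fi
    if ! command -v mkzftree >/dev/null 2>&1; then
        echo "restore_tree: mkzftree not found; install or build zisofs-tools" >&2
        return 127
    fi
    # -u tells mkzftree to uncompress instead of compress.
    mkzftree -u "$src" "$dst"
}

# Assumed usage, with the disc mounted at /mnt/cdrom:
#   restore_tree /mnt/cdrom HTML-restored
```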
