June 30, 2005

Bringing back LZW compression

Author: Nathan Willis

History quickly forgets vice presidents, "Best New Artist" Grammy winners, and compression algorithm patent disputes. Five years ago, Unisys ignited an uproar by changing its stance and demanding royalties on two patents it held on the Lempel-Ziv-Welch (LZW) compression algorithm. Consequently, LZW compression support was rapidly and ruthlessly cut from virtually all free software. As of July 2004, both patents had expired, but so has any resurgence of interest in LZW compression. It remains absent for the majority of Linux users. Fortunately, it's easy to restore.

Most people remember this ruckus because LZW was used in GIF images -- the BurnAllGIFs campaign sprang up to kill the GIF format and replace it with PNG. For the most part the campaign was a successful one, considering that the PNG format is now widely supported by both Web browsers and image editing applications. GIF was weak and hobbled and few mourn for it today.

However, GIF is not what LZW is all about. LZW has other, more vital uses, chief among them that it is the default compression algorithm for the Tagged Image File Format (TIFF). Despite the similarity in their names, TIFF is not merely GIF's smarter, handsomer brother. TIFF is the de facto standard for lossless image storage.

TIFF can handle multiple bit-depths, color-spaces, and encodings, multiple "pages" per file, integer or floating-point data, and a variety of other useful extensions. The TIFF specification is controlled by none other than Adobe, the unquestioned, usually benevolent dictator of the world's image formats. If you want to add your own extension to the TIFF format, all you have to do is email Adobe.

As the LZW debacle grew, various compression schemes were added via TIFF extensions, including Zlib (a.k.a. Zip) and JPEG compression. The latter does seem counter-productive, but it's a free world.

However, thus far none of these alternative compression schemes has made it into the official TIFF spec. One reason might be that the TIFF spec has not been updated since 2002. Furthermore, TIFF revision 7.0 has been "in the works" since 1992. Thus, support for TIFF extensions is spotty and inconvenient. It is all well and good that you can use Zip compression to save some space in your photo archives at home, but if you want to upload some of those photos to a printing service, you are going to have to convert them.

You are also unlikely to find alternative compression schemes supported by smaller devices such as portable "media drives" and direct-from-file printers. Last but not least, between the two lossless compression schemes, Zip takes significantly more computational overhead and thus more time.

Missing pieces

But even in its absence, LZW was never completely forgotten. Free software's premier GUI graphics application (the GIMP) and premier command-line graphics application (ImageMagick) were always able to read LZW-compressed files; they just weren't able to write them. And the apps did keep the proper hooks in place so that any user who wanted to do so could re-enable LZW compression (be they rebel, Unisys-royalty-paying licensee, or in international waters).

Keeping the proper hooks in place was a friendly gesture, though it did mire GIMP 2.x users in a frustrating dilemma: LZW is not just among the Save As dialog options for TIFF, it is the auto-selected default and cannot be changed through any user-accessible setting or preference. That means GIMP 2.x users have to manually select a supported compression option every single time they save every single TIFF file, and forgetting to do so causes the program to fail with a cascading series of error message warning boxes.

But I digress. The reason that the GIMP and ImageMagick advertise this false hope to LZW-loving TIFF fans is that they both depend on the open source libtiff library for reading and writing the format. The GIMP is merely blissfully unaware of whether the libtiff it calls on for the file save is LZW-enabled or not.

The libtiff developers quite ingeniously moved the LZW compression function to its own file as the Unisys royalty issue began to gather steam. In doing so, they could then distribute the 3.5.5 edition of the library with an empty "stub" file where the LZW routine would have resided. Furthermore, they could also still supply a working LZW function separately, for those who needed it, which could be implemented with a single patch to the library.

As of libtiff 3.7, the LZW compression function has been restored in the main distribution. There was some debate last year over whether IBM also held a patent on the LZW algorithm, but it has been settled -- IBM's patented compression is a cousin to LZW, but different. They are both derivatives of an earlier algorithm called LZ78. How multiple derivatives of existing compression algorithms can be ruled innovative enough to be patentable is probably a question best directed to the United States Patent Office or your nation's equivalent.

Unfortunately, popular Linux distros such as Fedora, Ubuntu, Slackware, Debian, and Mandriva are not shipping libtiff 3.7. According to its Web site, Novell does include 3.7 with its high-end SUSE Professional product, but not its Desktop offering.

The first step to restoring LZW support on your system is to see what version of libtiff is already installed. For most of us it is 3.6.1. This is important because the current "LZW compression kit" (as it is referred to by libtiff) is a drop-in replacement built only for libtiff 3.5.5 through the 3.6 series. If you are running 3.5.4 or earlier, you should upgrade. To check, simply launch your package management software (yum, Synaptic, or the equivalent), look for the package named libtiff or tiff, and note the version number. If you don't have a package-management application, you can always run locate libtiff from the command line and look for the exact version number among the output.
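The version check can be scripted. The sketch below is only an illustration: the function names libtiff_version and kit_applies are made up for this example, and it assumes the shared library file carries its version in the name (for instance libtiff.so.3.6.1), which is common but not guaranteed on every system.

```shell
# Sketch: read the version from a libtiff shared-library filename and
# check whether it falls in the range the LZW kit targets
# (3.5.5 through the 3.6 series). Function names are hypothetical.

# Print the version suffix of a path such as /usr/lib/libtiff.so.3.6.1
libtiff_version() {
    case "$1" in
        *libtiff.so.*) printf '%s\n' "${1##*libtiff.so.}" ;;
        *) return 1 ;;
    esac
}

# Succeed for 3.5.5 and later 3.5.x releases, and for anything in 3.6
kit_applies() {
    case "$1" in
        3.6|3.6.*) return 0 ;;
        3.5.*)
            patch=${1#3.5.}            # keep what follows "3.5."
            patch=${patch%%[!0-9]*}    # drop any non-numeric tail
            [ -n "$patch" ] && [ "$patch" -ge 5 ]
            ;;
        *) return 1 ;;
    esac
}
```

Feed libtiff_version one of the paths that locate libtiff prints, then pass the result to kit_applies. On package-based systems, rpm -q libtiff or dpkg -l reports the installed version directly.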

Tomorrow's library today!

"Wait just a cotton-pickin' minute," you say. "Why don't I just upgrade to libtiff 3.7?" Of course you can. But if you use a package-management system, you will create dependency problems for applications that declare themselves dependent on libtiff 3.6. That's poor dependency-checking on the part of those applications, of course, but regrettably it is commonplace. You could, in all likelihood, trick applications into using 3.7 with no ill effect. However, you would also run into a lot of package-manager complaints along the way. Better to "repair" the 3.6 library that your distro believes is current than to start using --force options with every package update and new install.

The lzw-compression-kit is available from libtiff.org. Download it and grab the source of libtiff 3.6.x that corresponds to your installation, either through your package manager or from libtiff's home page.

The next step is to unpack both the kit and the library source into a directory where you can work. The compression kit contains a README, a Changelog, and a file named tif_lzw.c. Copy this last file to the tiff-3.6.x/libtiff/ directory you unpacked, overwriting the tif_lzw.c file that came with libtiff.
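The copy step can be sketched as a small shell function. The function name apply_lzw_kit and its two directory arguments are hypothetical; only tif_lzw.c and the libtiff/ subdirectory come from the kit and source tree described above. Keeping a backup of the stub is an addition for convenience, not something the kit asks for.

```shell
# Sketch: replace the stub tif_lzw.c in an unpacked libtiff tree with the
# one from the LZW kit, keeping a copy of the stub.
# Usage: apply_lzw_kit KIT_DIR TIFF_SRC_DIR   (both paths are placeholders)
apply_lzw_kit() {
    kit_dir=$1
    src_dir=$2

    # Refuse to continue if either directory does not look right
    [ -f "$kit_dir/tif_lzw.c" ] || { echo "no tif_lzw.c in $kit_dir" >&2; return 1; }
    [ -d "$src_dir/libtiff" ] || { echo "no libtiff/ directory in $src_dir" >&2; return 1; }

    # Keep the stub so the change is easy to undo
    if [ -f "$src_dir/libtiff/tif_lzw.c" ]; then
        cp "$src_dir/libtiff/tif_lzw.c" "$src_dir/libtiff/tif_lzw.c.stub" || return 1
    fi

    # Overwrite the stub with the working LZW implementation
    cp "$kit_dir/tif_lzw.c" "$src_dir/libtiff/tif_lzw.c"
}
```

Call it with the directory where the kit was unpacked and the top of the tiff-3.6.x tree, in that order.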

Next, move to the tiff-3.6.x directory and run the configuration script. For normal operation, sh ./configure will create a makefile that installs the libtiff software to /usr/local. This is probably not what you want to do -- we are attempting to replace the system's libtiff installation, not create a second one. You can determine where your current libtiff libraries are installed with locate libtiff. For most of us, they are installed in /usr. Therefore we must tell the configuration script so with sh ./configure --prefix=/usr. Finally, compile with make install. Note that you will need root privileges to overwrite the previously installed version.
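As a rough sketch of the configure step, the prefix can be derived from the location of the installed library instead of being guessed. The helper names prefix_for and build_libtiff are invented for this example, and it assumes the library sits one directory below the prefix (as in /usr/lib), which is the usual layout but not universal.

```shell
# Sketch: work out the --prefix from where the installed library lives,
# then configure and build. Helper names and paths are examples only.

# /usr/lib/libtiff.so.3 -> /usr ; /usr/local/lib/libtiff.so.3 -> /usr/local
prefix_for() {
    libdir=$(dirname "$1")    # strip the filename
    dirname "$libdir"         # strip the trailing lib directory
}

# Run from inside the unpacked tiff-3.6.x directory.
# Installation is left as a separate step because it needs root.
build_libtiff() {
    sh ./configure --prefix="$1" && make
}
```

For a library found at /usr/lib/libtiff.so.3, prefix_for prints /usr; pass that to build_libtiff, then run make install as root.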

If you don't feel comfortable patching, compiling, and overwriting a system library like this, you can build it in /usr/local and test it. First run configure without the --prefix flag, compile and install with make install, and then launch the GIMP using your new libtiff via LD_LIBRARY_PATH -- run LD_LIBRARY_PATH=/usr/local/lib; export LD_LIBRARY_PATH, then launch the GIMP from the same shell and try to save a TIFF file. But remember that LD_LIBRARY_PATH abuse can be bad. When you're satisfied that the new library functions properly, you can install it in the proper location.
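One way to limit the LD_LIBRARY_PATH risk is to set the variable for a single command rather than exporting it for the whole session. The wrapper below is a sketch; with_test_libtiff is a made-up name, and /usr/local/lib is simply the test location used in this article.

```shell
# Sketch: run one command with the test library directory searched first,
# leaving the rest of the shell session untouched.
# The function name is hypothetical.
with_test_libtiff() {
    # Prepend /usr/local/lib, keeping any existing search path after it
    env LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

Running with_test_libtiff gimp starts the GIMP against the test library, while other programs launched from the same shell keep using the system copy.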

You can now pat yourself on the back and start slimming down all of those bloated, uncompressed TIFFs that are slowing down your hard drive.

Lessons learned from libtiff 3.5 to 3.7

I hope the next generation of the Linux distros listed earlier will ship with libtiff 3.7, making the preceding patching unnecessary. That said, users of enterprise Linux distros and others with slow release cycles will likely be facing the problem for quite some time.

Meanwhile, let's thank the libtiff developers for making the fix so straightforward. There are other patent-encumbered libraries that are far more difficult to repair. Like Aesop's Fables, even the simplest of HOWTOs at NewsForge include a valuable moral lesson free of charge. In this case, the moral is that good clean coding standards can help you out when you least expect it.

Alas, software patents are going to be a problem for some time. For example, Freetype is currently waiting out a patent-blackmail situation from Apple regarding the TrueType bytecode hinter. And Mono fans, take note, and hope that Ximian is this nice to you when its Novell parent runs into patent difficulties.
