January 3, 2008

The tricky task of supporting Photo CDs on Linux

Author: Nathan Willis

In the photography world, a prominent proprietary file format is Kodak's Photo CD (.PCD). Once the premier format for film scanning, it is now a relic that is difficult to work around. Recently I set out to resurrect some old PCD images on a Linux system -- a challenge that serves as an object lesson in the importance of open standards in any kind of digital archive.

For years PCD was the preferred target format for professional photo labs scanning slides and negatives. It was the output of high-end, all-in-one scanning systems built by Kodak, supporting all types of film from APS size up to large-format.

Although Kodak no longer sells the equipment that labs used to create PCD files, a great deal of information about the format is still available on the company's Web site. The format itself is unusual, which is part of why it is rarely supported by third-party software.

Each PCD file is multi-resolution, with a Base size, Base/4 and Base/16 reductions, and Base*4, Base*16, and (optionally) Base*64 high-resolution copies all embedded into a single file. In spite of the names, Base*4, Base*16, and Base*64 sizes are not algorithmic enlargements of Base; the highest-resolution image (whether it is Base*16 or Base*64) is the native scan. The top end gives the equivalent of 6.3 megapixels for Base*16 and 25 megapixels for Base*64.
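To make the size ladder concrete, here is a small sketch that works out the pixel dimensions of each level. It assumes that Base is 768x512 pixels, a figure not given above, and that each step in the naming scheme multiplies the pixel count by four (so the width and height double). The names BASE_WIDTH, BASE_HEIGHT, and LEVELS are only illustrative.

```python
# Assumption: Base is 768x512 pixels (not stated in the article text).
BASE_WIDTH, BASE_HEIGHT = 768, 512

# Linear scale factor for each level. The names refer to pixel count,
# so "Base*16" means 4x the width and 4x the height of Base.
LEVELS = [
    ("Base/16", 0.25),
    ("Base/4", 0.5),
    ("Base", 1),
    ("Base*4", 2),
    ("Base*16", 4),
    ("Base*64", 8),
]


def level_size(scale):
    """Return (width, height) in pixels for a linear scale factor."""
    return int(BASE_WIDTH * scale), int(BASE_HEIGHT * scale)


def megapixels(scale):
    """Return the pixel count of a level in millions of pixels."""
    width, height = level_size(scale)
    return width * height / 1_000_000


for name, scale in LEVELS:
    width, height = level_size(scale)
    print(f"{name:8s} {width:5d} x {height:5d}  {megapixels(scale):6.2f} MP")
```

Under those assumptions, Base*16 comes out at 3072x2048 and Base*64 at 6144x4096, which agrees with the megapixel figures quoted above.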

And there is another wrinkle. The varying sizes are not stored separately like the thumbnails inside JPEG files. Instead, each of the lower resolutions is a subset of the pixels contained in the native-size image.

PCD also uses its own encoding method, PhotoYCC -- a system similar to YCbCr, but one that can represent luminance data above 100% brightness. In that sense, you might think of it as a precursor to the high-dynamic-range formats in wide use today. PhotoYCC defines 100% brightness in terms of a reference device, independent of the scene photographed.

An unofficial Windows solution

The extra latitude that encoding method provides is helpful for an image with extremely bright (e.g., glowing) objects. But the peculiarity of the design choice fooled a lot of applications and libraries, which incorrectly clipped the PhotoYCC data and produced TIFF and JPEG conversions with blown-out highlights.
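The clipping problem can be illustrated with a few lines of arithmetic. The sketch below assumes, for illustration only, that 8-bit luma code 182 stands for 100% brightness; check that value against Kodak's documentation before relying on it. The function names are hypothetical and do not come from any real library.

```python
# Assumption: luma code 182 represents 100% brightness (reference white).
WHITE_CODE = 182
MAX_CODE = 255


def headroom_percent(code):
    """Brightness of a luma code as a percentage of reference white."""
    return code / WHITE_CODE * 100


def clip_to_white(code):
    """Naive conversion: stretch reference white to 255, clip the rest."""
    return min(MAX_CODE, round(code * MAX_CODE / WHITE_CODE))


# Every code from reference white upward lands on the same output value.
highlight_codes = range(WHITE_CODE, MAX_CODE + 1)
clipped_levels = {clip_to_white(code) for code in highlight_codes}

print(f"brightest code is {headroom_percent(MAX_CODE):.0f}% of white")
print(f"{len(highlight_codes)} highlight codes become "
      f"{len(clipped_levels)} output level(s) after clipping")
```

With these numbers, everything brighter than reference white collapses to a single output value, which is what a blown-out highlight is.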

Ted Felix was unhappy enough with the contemporary conversion offerings that he waded through Kodak's white papers and technical documentation and figured out a way to correct the poor conversion. He modified the Windows .DLL that Kodak supplied to software companies for loading PCD images, directly altering the look-up table that scales the luminance values.

Felix's patched library will work as a drop-in replacement for the original, turning a variety of software alternatives from highlight-destroyingly-useless into dependable conversion tools.

His site includes a list of the supported apps, all but one of which are commercial, and none of which are open source. The list does not include Adobe Photoshop, which in Felix's tests produced blown-out highlights like most of the competition, but which uses a different conversion routine and is therefore not fixable with his patched DLL.

Linux-friendly options

One supported Windows app is IrfanView, a lightweight and free (but closed source) image viewer praised by many photographers for its accuracy. IrfanView does run under Wine, so if you have no other way to access a PCD image on a Linux system, it is an option.

A native solution is preferable, though, and that's where it gets difficult. Years ago there was hpcdtoppm, a command-line utility for converting PCD images to Portable Pixmap (PPM) format. It was included in the Netpbm package, but several distros (including, notably, Debian, Ubuntu, and SUSE) removed it from their versions of Netpbm because of its restrictive licensing.

If your distro does include hpcdtoppm, you can check whether the included version suffers from the blown-highlights problem with either your own image or with one of Felix's test PCDs. Felix links to a patch for hpcdtoppm, so if the converted output is bad, consider applying it. But beware: if your version of hpcdtoppm differs considerably, you might have to edit the code by hand rather than apply the patch with the patch utility.
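Judging the output by eye is the real test, but a rough numeric check can help when comparing two builds. The sketch below is a heuristic of my own, not something hpcdtoppm or Felix provides: it reads a binary (P6) PPM with 8-bit samples and reports what fraction of its pixels are pure white. The function names are hypothetical.

```python
def parse_ppm(data):
    """Parse a binary P6 PPM; return (width, height, maxval, samples)."""
    tokens = []
    pos = 0
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():      # skip whitespace
            pos += 1
        if data[pos:pos + 1] == b"#":           # skip a comment line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # one whitespace byte separates the header from the samples
    width, height, maxval = (int(token) for token in tokens[1:])
    if tokens[0] != b"P6" or maxval > 255:
        raise ValueError("only 8-bit binary (P6) PPM files are handled")
    samples = data[pos:pos + width * height * 3]
    if len(samples) != width * height * 3:
        raise ValueError("file is shorter than its header claims")
    return width, height, maxval, samples


def clipped_fraction(data):
    """Fraction of pixels whose red, green, and blue are all at maxval."""
    width, height, maxval, samples = parse_ppm(data)
    clipped = sum(
        1
        for i in range(0, len(samples), 3)
        if samples[i] == samples[i + 1] == samples[i + 2] == maxval
    )
    return clipped / (width * height)
```

To use it, read the converted file with open(path, "rb").read() and pass the bytes to clipped_fraction. A noticeably higher fraction from one build than from another, on the same PCD, suggests that build is clipping highlights, though a scene with genuinely white areas will also score high.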

The more reliable solution is ImageMagick (IM), which is actively maintained and a standard component in almost all Linux distros. IM can convert a PCD file with the command convert image001.pcd image001.tiff.
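For a whole disc's worth of images, the same convert invocation can be wrapped in a short script. This is a minimal sketch that assumes the convert binary is on your PATH; the helper names convert_command and convert_directory are hypothetical.

```python
import subprocess
from pathlib import Path


def convert_command(pcd_path, suffix=".tiff"):
    """Build the argument list for: convert <name>.pcd <name>.tiff"""
    source = Path(pcd_path)
    return ["convert", str(source), str(source.with_suffix(suffix))]


def convert_directory(directory):
    """Run convert on every .pcd/.PCD file found in a directory."""
    for source in sorted(Path(directory).iterdir()):
        # Match the extension case-insensitively, since file names on a
        # disc may be upper case.
        if source.suffix.lower() == ".pcd":
            subprocess.run(convert_command(source), check=True)
```

Calling convert_directory on a folder of copied PCD files writes a TIFF next to each one. Copy the files off the disc first, since the output is written alongside the source and the disc itself is read-only.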

I found IM's conversions subjectively too bright in the highlights -- although not as bad as some of the other offenders documented by Felix. The conversion is done in magick/colorspace.c using the YCCMAP table about halfway into the file.

If you plot the YCCMAP table as a function, you can see where the highlight compression occurs. Essentially, the compressed look-up table performs a gamma-correction when converting from PhotoYCC to RGB, and it uses the same gamma correction for every image. Felix's replacement look-up table is linear, so none of the highlight information is compressed.
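The effect of a compressing table versus a linear one can be seen without plotting anything. The sketch below does not use IM's actual YCCMAP values or Felix's table; it builds a generic power-law table with an assumed gamma of 2.2 and compares it with a straight-line table by counting how many distinct output levels the brightest inputs (codes 200 and up, an arbitrary cutoff) receive.

```python
def linear_lut(size=256):
    """Straight-line table: every input code maps to itself."""
    return list(range(size))


def gamma_lut(gamma=2.2, size=256):
    """Generic power-law table; gamma=2.2 is an assumed, illustrative value."""
    top = size - 1
    return [round(top * (code / top) ** (1.0 / gamma)) for code in range(size)]


def distinct_levels(lut, start=200):
    """Count the distinct output values for input codes >= start."""
    return len(set(lut[start:]))


print("linear table:", distinct_levels(linear_lut()), "highlight levels")
print("gamma table: ", distinct_levels(gamma_lut()), "highlight levels")
```

The linear table keeps every highlight code distinct, while the power-law table squeezes the same inputs into fewer output levels. That lost separation is the information a linear table preserves for later, per-image correction.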

I discussed the subject with the IM developers, who say they tested their look-up table against Kodak's reference image and got expected results.

Obviously, conversion factors such as curves are the kind of thing whose merits photographers can argue about all day long, but when converting from an uncorrected scan into a work-mode image, I prefer not to lose any information. Luckily, with open source you can make the changes yourself.

If you patch colorspace.c to use a linear look-up table, you can apply any necessary gamma correction in the image editor, and make it specific to the image contents and usage. I created a linear YCCMAP look-up table, which you can apply to colorspace.c using this patch. It is diffed against IM's trunk, which you can check out from anonymous SVN. I am happy with the results it provides -- to be sure, your mileage may vary, but at least Photo CDs are a stable optical medium, so you don't have to worry too much about ruining your only copy of the file.

What to learn

I became interested in PCD support because of one of the scenarios described in the sidebar: I wanted to use an image I liked, but for which the original was lost and the PCD was my best remaining scan. But the process got me thinking about digital archives in general, and the critical importance of the archive format.

PCD was not flawed technically; it was simply a single-vendor format, and when Kodak lost interest in maintaining it, it was stone cold dead. Even while it was alive, though, many software makers shipped a faulty implementation of the conversion utility for it, in spite of the public documentation (Ted Felix slogged through Kodak's own support docs, remember). And once it was no longer in common usage, the knowledge of how to properly interpret it was soon lost, and bugs started creeping into even open source projects like ImageMagick.

There is a moral to heed in that story. Today, the major digital camera manufacturers each use a single-vendor RAW format, in most cases poorly documented. How easy will it be to access those .NEF and .CR2 files 10 years from now? Even Adobe's camera-maker-neutral DNG format, whose specification is publicly available, is written and released by Adobe alone. I'm reasonably certain that Adobe will still be around in 10 years, but I can't say the same for DNG. I just hope ImageMagick is still here, too.

