April 19, 2006

Moving to PDF as a future print job spooling format

Author: Kurt Pfeifle

Portable Document Format (PDF) is set to displace PostScript as the standard print job transfer and processing format for Linux, though Linux will maintain PostScript support for a long time to ensure backward compatibility.

This switch was agreed upon at last week's Linux Desktop Printing Summit. Open Source Development Labs (OSDL) and Linuxprinting.org organized the meeting, which was hosted by Lanier (a Ricoh corporation) at its Lanier Education Center in Atlanta.

At the meeting there was virtually no disagreement about the change. The fine details will have to be thrashed out over the coming months, but representatives from CUPS, Ghostscript, Linuxprinting.org, KDE, GNOME, hardware vendors (present were people from Epson, HP, IBM, Lanier, Lexmark, Ricoh, Sharp, and Xerox), and developers of free drivers all agreed that PDF will give them more power, more reliability, and more control over the printing process.

Why is this? After all, isn't PDF a (partially) binary format? Isn't there a disadvantage with PDF compared to PostScript, which you can open and modify with a standard text editor? As it turns out, PDF has much to recommend it.

The professional and commercial digital printing world has moved to PDF already. More and more printer models are able to consume and interpret PDF files directly. A number of international industry standards, such as PDF/X-3, PDF/A, and others, make PDF processing more reliable.

We see a number of efforts underway to establish a standard way to embed "job tickets" into an actual print file. Job tickets carry the exact print settings the user asked for and are read -- and hopefully honored -- by all job processing programs, including the ones at the destination printer.

It makes a lot of sense to go in the same direction for Linux printing.

PDF, as a specification, is owned by Adobe, which developed it based on PostScript, which Adobe also owns. Like PostScript, PDF is an open and fully documented file format. PDF is not patent-encumbered; while Adobe owns a few PDF-related patents, it has guaranteed royalty-free use of the PDF specifications to anybody, including the freedom to implement them in software that generates or consumes PDF.

PDF is better suited as a page description format than PostScript on various fronts. It supports transparency and color profiles in an advanced way that PostScript can't compete with. PDF is now universally accepted as one of the best document exchange formats, even for non-printing purposes, and it is far better than PostScript for online publication and viewing. PostScript also has some known cumbersome aspects when it comes to printing, which are easier to solve with PDF.

Though many people are not aware of it, PostScript is a fully-fledged programming language. That means malicious code can be embedded in PostScript files, and PostScript jobs cause incompatibilities, failures, and undesired output more often than printing professionals would like. PostScript files also are not easy to lay out as "n-up" or "book" signatures unless they meet the strict Document Structuring Conventions.

While PDF is derived from PostScript and its imaging model, its page description code cannot contain malicious pieces. In the original PDF file format specification the page description constructs no longer make up a fully-fledged programming language. The problematic PostScript instructions were removed from PDF 1.0. Adobe did add some new "weird" capabilities in later versions, such as the ability to embed multimedia files or attach other document formats, which in turn necessitated the creation of industry standards such as PDF/X-3 and PDF/A that prohibit such extensions in digital print files and for prepress purposes.

PDF lends itself more readily to imposition processing, file merging, page extraction, and other manipulations that may be appropriate for a print file format. Font problems that are common in PostScript are less frequent with PDF, and easier to solve if they occur.

In addition, PDF files tend to be smaller than equivalent PostScript files thanks to the format's built-in compression. Searching for text strings, and copying and pasting them to other files, is almost impossible in PostScript, but easy in PDF. Using PostScript for Internet publishing is generally not a good idea, but PDF already has a major presence on the World Wide Web.

PDF, by default, enables direct random access to every page inside a file; PostScript files require linear processing of the first N pages before page N+1 can be displayed, and often it is impossible to go back to page N-1 once you see page N in a viewer.
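The random access described above rests on a cross-reference table, which a reader locates through a pointer stored at the very end of the file. What follows is only a minimal Python sketch of that lookup, run against a hand-built toy file rather than a complete PDF; the helper names find_xref_offset and build_toy_pdf are made up for illustration and do not belong to any library.

```python
import re


def find_xref_offset(pdf_bytes: bytes) -> int:
    """Return the byte offset of the cross-reference table.

    Only the tail of the file is inspected, which is why a viewer can
    jump to any object without reading the pages that precede it.
    """
    tail = pdf_bytes[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref entry found near end of file")
    # If a file has been updated incrementally, the last entry wins.
    return int(matches[-1])


def build_toy_pdf() -> bytes:
    """Assemble a tiny illustrative file (hypothetical, not a full PDF)."""
    body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref = (
        b"xref\n0 2\n"
        b"0000000000 65535 f \n"
        b"0000000009 00000 n \n"  # object 1 starts right after the header
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    )
    pointer = b"startxref\n" + str(len(body)).encode("ascii") + b"\n%%EOF\n"
    return body + xref + pointer


if __name__ == "__main__":
    pdf = build_toy_pdf()
    offset = find_xref_offset(pdf)
    # The bytes at that offset are the start of the table itself.
    print(offset, pdf[offset:offset + 4])
```

A PostScript file has no such index: a page's appearance can depend on any code that ran before it, which is why a viewer must process pages in order.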

Existing PDF support in Linux

While we're looking at a change of direction, it's not as if the summit's decision forces us to move into a completely new and unexplored field.

CUPS is already "PDF-ready" in a basic sense. Though there is not yet any job ticketing or overwhelming color profile support, CUPS does automatically process PDF files thrown at it. Its internal MIME-typing and filtering system auto-detects the format and converts it into any file type the target printer may desire. A CUPS spooling system used as a pre-processor and PDF interpreter turns every inkjet, deskjet, impact, label, or laser printer into one that is ready for use in a PDF-only environment.
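The auto-detection idea can be illustrated by looking at the first bytes of a job: PDF files begin with "%PDF-" and PostScript files conventionally begin with "%!". The Python sketch below is only an illustration of that principle under those two assumptions; it is not how CUPS itself is implemented, CUPS's own typing rules are more elaborate, and the function name sniff_print_format is hypothetical.

```python
def sniff_print_format(data: bytes) -> str:
    """Guess a MIME type from the leading bytes of a print job.

    Simplified illustration: real spoolers apply many more rules.
    """
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"%!"):
        return "application/postscript"
    # Anything unrecognized is treated as opaque data.
    return "application/octet-stream"


if __name__ == "__main__":
    print(sniff_print_format(b"%PDF-1.4\n"))
    print(sniff_print_format(b"%!PS-Adobe-3.0\n"))
```

Once the type is known, a spooler can pick a chain of converters that leads from that type to whatever the target printer accepts.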

Ghostscript is PDF-ready in an even broader sense. It works in both directions -- converting into as well as converting from PDF. Since Ghostscript can convert PDF into PostScript, it can act as a stepping stone to further convert into other output formats, such as PCL3, PNM, JPEG, PNG, BMP, or display raster.

Even the venerable gv viewer is able to display PDF; it calls Ghostscript internally for the rendering into the on-screen format. Ghostscript can also create PDF files with the help of its -sDEVICE=pdfwrite parameter. The pdfwrite device is rather sophisticated, albeit hard to handle on the command line, like all of Ghostscript. It accepts setdistillerparams commands, can produce PDF 1.4 output, honors embedded pdfmark operators that may come with the PostScript input, handles color profiles, can change various bitmap resolution settings, can embed fonts completely or as subsets, can downsample black and white or color images, and much more.

Easy-to-use wrapper scripts for Ghostscript such as ps2pdf convert PostScript files into PDF. Ghostscript can even convert multiple PS files into one single merged PDF document. Recent versions of AFPL Ghostscript have added basic support for PDF/X-3 output and processing of embedded color profiles. Support for DeviceN color spaces has been a welcome improvement in the 8.5x series of Ghostscript.
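As a rough illustration of merging several PostScript files into one PDF, the Python sketch below assembles a Ghostscript command line around the pdfwrite device. It assumes a gs executable is on the PATH; the file names and the helper names build_merge_command and merge_to_pdf are hypothetical, and the options shown are only a small subset of what pdfwrite accepts.

```python
import subprocess
from typing import List, Sequence


def build_merge_command(
    inputs: Sequence[str], output: str, pdf_version: str = "1.4"
) -> List[str]:
    """Build a Ghostscript argument list that writes all inputs to one PDF."""
    if not inputs:
        raise ValueError("at least one input file is required")
    return [
        "gs",
        "-dBATCH",    # exit after the last input file
        "-dNOPAUSE",  # do not wait for a keypress between pages
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={pdf_version}",
        f"-sOutputFile={output}",
        *inputs,      # files are processed in the order given
    ]


def merge_to_pdf(inputs: Sequence[str], output: str) -> None:
    """Run Ghostscript; raises CalledProcessError if gs reports failure."""
    subprocess.run(build_merge_command(inputs, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(build_merge_command(["part1.ps", "part2.ps"], "merged.pdf")))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact invocation before anything is run.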

And we have the luxury of another good, free PDF processing application: Xpdf. The CUPS pdftops filter is derived from it; the Poppler library project is forked from Xpdf and improved by a group of desktop environment developers at Freedesktop.org.

OpenOffice.org has a reliable PDF exporting function. KDEPrint has been able to print to PDF for five years, functioning as a virtual printer that writes any document as a PDF file to disk.

KWord can even import PDF files and edit them. Scribus has a world-class and standard-compliant PDF export capability. PDF files generated by Scribus pass even the most sophisticated professional prepress checks.

KPDF, a top-notch PDF viewer, replaces Adobe Acrobat Reader for most KDE users. KPDF may be missing a few features required by prepress professionals -- such as displaying layers in PDF-1.6, handling transparency well, the ability to add notes, and JavaScript scripting support -- but KPDF development is still in full swing.

More importantly, both KDE and GNOME are about to gain good PDF-writing back ends, which will be different from current virtual printers that "write to PDF" by converting PostScript input. The new back ends will work by writing to PDF directly, without a detour via a temporary PostScript file, with the help of their low-level libraries. This will enable all apps to export directly to PDF. GNOME will get it with its switch to the Cairo graphics libraries; KDE4 will be founded on Qt-4.2 or later, which contains an excellent PDF-writing back end via a component called Arthur.

What needs to be done

Of course, the FOSS stack for handling PDF is far from complete, but summit participants hope that the pro-PDF message coming from the Linux Desktop Printing Summit will inspire more activity in this area by application developers. Soon we may see new Linux-based professional PDF utilities, such as pre-flight checkers, imposition software, and PDF file manipulation tools, either free or commercial. Yes, we already have the PDF Toolkit (pdftk), but some people think pdftk is too resource-intensive, and do not like it because it depends heavily on Java.

Overall, the move to PDF will guarantee that Linux will be taken more seriously in the printing industry. After all, representatives from seven printer vendors were already participating in the summit, and even more gave their apologies for not being able to make it.

There is still one thing that the open source community should be asking Adobe: When will it start to support its full product range on Linux? Adobe provides only Acrobat Reader 7 on Linux -- nothing else from its rich set of other applications. Adobe is in danger of losing out on several fronts: Microsoft is trying to crush PDF support in the market by pushing to get its own XML Paper Specification (XPS) format, formerly known as "Metro," accepted, while the free software community is about to bypass Adobe's unattractive Reader with its own software. Due to Adobe's long delay with Acrobat Reader, the free software community has come up with KPDF, oKular, and Evince.
