November 29, 2007

GNU PDF to fill missing gap in functionality

Author: Bruce Byfield

For many average users, GNU/Linux support for PDF files may seem reasonably advanced. They can create PDF files, read them with programs like Kpdf, and edit them in programs like pdftk or PDFedit. But that's not the whole story, says José Marchesi, founder of the recently created GNU PDF project. "Unfortunately, there are a lot of missing features in the existing free implementations," he says. That's the main reason why the Free Software Foundation (FSF) has declared GNU PDF a high priority project, and is actively seeking donations to speed its progress.

Marchesi is a long-time supporter of the GNU Project, the umbrella organization for free software projects connected to the FSF. In 1999, he founded GNU Spain, and he later assisted in the creation of GNU Italy and GNU Mexico. He has also contributed to GNU Ghostscript, GNU gv, and GNU Ferret, the first two of which provide support for both PDF and the closely related PostScript format. In addition, Marchesi performs what he calls "random works" in the GNU Project, such as writing internal code and editing Web pages as needed.

Marchesi says he first became aware of the need for better free PDF support a few years ago in his role as maintainer of gv. In December 2005, Marchesi tried to update the Ghostscript PDF interpreter that gv uses, only to find it was technically impractical. The solution, he decided, was to attack the problem at a more basic level, and, after he discussed the problem with members of the FSF and GNU Project, GNU PDF was born.

The reasons for a new PDF project

According to Marchesi, full support for PDF is urgent for a number of reasons, both technical and political.

On the technical level, once Marchesi started investigating, he discovered a great deal of PDF functionality that is either missing or incomplete: "interactive features (forms, annotations), the management of embedded contents (sounds and movies), execution of JavaScript to perform forms validation, 3-D artwork, accessibility, Web capturing, [and] management of document collections."

Many users are unaware of these gaps, either because they never use such features or because, Marchesi says, "The PDF standard is quite careful when providing backward compatibility: When a PDF consumer application (such as a viewer) finds an unknown construct (such as 3-D artwork), it can (and should) ignore it. But in fact you may be missing information."
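The behavior Marchesi describes can be illustrated with a minimal sketch. It is not based on the PDF specification's actual object model or on any real PDF library; the object types, handler names, and sample page are all hypothetical, chosen only to show how a consumer that silently ignores unknown constructs can leave a reader unaware of missing content.

```python
# Hypothetical sketch: the object types and handler names are illustrative
# assumptions, not taken from the PDF specification or any real PDF library.

def render_text(obj):
    return f"text: {obj['value']}"

def render_image(obj):
    return f"image: {obj['name']}"

# Constructs this toy consumer understands.
HANDLERS = {"Text": render_text, "Image": render_image}

def consume(objects):
    """Render known constructs; skip unknown ones and record what was skipped."""
    rendered, skipped = [], []
    for obj in objects:
        handler = HANDLERS.get(obj["type"])
        if handler is None:
            # Unknown construct: ignored, so its information never reaches the reader.
            skipped.append(obj["type"])
            continue
        rendered.append(handler(obj))
    return rendered, skipped

# A made-up page containing one construct the consumer does not know about.
page = [
    {"type": "Text", "value": "Quarterly report"},
    {"type": "3D", "name": "turbine-model"},
    {"type": "Image", "name": "chart"},
]

rendered, skipped = consume(page)
print(rendered)
print(skipped)
```

In this sketch the page still displays without errors, but the 3-D item is dropped; only the `skipped` list reveals that something was lost, which a typical viewer would not show to the user.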

The GNU Project would like to see a full implementation of the upcoming ISO 32000 standard for PDF. Despite the increasing frequency with which PDF is used for corporate and academic purposes, all software that provides the highest levels of support for the ISO standard is proprietary, which means that, without a concerted effort, free software users could be left behind.

Marchesi also says, "We want a GPLv3 implementation of PDF. Almost all of the existing alternatives are licensed under GPLv2 only." Besides the obvious credibility that comes from using the new version of the license, an important consideration is no doubt the conviction that a GPLv3 program will provide greater protection of users' freedoms.

The approach

Although Marchesi considered adding the missing functionality to existing free PDF libraries, the project quickly discovered that this idea was impractical, given GNU PDF's engineering goals.

"Our objective is to provide the same level of PDF support as Adobe [Acrobat]," Marchesi says, referring to the leading proprietary PDF program. "So we need a general and complete library that provides enough functionality to build an Acrobat-like program on top of it. This requires capabilities to both read and manipulate PDF files in an integrated library. None of the existing free implementations provides that [integration]. Some of them are designed to provide rasterization of PDF pages, such as Ghostscript, Xpdf, and Poppler, while others are designed to provide facilities for PDF manipulation, such as PoDoFo." Each is suitable for its particular purposes, but not for the integrated support envisioned by GNU PDF.

GNU PDF's first goal is to write a library in the C programming language "intended to be used by both PDF consumer and PDF producer applications," Marchesi says. "The library will be similar to the Adobe PDF Library, providing access to several layers of abstraction. In this way, the library will be useful for many kinds of applications, not just viewers."

The next step will be to write an application that has already been labelled GNU Juggler, "an Acrobat-like application on top of the library." GNU Juggler, Marchesi says, "will be a specialized PDF viewer and editor." To help with the application's creation, a member of the GNU PDF project is already performing a functional analysis of the latest edition of Acrobat Professional, Adobe's flagship PDF product, in order to reverse-engineer it.

One thing GNU PDF will not have to do is write a graphics library. Project members have already concluded that they can use libcairo. The members of the Cairo project are aware of GNU PDF, and some have already started discussing integrating the GNU PDF library with their work.

Realizing the project goals

The FSF has set up a Web page for donations to GNU PDF -- a first for any of its ongoing high-priority projects, although the FSF did briefly help collect pledges for the Free Ryzom campaign last year. However, Marchesi emphasizes that "we will go ahead with the project in any case." Donations would allow the project to hire full-time developers, instead of the volunteers more usual in a new free software project.

"To write the GNU PDF library and GNU Juggler is a really big task, and we want to do it really fast," Marchesi says. "It is crucial for us to have a free, complete, and high-quality implementation of the PDF standard as soon as possible."

