Linux.com

Feature

Putting together PDF files

By Scott Nesbitt on June 17, 2004 (8:00:00 AM)


There are times when you need to combine multiple files from diverse sources into a single PDF file. In Windows or the MacOS it's easy -- use Adobe Acrobat. Sadly, Adobe hasn't deigned to put out a version of Acrobat for Linux, but there are a number of Linux utilities available that enable you to quickly and efficiently combine PDF files. This article looks at three command line utilities: Ghostscript, joinPDF, and pdfmeld. Each does a good job of combining PDF files, and they all pack some interesting features.

Joining PDFs the Ghostscript way

Ghostscript is a package that enables you to view or print PostScript and PDF files, or to convert those files to other formats. It's a popular tool among Linux users, but what many people don't know is that Ghostscript is also a powerful tool for combining PDF files.

To use Ghostscript to combine PDF files, type something like the following:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf

Unless you're very familiar with Ghostscript, that string of commands won't mean much to you. Here's a quick breakdown:

  • gs -- starts the Ghostscript program
  • -dBATCH -- once Ghostscript processes the PDF files, it should exit. If you don't include this option, Ghostscript will just keep running
  • -dNOPAUSE -- forces Ghostscript to process each page without pausing for user interaction
  • -q -- stops Ghostscript from displaying messages while it works
  • -sDEVICE=pdfwrite -- tells Ghostscript to use its built-in PDF writer to process the files
  • -sOutputFile=finished.pdf -- tells Ghostscript to save the combined PDF file with the name that you specified
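Because the option string never changes, it can be wrapped in a small shell function so that only the file names need typing. What follows is a minimal sketch, not something that ships with Ghostscript: the function name gs_merge is made up for illustration, and it assumes gs is on your PATH.

```shell
# gs_merge OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]
# Hypothetical helper that wraps the gs command shown above.
gs_merge() {
    if [ "$#" -lt 2 ]; then
        echo "usage: gs_merge output.pdf input1.pdf [input2.pdf ...]" >&2
        return 2
    fi
    gs_merge_out=$1
    shift   # the remaining arguments are the input files, in merge order
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
       -sOutputFile="$gs_merge_out" "$@"
}
```

Once the function is defined (for example in .bashrc), gs_merge finished.pdf file1.pdf file2.pdf runs the same command as the long form.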

When using Ghostscript to combine PDF files, you can add any PDF-related option to the command line. For example, you can compress the file, target it to an eBook reader, or encrypt it. See the Ghostscript documentation for more information.
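One such option is -dPDFSETTINGS, which selects a pdfwrite preset: /screen, /ebook, /printer, /prepress, or /default. The sketch below adds it to the merge command. The function name gs_merge_preset is invented for illustration, gs is assumed to be on your PATH, and how much the output shrinks depends on what the source PDFs contain.

```shell
# gs_merge_preset PRESET OUTPUT.pdf INPUT.pdf [INPUT.pdf ...]
# Hypothetical helper: merges PDFs using one of Ghostscript's pdfwrite presets.
gs_merge_preset() {
    if [ "$#" -lt 3 ]; then
        echo "usage: gs_merge_preset preset output.pdf input.pdf [...]" >&2
        return 2
    fi
    # Accept only the preset names that -dPDFSETTINGS understands.
    case "$1" in
        screen|ebook|printer|prepress|default) ;;
        *)
            echo "unknown preset: $1" >&2
            return 2
            ;;
    esac
    gs_preset=$1
    gs_output=$2
    shift 2   # what is left are the input files
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite \
       -dPDFSETTINGS="/$gs_preset" \
       -sOutputFile="$gs_output" "$@"
}
```

For example, gs_merge_preset ebook small.pdf file1.pdf file2.pdf merges the two files with the /ebook preset.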

The biggest advantage to Ghostscript is that it's a standard part of many Linux distributions. If you don't have it on your computer, it's easy to download and install it.

Using Ghostscript has its drawbacks, too. Unless you use Ghostscript's PDF options, the utility produces a barebones merged PDF file, and a large one at that, because by default Ghostscript doesn't compress PDF files. On top of that, some people may find typing long strings of options at the command line to be a bit of a chore.

joinPDF: Quick and simple

If you want a no-muss, no-fuss way of joining two or more PDF files together, look no further than joinPDF. It's a simple but elegant little utility that consists of a script (named joinPDF) and a compiled Java file. To run it, you only need to specify at the command line the name of the output file and the files that you want to combine. To use joinPDF you type something like this:

joinpdf myFile.pdf file1.pdf file2.pdf ...

Depending on how many PDF files you're combining and their sizes, joinPDF only takes a few seconds to merge them. JoinPDF compresses the output file it generates; while writing this article, I used joinPDF to merge various combinations of files of various sizes, and each time, the resulting PDF file was several kilobytes to several tens of kilobytes smaller than the total size of the source files.
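A comparison like this is easy to repeat on your own files. The sketch below is one way to do it, assuming only the standard wc utility; the function name size_saved is invented for illustration and is not part of joinPDF.

```shell
# size_saved MERGED.pdf SOURCE.pdf [SOURCE.pdf ...]
# Prints the combined size of the sources minus the size of the merged file,
# in bytes. A positive number means the merged file is smaller.
size_saved() {
    size_merged=$(wc -c < "$1") || return 1
    shift
    size_total=0
    for size_f in "$@"; do
        size_n=$(wc -c < "$size_f") || return 1
        size_total=$((size_total + size_n))
    done
    echo $((size_total - size_merged))
}
```

After a merge, size_saved myFile.pdf file1.pdf file2.pdf reports how many bytes were saved.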

JoinPDF is a Java utility -- to use it, you need version 1.4 of the Java Runtime Environment installed. It runs on any Linux distribution, or any other operating system that supports Java. In order to use joinPDF out of the box, you have to copy the Java file to the /usr/lib directory -- that's where the joinPDF script expects to find it. If you want to put the Java files somewhere else, like the /usr/local/bin directory, you need to edit the joinPDF script to point to that directory.

The biggest advantage of joinPDF is its simplicity. There are no options to remember. Of course, some users might find joinPDF's simplicity to be a detriment. If you want options, joinPDF isn't for you. Also, joinPDF cannot join PDFs if one or more of them is encrypted.
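Since encrypted input is a problem, it can help to screen files before merging. An encrypted PDF carries an /Encrypt entry in its trailer, so a rough check is to search each file for that name. This is a heuristic sketch and not something joinPDF provides: the function name list_encrypted is invented, and a file that merely contains the text /Encrypt in its page content would be flagged as well.

```shell
# list_encrypted FILE.pdf [FILE.pdf ...]
# Prints the name of every file that appears to contain an /Encrypt entry.
list_encrypted() {
    for enc_f in "$@"; do
        # LC_ALL=C makes grep treat the file as raw bytes.
        if LC_ALL=C grep -q '/Encrypt' "$enc_f"; then
            printf '%s\n' "$enc_f"
        fi
    done
}
```

Running list_encrypted file1.pdf file2.pdf before a merge prints nothing if none of the files looks encrypted.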

The joinPDF package comes with another script called splitPDF. As its name implies, splitPDF is used to extract pages from PDF files. A discussion of splitPDF is beyond the scope of this article, but if you need to pull pages out of your PDF files, you'll find splitPDF useful.

Merging PDF files with pdfmeld

Do you need a lot of features in the software that you use to combine your PDF files? Then consider pdfmeld. Of the three applications discussed in this article, pdfmeld is probably the most powerful and flexible.

To use pdfmeld you type something like this at the command line:

pdfmeld file1.pdf,file2.pdf,... result.pdf [options]
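Note that the input files form a single comma-separated argument rather than a space-separated list, so a shell wildcard cannot be passed directly. The sketch below builds that argument from ordinary shell arguments. The function name comma_join is invented for illustration, and it assumes that no file name contains a comma.

```shell
# comma_join ITEM [ITEM ...]
# Prints its arguments joined with commas.
comma_join() {
    (
        # "$*" joins the arguments using the first character of IFS.
        # The subshell keeps the IFS change from leaking into the caller.
        IFS=,
        printf '%s\n' "$*"
    )
}
```

With this in place, pdfmeld "$(comma_join chapter*.pdf)" result.pdf passes every matching file in the form shown above.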

pdfmeld has literally dozens of options -- for a full list, check out the documentation. These options include adding bookmarks to a PDF file, encrypting the PDF file, and adding information like title, author name, and subject. While it sounds complex and difficult to use, pdfmeld really isn't. You'll quickly find that you'll only use a handful of the options regularly, and you can forget about the rest.

pdfmeld doesn't just combine PDF files. You can use it to extract pages from a PDF file, rearrange the pages in a file, rotate pages, and even touch up text. In fact, pdfmeld packs many of the features of Adobe Acrobat in a package that weighs in at just over 1 MB.

pdfmeld's range of options is its greatest strength. But it comes at a price, albeit a small one -- $9.95. Like joinPDF, pdfmeld automatically compresses the resulting file. It's also very fast: it only took a few seconds to mash three 20-page PDF files together on my old 300MHz Linux box.

I found very little wrong with pdfmeld. One problem that I did encounter, that I didn't see with Ghostscript or joinPDF, was the error message "Page Contents Object has Wrong Type" when I tried to open a merged PDF file in Acrobat Reader. This happens when an empty page contains contents information. This only happened twice, when I added a cover followed by a blank page to a particular document.

Other tools

These three applications aren't your only choices. Some of the other tools available for merging PDF files include pdftk, Multivalent, and pdcat. I briefly looked at pdftk and Multivalent (pdcat is a commercial product), and found them to be solid applications.

So, which utility comes out on top? Just for its sheer number of features, you should give pdfmeld a serious look. While some people might balk at dropping $9.95 for software that does pretty much the same thing that Ghostscript does, I think the price is well worth it. Of course, being a long-time Ghostscript user I still have a soft spot for it. But typing those long strings of options really wears me down after a while. And joinPDF is perfect if you want to get the job done quickly and easily.

If you're adamant about using only free software, then go with Ghostscript or joinPDF. But if you can afford to drop 10 bucks, you'll find that pdfmeld is a great little application that can handle all of your PDF merging needs and then some.

Scott Nesbitt is a Toronto, Canada-based writer and the Toronto managing editor for the ScalableAir Network.

Scott Nesbitt is a freelance journalist and technical writer based in Toronto, Canada.


Comments

on Putting together PDF files

Note: Comments are owned by the poster. We are not responsible for their content.

long command line options

Posted by: Anonymous Coward on June 17, 2004 08:44 PM
Why not just put an alias in your shell settings ... e.g. in .bashrc (in your home directory) for bash users.

So,

[.bashrc]
alias pdflink='gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=./finished.pdf'

will let you do

pdflink file1.pdf file2.pdf ... to create a file called finished.pdf in the current directory. This doesn't seem difficult.

Disclaimer:: I haven't tried this yet!!

#

Re:long command line options

Posted by: Anonymous Coward on June 17, 2004 10:01 PM
Interesting ... Might have to try that myself. Like the writer of this piece, I'm really lazy.

#

Re:long command line options

Posted by: gogol on June 18, 2004 12:05 AM

Is there a good utility to join ps files? I tried
ghostscript by doing:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pswrite -sOutputFile=out.ps in1.ps in2.ps


Though it joins the ps files, the fonts are all messed.
I had tex output of a book, split chapterwise. But after
joining, it hardly looks like the beautiful tex output.

#

Re:long command line options

Posted by: Anonymous Coward on June 18, 2004 08:06 PM
Tried a variation of this and it worked. Good advice!

#

Don't forget iText

Posted by: Anonymous Coward on June 17, 2004 08:57 PM
With iText (http://www.lowagie.com/iText/) you can do much more than just combine PDFs. It's a Java library so you can extend your own programs for PDF functionality, but it also includes utilities to join multiple pdfs, split pdfs, create handouts from pdfs, and encrypt pdfs.

#

Re:Don't forget iText

Posted by: Anonymous Coward on June 17, 2004 09:06 PM
Actually, joinPDF is based on iText. And it's a lot easier to use.

#

PDF files vs text

Posted by: Anonymous Coward on June 17, 2004 11:25 PM
If you don't have to use PDF DON'T.
second rule, see first.

They are a pain, slow, messy to read ...

#

Re:PDF files vs text

Posted by: Jonathan Bartlett on June 18, 2004 01:18 PM
Actually, I've found that PDFs are great for the print world. They are also great for bridging the gap between print and online. You often don't have the money/time to move all of your documents to HTML, so PDF is a nice in-between step.

For print work, PDF is wonderful because it keeps your document exactly how it will be printed on the printer. They are also universally accepted, and have tons of tools that print shops can use with them. They are based on the PostScript model, so they are easy to convert back and forth from PostScript as well.

There are PS/PDF tools for creating signatures from input files (signatures are how pages are printed for book binding - quite a bit different than how they are arranged after generating them from your files), doing color separations, watermarking, and all sorts of things. It's actually a pretty complete and great system for print.

#

Kprinter as a GUI for Ghostscript

Posted by: Anonymous Coward on June 18, 2004 12:28 AM
I find that kprinter makes an excellent and simple to use utility for joining pdf files.

#

GSView as a GUI for joining PDFs?

Posted by: Anonymous Coward on June 18, 2004 12:49 AM
Is it possible to use GSView as a GUI for joining PDFs?

#

grep pdf documents?

Posted by: jamesonburt on June 19, 2004 01:08 AM
I use hundreds of pages of pdf technical documents,
from which I would often like to see every line
with a particular word.
For example, something like

      pdfgrep K455 prism_census_codebook.pdf
would return every use of the code "K455".

Since there is no pdfgrep program,
I laboriously use the acroread GUI
to find K455 in my pdf document.

Do you know of a "grep" for pdf files?

#

Re:grep pdf documents?

Posted by: Anonymous Coward on June 19, 2004 08:08 PM
maybe a

pdftotext PDFFILE.pdf

and then a

grep whatyousearch PDFFILE.txt

could do it. not very elegant, but it works. pdftotext comes with the xpdf-package

HTH
raf

#

Re:grep pdf documents?

Posted by: immytay on June 20, 2004 11:05 PM
I have not used this, but there is a python utility called pdfsearch that looks interesting. It says it uses xpdf, so it's somewhat related to the pdftotext approach someone else mentioned.

http://pdfsearch.sourceforge.net

#

Re:grep pdf documents?

Posted by: jamesonburt on June 21, 2004 09:34 PM
pdftotext -layout 02a0214.pdf - |egrep hamburger --context=4 --color=always

Derived from another's comment, this works reasonably well, colorizing the matching text "hamburger".
This gives a context of 4 lines above and 4 lines below, needed since pdftotext cannot look exactly like pdf.
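The one-liner above extends naturally to several files. Here is a rough sketch, assuming pdftotext from the xpdf package is installed; the function name pdf_grep is made up for illustration, and it drops the context and color options to keep the output simple.

```shell
# pdf_grep PATTERN FILE.pdf [FILE.pdf ...]
# Prints every matching line of extracted text, prefixed with the file name.
pdf_grep() {
    pdf_pattern=$1
    shift
    for pdf_f in "$@"; do
        # The trailing "-" makes pdftotext write the text to standard output.
        pdftotext -layout "$pdf_f" - | grep -e "$pdf_pattern" |
            while IFS= read -r pdf_line; do
                printf '%s: %s\n' "$pdf_f" "$pdf_line"
            done
    done
}
```

For example, pdf_grep K455 prism_census_codebook.pdf lists each line of that codebook containing the code.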

#

Re:grep pdf documents?

Posted by: Anonymous Coward on June 22, 2004 08:38 PM
That's a great idea. I just made a little shell script using your example as a template. The script can be used exactly like a normal grep.


Find the script here: pdfgrep (http://blog.rompe.org/pdfgrep)

#

OpenOffice.org as a GUI option

Posted by: Anonymous Coward on June 19, 2004 10:05 PM
Was the article's focus on command line utilities deliberate? I haven't used Adobe Acrobat except for the PDF reader, but last time I used OpenOffice.org (http://openoffice.org/) on Mandrake GNU/Linux, it had "export as PDF" in the file menu next to "print". Some might find this a good option, depending on the source file format and what sort of editing one wants to do besides the join.

#

deskPDF is another one..

Posted by: Anonymous Coward on June 21, 2004 08:58 AM
I've been using docudesk deskpdf for some time. They recently released a "pro" product which includes advanced options including prepending and appending. They incorporate the Ghostscript interpreter in a commercially supported product. Think it was less than $20.

Highly recommended and has great support. site is docudesk.com

#

Re:deskPDF is another one..

Posted by: Anonymous Coward on June 21, 2004 08:06 PM
Does it run on Linux?

#

Putting together PDF files

Posted by: Anonymous [ip: 216.46.143.210] on March 11, 2008 06:41 PM
My problem with all these solutions is that they may allow you to look at multiple pdf files, but you won't be able to edit them. Some pdfs are forms that are made for people to fill in. In this group I've only tried ghostscript (so far) and it has the same problem. The pdfs are combined but they lose the ability to have information entered.

Anyone have an answer to this?

#

This story has been archived. Comments can no longer be posted.
