Advanced XML-based typesetting and printing

Last time we introduced you to the simple tools you can use to do typesetting with the Document Style Semantics and Specification Language (DSSSL). In this concluding installment, we’ll cover some more advanced techniques and how to get your typeset documents printed.

DSSSL is more than just a styling language like CSS. It is a full programming
language, which means you can have stylesheets that are as complex and
context-sensitive as you want. You can have if statements, procedures, and
loops in your stylesheet, and you can custom-process XML documents
yourself. DSSSL is based on the Scheme programming language. We will not
delve into Scheme programming here, so if we lose you, just skip to the next
section until you’ve had a chance to read up on it.

In our example document we have sections that are only meant to be used for the leader’s manual, marked up with a leadernotes tag. We may want to produce a regular guide and a leader’s guide. To do that, we need OpenJade to process the leadernotes sections only in certain circumstances. Therefore, we will define a variable at the beginning of our stylesheet that says whether this is the leader’s guide version. We can do that like this:

(define leaders-guide-version #t)

...

(element (leadernotes)
	(if leaders-guide-version
		(process-children)
		(empty-sosofo)
	)
)

In Scheme, the first expression after the if is the condition expression, the second expression is the true branch, and the third expression is the false branch. So, in this snippet, the leadernotes rule is conditional. If the variable leaders-guide-version is set to true, the rule displays the leader notes by processing all of the children of that element. Otherwise, the rule returns an empty sosofo (a sosofo, short for “specification of a sequence of flow objects,” is what a DSSSL rule produces), which you use to represent nothing at all.

One problem with this leadernotes section is that there is nothing to tell the reader that this section contains leader notes; the stylesheet will just display its paragraphs as usual. To fix that, we should probably put the section in a bordered box titled “Notes for Leaders.” We can do that with the box flow object:

(element (leadernotes)
	(if leaders-guide-version
		(make box
			box-type: 'border
			line-thickness: 1pt
			(make paragraph
				font-size: heading-font-size
				font-family-name: "Helvetica"
				(make sequence
					(literal "Notes for Leaders")
				)
			)
			(process-children)
		)
		(empty-sosofo)
	)
)

In this version, the true branch is considerably longer, including generated content. We probably want to typeset homework in a similar fashion, so we should probably make the box-generating code into a function that takes the title of the box as a parameter. Let’s look at how we do this:

(define (make-titled-box title)
	(make box
		box-type: 'border
		line-thickness: 1pt
		(make paragraph
			font-size: heading-font-size
			font-family-name: "Helvetica"
			(make sequence
				(literal title)
			)
		)
		(process-children)
	)
)

(element (leadernotes)
	(if leaders-guide-version
		(make-titled-box "Notes for Leaders")
		(empty-sosofo)
	)
)

(element (homework)
	(make-titled-box "Homework")
)

As an exercise, change the #t to #f in the definition of
leaders-guide-version and rerun the process to get the standard guide.

There are many available flow objects with many parameters. The DSSSL standard
lists them all. OpenJade supports many but not all of them. DSSSL is
a somewhat more restrictive environment than Scheme (it has no set! statement, among other things), but most Scheme knowledge is transferable to DSSSL.

Cover design

Cover design is an essential part of the publishing process. Linux’s famous graphics editor, the GIMP, is what you’ll use here. You can also use one of Linux’s vector graphics editors, but you will need to use the GIMP for post-processing. Unfortunately, the GIMP was designed with the Web in mind, not the print world, so there are some issues you need to know about.

First of all, when designing for print, you need to think about dots per inch (dpi). To make good quality print, you will probably want to design at 300dpi. The maximum and minimum dpi will vary based on your printer. The high print density makes for rather large images. For example, a 13×9-inch image (i.e. a book cover for a 6×9-inch book with a 1-inch spine) at 300dpi is a 3,900×2,700-pixel image!
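The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function name cover_pixels is made up for this example, and the memory estimate assumes 3 bytes per pixel (8-bit RGB, no alpha channel) for a single uncompressed layer.

```python
def cover_pixels(trim_width_in, trim_height_in, spine_in, dpi=300):
    """Return (width_px, height_px) for a wraparound cover.

    The cover is laid out as back cover + spine + front cover,
    so its width is twice the trim width plus the spine.
    """
    total_width_in = 2 * trim_width_in + spine_in
    return round(total_width_in * dpi), round(trim_height_in * dpi)


# A 6x9-inch book with a 1-inch spine at 300dpi
width_px, height_px = cover_pixels(6, 9, 1)
print(width_px, height_px)  # 3900 2700

# Assumption: 3 bytes per pixel (RGB, no alpha), one uncompressed layer
layer_mib = width_px * height_px * 3 / 2**20
print(f"{layer_mib:.1f} MiB per layer")  # 30.1 MiB per layer
```

Every additional layer costs roughly the same again, which is why memory becomes the bottleneck so quickly.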

The GIMP can create such large images, but as you add layers of text and images, your computer will likely start to grind slower and slower. There’s not much you can do about this except buy a faster computer with a lot of memory, and then bump up the GIMP’s tile cache to make better use of it. The tile cache settings in GIMP 1.2 are under File->Preferences, in the Environment section. Set your tile cache size to at least 80MB; you’ll need it. You might also reduce the number of undo levels, since each one takes memory and you rarely need them all when you work with many layers.

The second thing the GIMP has trouble with is color for print. The difference between print and onscreen graphics is ink. Ink has a number of properties which cause it to mix very differently from light beams, which means a direct conversion from one colorspace to another isn’t very easy.

Remember, inks subtract color, while monitor colors add them. If you add
equal amounts of red, green, and blue light, you get white. If you add
equal amounts of cyan, magenta, and yellow ink, you should get black.
I say should, because this is one of the issues with print — you don’t
actually get black when you mix the inks, you get brown. Inks don’t
mix perfectly as light does, so you have problems. To combat this,
printers usually use black ink as well, which is the K in CMYK, the color model for printers.

To convert from RGB to CMYK, you need to know how much black to add in place of your other colors. The GIMP does not do this. While an experimental plug-in for doing color separation in GIMP is available, I cannot comment on its quality. Chapters 13-15 of the GIMP user’s manual go into depth about using GIMP for pre-press work.
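To make the idea of “adding black in place of other colors” concrete, here is a minimal sketch of the textbook RGB-to-CMYK formula in Python. It is not what the GIMP or any separation plug-in does; real pre-press conversion depends on ink and paper profiles. The function name rgb_to_cmyk is invented for this illustration.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of 8-bit RGB to CMYK fractions in the range 0..1.

    Cyan, magenta, and yellow are the complements of red, green, and blue.
    The part common to all three inks is pulled out and printed as black
    (K), and the remaining inks are rescaled accordingly.
    """
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)
    if k == 1.0:
        # Pure black: use black ink only and avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)


print(rgb_to_cmyk(255, 0, 0))  # (0.0, 1.0, 1.0, 0.0)
print(rgb_to_cmyk(0, 0, 0))    # (0.0, 0.0, 0.0, 1.0)
```

Notice that a neutral gray comes out as black ink alone, with no cyan, magenta, or yellow. That is exactly the substitution that avoids the muddy brown you get from mixing the three colored inks.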

You can use RGB images for pre-press, but you won’t have perfect control over the final output. Still, using RGB works better than most people think. In fact, many digital print-on-demand printers accept only RGB images. The method for preparing your image depends on your printer. CafePress, for instance, wants each cover and the spine to be separate images, each with a quarter-inch bleed margin around the entire graphic. LightningSource wants a single TIFF file with the complete cover and a 1/8-inch margin around the entire cover. Local print shops usually do best with a PDF file. The best way to convert to PDF with the GIMP is to print your document to a PostScript file and convert it to PDF. Do the following in your print options dialog box:

Select File as the printer.
If you can get a PPD file for the printer you will be using, choose Setup next to the printer choice and select the PPD file to use.
Choose the size of the paper.
Under Scaling set the width and height in inches of your image, select PPI, and click on Set Image Scale.
Under Image Settings choose Solid Colors.
Click Print and choose what file you want to print to.

Let’s say we printed to a file called study_cover.ps. We can either leave that in PostScript, if that’s how our printer wants it, or we can convert to PDF, which is often what you need if using a digital press. To convert to PDF, use the following command:

ps2pdf14 -sPAPERSIZE=a3 -dAutoFilterColorImages=false -sColorImageFilter=FlateEncode study_cover.ps

This converts the PostScript file to PDF version 1.4, with a paper size of A3, and forces exact (lossless) reproduction of color images instead of JPEG-encoding them. We specified the paper size on the command line because the ps2pdf utility cannot autodetect paper sizes. To get a list of available paper sizes, look in the file gs_statd.ps that comes with Ghostscript. If your paper size doesn’t match one of the built-in options, you can use -dDEVICEHEIGHTPOINTS=XXX and -dDEVICEWIDTHPOINTS=XXX. A point is 1/72 of an inch. The list of options available to you with the -d and -s parameters is the same one available for Adobe Distiller; you can search the Web for the document distparm.pdf, which contains the list.
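If you need a custom page size, the conversion from inches to PostScript points (72 to the inch) is easy to get wrong by hand. The following Python sketch builds the two options for a given size; the helper name device_size_flags is made up for this example, and the 13×9-inch size is simply the cover used earlier.

```python
POINTS_PER_INCH = 72  # PostScript points per inch


def device_size_flags(width_in, height_in):
    """Build the Ghostscript page-size options for a size given in inches."""
    width_pt = round(width_in * POINTS_PER_INCH)
    height_pt = round(height_in * POINTS_PER_INCH)
    return [
        f"-dDEVICEWIDTHPOINTS={width_pt}",
        f"-dDEVICEHEIGHTPOINTS={height_pt}",
    ]


# A 13x9-inch cover
print(" ".join(device_size_flags(13, 9)))
# -dDEVICEWIDTHPOINTS=936 -dDEVICEHEIGHTPOINTS=648
```

You would then pass these two options to ps2pdf14 in place of -sPAPERSIZE=a3.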

Prototyping with a local print shop

Now we have our book and our cover as PDF documents, so we can take them to
the printer to get test copies printed. When you do this, you need to know
several things:

What kind of paper you want the body and cover of your book printed on. The cover is the more important decision.
What kind of binding you want to use
Whether you want your image to bleed (go all the way to the edges of the paper — this is a little more expensive and requires that you have an extra 1/8 to 1/4 of an inch on all sides)
Any special instructions for the printer
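If you do want a bleed, the extra margin has to be added on every side of the image before you hand it over. As a rough sketch in Python (the function name with_bleed is hypothetical, and 300dpi is assumed as before):

```python
def with_bleed(width_in, height_in, bleed_in, dpi=300):
    """Return the full image size, in inches and in pixels, including bleed.

    The bleed is added on all four sides, so each dimension grows by
    twice the bleed margin.
    """
    full_w = width_in + 2 * bleed_in
    full_h = height_in + 2 * bleed_in
    return (full_w, full_h), (round(full_w * dpi), round(full_h * dpi))


# A 13x9-inch cover with a 1/8-inch bleed
print(with_bleed(13, 9, 0.125))  # ((13.25, 9.25), (3975, 2775))
# The same cover with a 1/4-inch bleed
print(with_bleed(13, 9, 0.25))   # ((13.5, 9.5), (4050, 2850))
```

Ask the print shop which bleed margin they expect before you size the final image.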

The best way to get accurate results is to give the printer samples. You should bring with you:

A sample printout of your book done on your own printer
A sample full-color printout of your cover done on your own printer
A book that looks the way you want yours to look

That last one is really important, especially if you’re new to print. If
you bring a sample of a finished work that looks like what you want to create
then the printer will have a much better chance of producing what you want.
He can simply look at the book and see what paper types and bindings are
in use, and use them as a sample. The sample printouts are useful, too,
especially since we’re not using CMYK. This allows the printer to check his results
against what you have printed out.

In your prototype run, you probably want to use someplace that does short runs of about 10 cheaply, such as Kinko’s. You’ll probably get your output back in a few days, after which you can examine it as a whole product and make the changes you need before mass production. For mass production, follow the same process, but you’ll probably want to use a real print shop (Kinko’s is pretty expensive for large runs).

Prototyping using the Web

The Internet has a few businesses that do print-on-demand work. Print-on-demand is a technology that allows printers to produce individual copies of books almost as cheaply as doing them in a larger print run. Historically, printing even a run of 500 books would be prohibitively expensive even on a per-book basis, but print-on-demand makes printing even just one book affordable. CafePress.com and LuLu.com are two major Web sites that do print-on-demand (and they don’t do just books, either; they also do T-shirts and other similar items). CafePress works better with the Linux-based PDF tools, which is ironic since CafePress runs Windows while LuLu runs on Linux. I’ve had better success using CafePress to prototype and even to publish books.

To get your book printed through CafePress, you need to sign up as a member and open up a store (don’t worry, it’s free). You then need to add a product; select the “book” type of product. Then you upload your PDF and your covers. CafePress will generate cover templates for you so that you know how big to make your covers and where the bleed area is. While you can figure that out on your own for the front and back covers, the spine will vary depending on the number of pages. Make sure that your cover design is the same height and width and follows the margins of the templates. Finally, enter some descriptive information about your book, and set its price (it has to be at least the printing price; if you set the price above the printing price and someone orders it, CafePress sends you a check for the difference!). When you sign up, CafePress provides you with a unique URL. You can go to that URL, add your book to your basket, put in your credit card info, and CafePress will print it and ship it in a matter of days. If you decide to make changes, it’s easy enough. Just remember that every time you change the content of the book’s PDF you have to change the spine image, since it is based on the number of pages in the book.

Off to the presses

To print your book, you need to decide if you want to do offset printing or digital printing.
Offset printing creates a huge upfront bill, but it costs less per book. I’m not going to
get into details, because there are so many factors. I’ve provided a bibliography for you
at the end that will help you if you choose that route.
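To get a feel for the tradeoff, you can estimate the print run at which offset printing catches up with digital printing. The Python sketch below uses a simple model (one fixed setup fee plus a flat per-copy cost), and the prices in it are purely hypothetical; substitute real quotes from your printers.

```python
import math


def break_even_copies(offset_setup, offset_per_copy, digital_per_copy):
    """Smallest print run at which offset costs no more than digital.

    Model: offset total = setup + n * offset_per_copy,
           digital total = n * digital_per_copy.
    """
    if digital_per_copy <= offset_per_copy:
        raise ValueError("offset must be cheaper per copy to break even")
    return math.ceil(offset_setup / (digital_per_copy - offset_per_copy))


# Hypothetical quotes: $2,000 setup and $2 per copy for offset,
# $6 per copy for digital
print(break_even_copies(2000.0, 2.0, 6.0))  # 500
```

Below the break-even quantity, digital printing is the cheaper route; above it, offset wins, provided you can actually sell that many copies.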

With digital printing, you can literally print exactly as many books as are ordered. There are numerous print-on-demand printers available. If you just want to get started as quickly as possible, CafePress is the best one, as it will provide you with a Web site, a Web store, and customer service, print the book for you, and handle returns at no charge. You don’t even need an ISBN. However, CafePress is too expensive to use for putting your book into retail channels and bookstores.

Some digital print-on-demand printers actually have links into the distribution chain.
LightningSource, for example, will make your book available in the Ingram catalog. Ingram is the U.S.’s largest book warehouser. Being in its catalog means your book is automatically listed in most U.S.-based online stores such as amazon.com, bamm.com, and bn.com, and it can be ordered by bookstores. Replica Books is a similar operation run by Baker & Taylor, the other warehousing giant in the book industry. There are many others, all with their own advantages and disadvantages. Unless you think you can convince a warehouse to stock your book, you should plan to use one of the printers tied to a warehousing or distribution agency.

I’ve covered the technical side of book production with Linux. However, there
are many other parts that are important too, like marketing (no book will sell if no one
knows it exists), ISBN numbers (required if you want online or brick and mortar retailers
to carry your book), Library of Congress numbers (if you plan to do library sales), pricing,
shipping, and other details. Before shelling out a load of cash mass printing your
work, you should have a marketing and fulfillment plan. The best references for that are books
like Dan Poynter’s “The Self-Publishing Manual” and Tom and Marilyn
Ross’s “The Complete Guide to Self-Publishing.” There are also a lot of publishing email lists, the best one probably being Self-Publishing@yahoogroups.com (log into Yahoo, look in their groups, and look for Self-Publishing).

Be sure to start preparation early, as you will need to know your publication
date long in advance (this is different than the printing date), and will have
to shell out some change to get ISBNs and other nitty gritty details.

Now you can go off and write and publish the next best-seller or niche market
book!

Additional resources

Books:

Javier Farreres’ “The DSSSL Book”
Karin Kylander’s “GIMP: The Official Handbook”
Carey Bunks’ “Grokking the GIMP”
Kent Dybvig’s “The Scheme Programming Language”
George Springer’s “Scheme and the Art of Programming”

DSSSL links:

Paul Prescod’s “Introduction to DSSSL”
Daniel German’s “An Introduction to DSSSL”
“Advanced Topics in DSSSL” slideset
Markus Reinsch’s “Visual Introduction to DSSSL”
“The Best Guide to OpenJade and DSSSL”
“DSSSL Examples” (see notes.txt in the distribution at ftp://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/10_mail/10_mail.tar.gz)
Mulberry Technologies’ DSSSL pages
The DSSSL Standard

GIMP links:

Grokking the GIMP online version

stock.xchng free stock photography

Jonathan Bartlett is the director of technology for New Media Worx and is the owner of Bartlett Publishing, a Linux-based independent publisher. Jonathan’s latest book is Programming from the Ground Up, an introduction to programming using Linux assembly language.
