If you just want to concatenate two PDF files without any reprocessing of their content, pdftk is for you. (On Mac OS X it should be available via MacPorts or Fink; on Linux, there are native packages for all major distributions; a Windows build is available too.) Try this:
pdftk title.pdf content.pdf cat output book.pdf
This will prepend the title.pdf to the content.pdf and write the result into book.pdf.
pdftk is a "dumb" but very fast way to concatenate two (or more) PDF files. "Dumb" insofar as pdftk does not interpret the PDF data stream in any way; it just makes sure that the internal object numbers are reshuffled as needed and appear in the PDF xref structure (which basically is a sort of table of contents for the PDF's objects).
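Because pdftk copies the pages untouched, a quick sanity check after merging is to compare page counts. The sketch below assumes pdftk's dump_data operation, which prints a NumberOfPages: line; the helper name page_count is made up for this illustration, and the check is silently skipped when pdftk or the three files are not present.

```shell
#!/bin/sh
# page_count: hypothetical helper. Reads the output of `pdftk FILE dump_data`
# on stdin and prints the value of its NumberOfPages line.
page_count() {
    awk '/^NumberOfPages:/ { print $2; exit }'
}

# Only run the real check when pdftk and all three files are available.
if command -v pdftk >/dev/null 2>&1 &&
   [ -f title.pdf ] && [ -f content.pdf ] && [ -f book.pdf ]; then
    t=$(pdftk title.pdf dump_data | page_count)
    c=$(pdftk content.pdf dump_data | page_count)
    b=$(pdftk book.pdf dump_data | page_count)
    if [ "$((t + c))" -eq "$b" ]; then
        echo "page counts match: $b"
    else
        echo "page count mismatch: $t + $c != $b" >&2
    fi
fi
```

If the counts differ, one of the inputs was probably not read completely, which is worth investigating before trusting the merged file.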
Ghostscript:
If you want to use Ghostscript, the basic command to concatenate the same two files would be:
gs \
-o book.pdf \
-sDEVICE=pdfwrite \
title.pdf \
content.pdf
However, as you experienced, this simple command line may mess up your image quality. The reason is that Ghostscript is not "dumb" when it processes PDFs: it completely interprets them when reading them in, and creates a completely new file when writing out the result. When creating the result, it automatically uses default settings for a lot of details in the overall processing. These defaults apply in all cases where the invocation does not instruct Ghostscript otherwise.
So Ghostscript's method of creating the new book.pdf is much more "intelligent" (but also much slower) than pdftk's method. (This is also the reason why Ghostscript is in many cases able, within limits, to "repair" broken PDF files, to embed fonts into the output PDF which are not embedded in the input PDFs, or to remove duplicate images and replace them by mere references, and overall to create smaller, better optimized files from bloated input PDFs...)
The solution is to not let Ghostscript use its defaults, by adding more custom parameters to the command line.
What does it mean "Ghostscript 'interprets' its PDF input"?
The whole file and its contents (objects, streams, fonts, images, ...) are read in, checked and held in Ghostscript's own internal representation before it writes out the resulting PDF with newly generated PDF objects. However, when writing out, Ghostscript applies its internal default settings for the hundreds of parameters [*] that are available.
Unfortunately, this causes the "reprocessing" of your images according to these default settings, which can only be avoided or overridden by adding your own (desired) commandline parameters.
Your image problems could be caused by Ghostscript's need (due to licensing issues) to re-encode JPEG2000 images to JPEG encoding. If you want to avoid this, add the following to your commandline:
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
-dColorImageFilter=/FlateEncode \
-dGrayImageFilter=/FlateEncode \
Other image-related commandline options to consider for including are:
-dColorConversionStrategy=/LeaveColorUnchanged \
-dDownsampleMonoImages=false \
-dDownsampleGrayImages=false \
-dDownsampleColorImages=false \
So the complete Ghostscript commandline that could make you happy should read:
gs \
-o book.pdf \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/LeaveColorUnchanged \
-dDownsampleMonoImages=false \
-dDownsampleGrayImages=false \
-dDownsampleColorImages=false \
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
-dColorImageFilter=/FlateEncode \
-dGrayImageFilter=/FlateEncode \
title.pdf \
content.pdf
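If you need this for more than two files, or more than once, the command above can be wrapped in a small shell function. This is only a sketch: the name gs_merge and its -n (dry run) switch are invented here for illustration, while the Ghostscript parameters are exactly the ones listed above.

```shell
#!/bin/sh
# gs_merge: hypothetical wrapper around the Ghostscript command above.
# Usage: gs_merge [-n] OUTPUT INPUT...
# With -n (an option made up for this sketch) the command is only printed.
gs_merge() {
    dry_run=0
    if [ "$1" = "-n" ]; then
        dry_run=1
        shift
    fi
    out=$1
    shift
    # Rebuild the argument list: fixed options first, then the input files.
    set -- gs -o "$out" -sDEVICE=pdfwrite \
        -dColorConversionStrategy=/LeaveColorUnchanged \
        -dDownsampleMonoImages=false \
        -dDownsampleGrayImages=false \
        -dDownsampleColorImages=false \
        -dAutoFilterColorImages=false \
        -dAutoFilterGrayImages=false \
        -dColorImageFilter=/FlateEncode \
        -dGrayImageFilter=/FlateEncode \
        "$@"
    if [ "$dry_run" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dry run: show the command for three input files without invoking gs.
gs_merge -n book.pdf title.pdf content.pdf appendix.pdf
```

Drop the -n to actually run Ghostscript; the input files are concatenated in the order given.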
You could also tell Ghostscript NOT to compress images at all in the output PDF, by using this commandline:
gs \
-o book.pdf \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/LeaveColorUnchanged \
-dEncodeColorImages=false \
-dEncodeGrayImages=false \
-dEncodeMonoImages=false \
title.pdf \
content.pdf
[*]:
If you want to see the complete list of default settings which Ghostscript's pdfwrite device uses, run the following command:
gs \
-sDEVICE=pdfwrite \
-o /dev/null \
-c "currentpagedevice { exch ==only ( ) print == } forall"
For an explanation of what exactly all these parameters mean, you'll have to read Adobe's documentation on "Distiller Parameters". Ghostscript tries very hard to mimic all of these...
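The full list is long, so for the image problem discussed here it may help to filter it. The following is a rough sketch: the function name image_params and the choice of search pattern are my own assumptions, and -q is added only to suppress Ghostscript's startup banner. The gs call is skipped if Ghostscript is not installed.

```shell
#!/bin/sh
# image_params: hypothetical filter. Keeps only those lines of the parameter
# dump whose key mentions image handling or color conversion, sorted by key.
image_params() {
    grep -E 'Image|ColorConversionStrategy' | sort
}

# Run the dump through the filter only when Ghostscript is available.
if command -v gs >/dev/null 2>&1; then
    gs -q -sDEVICE=pdfwrite -o /dev/null \
       -c "currentpagedevice { exch ==only ( ) print == } forall" | image_params
fi
```

Comparing this filtered list before and after adding your own -d parameters to the same command shows which defaults you have actually overridden.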
Can you edit your question and quote the exact commandline you are using to prepend your title page to the original PDF? Then I could tell you what exactly to change or add to the commandline in order to get a better output for images... – Kurt Pfeifle – 2011-11-25T12:46:33.723
I don't want to just have it look better, I want to merge without reprocessing. This will a) result in better quality (lossless transforms), and b) not waste hours of CPU time processing my 1000+ page document. – Mahmoud Al-Qudsi – 2012-01-02T04:44:54.607
Hey, you didn't answer my question and you didn't quote the exact GS commandline you are using. Which means: you'll not be getting the help regarding GS you're looking for... – Kurt Pfeifle – 2012-01-02T09:07:55.060