Chop pages of a PDF into multiple pages

16

9

I've got a bunch of PDF files that contain two "real" pages to a single PDF page; I'd like to chop these in half and put each half on a separate page. Essentially, I need something that does the exact opposite of pdfnup (or psnup). How can this feat be achieved?

Platform is Linux, open source preferred. As I've got a great pile of these to do, something that can be scripted (as opposed to a GUI) would be nice, so I can just give it a list of them and have it chew away.

A pre-existing script isn't the only option, either; if there's sample code to manipulate PDFs in similar ways with a third-party library, I can probably hack it into doing what I want.

womble

Posted 2010-09-22T05:14:14.697

Reputation: 855

Question was closed 2016-02-11T02:59:05.657

Possible duplicate of How can I split a PDF's pages down the middle?

– Skippy le Grand Gourou – 2016-02-10T14:38:51.650

Answers

22

You can solve this with the help of Ghostscript. pdftk alone cannot do that (to the best of my knowledge). I'll give you the commandline steps to do this manually. It will be easy to script this as a procedure, also with different parameters for page sizes and page numbers. But you said that you can do that yourself ;-)

How to solve this with the help of Ghostscript...

...and for the fun of it, I've recently done it not with an input file featuring "double-up" pages, but one with "treble-ups". You can read the answer for this case here.

Your case is even simpler. You seem to have something similar to this:

+------------+------------+   ^
|            |            |   |
|      1     |      2     |   |
|            |            | 595 pt
|            |            |   |
|            |            |   |
|            |            |   |
+------------+------------+   v
             ^
            fold
             v
+------------+------------+   ^
|            |            |   |
|      3     |      4     |   |
|            |            | 595 pt
|            |            |   |
|            |            |   |
|            |            |   |
+------------+------------+   v
<---------- 842 pt -------->

You want to create 1 PDF with 4 pages, each of which has the size of 421 pt x 595 pt.

First Step

Let's first extract the left sections from each of the input pages:

gs \
    -o left-sections.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [0 0]>> setpagedevice" \
    -f double-page-input.pdf

What did these parameters do?

First, know that in PDF 1 inch == 72 points. Then the rest is:

  • -o ...............: Names output file. Implicitly also sets -dBATCH -dNOPAUSE.
  • -sDEVICE=pdfwrite : we want PDF as output format.
  • -g................: sets output media size in pixels. pdfwrite's default resolution is 720 dpi. Hence multiply by 10 to get a match for PageOffset.
  • -c "..............: asks Ghostscript to process the given PostScript code snippet just before the main input file (which needs to follow with -f).
  • <</PageOffset ....: sets shifting of page image on the medium. (Of course, for left pages the shift by [0 0] has no real effect.)
  • -f ...............: process this input file.

Which result did the last command achieve?

This one:

Output file: left-sections.pdf, page 1
+------------+  ^
|            |  |
|     1      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v

Output file: left-sections.pdf, page 2
+------------+  ^
|            |  |
|     3      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v
<-- 421 pt -->

Second Step

Next, the right sections:

gs \
    -o right-sections.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [-421 0]>> setpagedevice" \
    -f double-page-input.pdf

Note the negative offset since we are shifting the page to the left while keeping the viewing area stationary.

Result:

Output file: right-sections.pdf, page 1
+------------+  ^
|            |  |
|     2      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v

Output file: right-sections.pdf, page 2
+------------+  ^
|            |  |
|     4      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v
<-- 421 pt -->
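The numbers in the two gs calls above follow directly from the page size, so they can be computed instead of typed by hand. Here is a minimal sketch of that arithmetic. The function name half_page_args is made up for this illustration, and it assumes integer page dimensions in points and pdfwrite's default of 720 dpi (10 pixels per point):

```shell
#!/bin/sh
# Hypothetical helper (not part of Ghostscript): given the width and height
# of a double page in points, print the -g value for one half page and the
# PageOffset that brings the right half into view.
# Assumes integer point sizes and pdfwrite's default 720 dpi (10 px per pt).
half_page_args() {
    width_pt=$1
    height_pt=$2
    half_pt=$(( width_pt / 2 ))
    echo "-g$(( half_pt * 10 ))x$(( height_pt * 10 )) PageOffset=[-${half_pt} 0]"
}

# The A4 landscape example from this answer:
half_page_args 842 595
```

For the 842 x 595 pt example this prints -g4210x5950 and an offset of [-421 0], which are the values used in the second step.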

Last Step

Now we combine the pages into one file. We could do that with ghostscript as well, but we'll use pdftk instead, because it's faster for this job:

pdftk \
  A=left-sections.pdf \
  B=right-sections.pdf \
  shuffle A B \
  output single-pages-output.pdf \
  verbose

Done. Here is the desired result. 4 different pages, sized 421x595 pt.

Result:

+------------+ +------------+ +------------+ +------------+   ^
|            | |            | |            | |            |   |
|     1      | |     2      | |     3      | |     4      |   |
|            | |            | |            | |            | 595 pt
|            | |            | |            | |            |   |
|            | |            | |            | |            |   |
|            | |            | |            | |            |   |
+------------+ +------------+ +------------+ +------------+   v
<-- 421 pt --> <-- 421 pt --> <-- 421 pt --> <-- 421 pt -->
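Since the question asks for something that can chew through a list of files, the three steps can be wrapped in a loop. What follows is only a sketch under stated assumptions: every input is an 842 x 595 pt double page as in the example, the output file names are invented here, and the RUN variable is a hypothetical hook of this sketch (set RUN=echo to print the commands rather than execute them). The left halves are passed to pdftk as A so that they come first in the shuffle.

```shell
#!/bin/sh
# Sketch: apply the three steps to every PDF given as an argument.
# Assumes each input is an 842x595 pt double page, as in the example above.
# RUN is a hypothetical hook: RUN=echo prints the commands instead of running them.
split_double_pages() {
    for src in "$@"; do
        base=${src%.pdf}
        # Left halves: no shift needed.
        ${RUN:-} gs -o "$base-left.pdf" -sDEVICE=pdfwrite -g4210x5950 \
            -c "<</PageOffset [0 0]>> setpagedevice" -f "$src"
        # Right halves: shift the page image 421 pt to the left.
        ${RUN:-} gs -o "$base-right.pdf" -sDEVICE=pdfwrite -g4210x5950 \
            -c "<</PageOffset [-421 0]>> setpagedevice" -f "$src"
        # Interleave: left 1, right 1, left 2, right 2, ...
        ${RUN:-} pdftk A="$base-left.pdf" B="$base-right.pdf" \
            shuffle A B output "$base-single.pdf"
    done
}
```

Calling split_double_pages first.pdf second.pdf would then produce first-single.pdf and second-single.pdf, leaving the intermediate left and right files in place.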

Kurt Pfeifle

Posted 2010-09-22T05:14:14.697

Reputation: 10 024

@Unknown: Thanks for the downvoting! Would you please care to write a comment indicating some reason for this? – Kurt Pfeifle – 2011-05-30T20:10:09.770

+1 for awesome use of ASCII art, and very clear instructions. Just cause i'm a CLI n00b, the \ s escape the lines so its easier to read, right? – Journeyman Geek – 2011-06-28T06:20:27.313

@mullhausen: thanks for correcting the typo (421 -> -421). ;-) – Kurt Pfeifle – 2012-09-03T23:28:29.603

7

There is a tool pdfposter which can be used to create PDFs with several pages for one input page (tiling or chopping the pages). It is similar to the tool poster, which does the same for PostScript files.

Philipp Wendler

Posted 2010-09-22T05:14:14.697

Reputation: 534

pdfposter doesn't handle printing overlapping content at the edges, for easier poster assembly. It's a Perl script, though, so it's fairly easy to add. – Matthias Urlichs – 2013-06-29T12:03:06.090

3

So, after a lot more searching (it seems that "PDF cut pages" is a far better search), I found a little script called unpnup which uses poster, PDF/PS conversion, and pdftk to do exactly what I need. It's a bit of a long way around, but it's far superior to the other methods I found (such as using imagemagick) because it doesn't rasterise the pages before spitting them out.

Just in case mobileread goes away for some reason, the core of the script (licenced under the GPLv2 or later by Harald Hackenberg <hackenberggmx.at>) is as follows:

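# Split the input into one PDF per page (pg_0001.pdf, pg_0002.pdf, ...)
pdftk "$1" burst
for file in pg*.pdf
do
    base=$(basename "$file" .pdf)
    # PDF -> EPS
    pdftops -eps "$file"
    # Tile the page with poster (A4 poster size on A5 media, no overlap)
    poster -v -pA4 -mA5 -c0% "$base.eps" > "$base.tps"
    # Tiled PostScript -> PDF
    epstopdf "$base.tps"
done
# Join the per-page results into one PDF next to the working directory
pdftk pg*.pdf cat output "../$(basename "$1" .pdf)_unpnuped.pdf"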

womble

Posted 2010-09-22T05:14:14.697

Reputation: 855

@frabjous If you are familiar with briss why not write an answer under this question featuring that? – 把友情留在无盐 – 2015-06-05T10:14:48.710

@soubunmei b/c briss is a gui app, and so that wouldn't answer the question – frabjous – 2015-06-06T18:06:59.200

1

Gotta love it when people answer their own questions. However, if you needed to do it with a GUI, especially if the pages sizes weren't even or you wanted to further crop each side, check out Briss: http://briss.sourceforge.net

– frabjous – 2010-09-22T20:10:10.747

You should be able to do what you want with PDFTK by itself, without all the conversions. – CarlF – 2010-09-22T20:51:36.700

@CarlF: I thought it'd be possible, but I can't see anything in the PDFTK man page to manipulate the contents of pages. Got any pointers for me? – womble – 2010-09-23T06:38:04.060

@frabjous: What's wrong with answering your own questions? – Kurt Pfeifle – 2010-09-24T12:07:34.707

@womble: your conversions go via PS/EPS. This is bound to lead to losses in quality (embedded fonts, transparencies, etc.). My suggestion avoids the risky PDF => EPS => PDF route and goes the safer PDF => PDF => PDF way. – Kurt Pfeifle – 2010-09-24T12:10:24.087

@pipitas. Nothing. I wasn't being sarcastic, I really do love it. – frabjous – 2010-09-24T20:06:08.167

I've tried pdfsam, jpdf, the solution with perl CAM:PDF and also the one with GhostScript...however, no success. Afterwards, I tried Briss, and it did it pretty easily and everything was done in a few minutes...so, thumbs up! – Rostislav Stribrny – 2013-06-04T22:15:56.670

2

I found the answer by Kurt Pfeifle to be very helpful for my similar situation. I thought I might share my modification of the solution with others...

I too had a scanned PDF that had 2 pages on each sheet. It was an 11 x 8.5 (inch) scan of a saddle-stitched booklet that was left stapled when originally scanned, so: PDF page 1 = back and front cover; PDF page 2 = pages 2 and 3, etc. This reads fine onscreen but you can't print it and then staple it to make more copies of the booklet.

I needed to be able to print this on a duplex copier; i.e. turn it BACK into an "imposed" PDF, ready for printing. So using Kurt's solution, I made this (ahem) "one-liner" to convert it back into half-pages, in the correct page order again. It will work for any HEIGHT and WIDTH, and also for any number of pages. In my case, I had a 40-page booklet (20 scanned pages in the PDF.)

HEIGHT=8.5 WIDTH=11 ORIG_FILE_PATH="original.pdf" \
count=$(set -xe; \
gs -o left.pdf -sDEVICE=pdfwrite \
-g$(perl -e "print(($WIDTH / 2) * 720)")x$(perl -e "print($HEIGHT * 720)") \
-c "<</PageOffset [0  0]>> setpagedevice" \
-f "$ORIG_FILE_PATH" >/dev/null; \
gs -o right.pdf -sDEVICE=pdfwrite \
-g$(perl -e "print(($WIDTH / 2) * 720)")x$(perl -e "print($HEIGHT * 720)") \
-c "<</PageOffset [-$(perl -e "print(($WIDTH / 2) * 72)")  0]>> setpagedevice" \
-f "$ORIG_FILE_PATH" | grep Page | wc -l ); \
echo '>>>>>' Re-ordering $count pages...; \
(set -xe; pdftk A=right.pdf B=left.pdf cat \
A1 `set +xe; for x in $(seq 2 $count); do echo B$x A$x; done` B1 \
output ordered.pdf); \
echo "Done. See ordered.pdf"

You only need to alter the first few parameters in this command to specify the HEIGHT and WIDTH and ORIG_FILE_PATH. The remainder of the command calculates the various sizes and calls gs twice, then pdftk. It will even count the pages in your scan and then produce the correct sort specification (for the scenario I gave).

It outputs some progress about what it's doing, which will look like this:

+++ perl -e 'print((11 / 2) * 720)'
+++ perl -e 'print(8.5 * 720)'
++ gs -o left.pdf -sDEVICE=pdfwrite -g3960x6120 -c '<</PageOffset [0  0]>> setpagedevice' -f original.pdf
++ wc -l
++ grep Page
+++ perl -e 'print((11 / 2) * 720)'
+++ perl -e 'print(8.5 * 720)'
+++ perl -e 'print((11 / 2) * 72)'
++ gs -o right.pdf -sDEVICE=pdfwrite -g3960x6120 -c '<</PageOffset [-396  0]>> setpagedevice' -f original.pdf
>>>>> Re-ordering 20 pages...
++ set +xe
+ pdftk A=right.pdf B=left.pdf cat A1 B2 A2 B3 A3 B4 A4 B5 A5 B6 A6 B7 A7 B8 A8 B9 A9 B10 A10 B11 A11 B12 A12 B13 A13 B14 A14 B15 A15 B16 A16 B17 A17 B18 A18 B19 A19 B20 A20 B1 output ordered.pdf
Done. See ordered.pdf
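The page order passed to pdftk is the least obvious part of the one-liner, so here it is in isolation. This is an illustrative sketch, not a piece of the original command: the function name booklet_cat_spec is made up, and it assumes the same handles as above (A = right halves, B = left halves). The front cover (right half of scan 1) comes first, the inner pages follow left-then-right, and the back cover (left half of scan 1) goes last.

```shell
#!/bin/sh
# Hypothetical helper: print the pdftk "cat" page list for a scanned booklet
# with the given number of double-page scans.
# Assumes handle A = right halves and B = left halves, as in the one-liner.
booklet_cat_spec() {
    count=$1
    spec="A1"            # front cover: right half of the first scan
    x=2
    while [ "$x" -le "$count" ]; do
        spec="$spec B$x A$x"   # inner scans: left half, then right half
        x=$(( x + 1 ))
    done
    echo "$spec B1"      # back cover: left half of the first scan
}

booklet_cat_spec 3
```

For three scans this prints A1 B2 A2 B3 A3 B1, the same pattern as the 20-scan list in the log above.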

Next, to get the page imposition you need for a printed booklet, you just "print" ordered.pdf on a custom page size of exactly the size you need (in my example, 5.5 x 8.5), sending it to a "booklet making" tool (in my case, I used Christoph Vogelbusch's Create Booklet for Mac from http://download.cnet.com/Create-Booklet/3000-2088_4-86349.html).

The resulting PDF will now be back to the original page size of 11 x 8.5 with 2 pages per sheet, but the ordering will be such that you can print it double-sided, short-edge binding, and voilà! you will have a printout you can photocopy and fold and saddle-stitch, reproducing the original booklet without ever disassembling (or even necessarily seeing) the original.

Hope this helps someone!

-c

Chris Thorman

Posted 2010-09-22T05:14:14.697

Reputation: 21

1

Based on pipitas' answer above:

On Windows, for splitting letter-size PDFs with a single cover image at the start, the following worked great for me. (Note the use of [-612 0] in the second step; a positive value created blank pages because it pushed the wrong way.)

gswin32c -o left-sections.pdf -sDEVICE=pdfwrite -dFirstPage=2 -g6120x7920 -c "<</PageOffset [0 0]>> setpagedevice" -f input.pdf

Note the use of -dFirstPage=2 which instructs gs to begin processing on page 2.

gswin32c -o right-sections.pdf -sDEVICE=pdfwrite -dFirstPage=2 -g6120x7920 -c "<</PageOffset [-612 0]>> setpagedevice" -f input.pdf

This creates right-sections.pdf the same way. And now the cover image:

gswin32c -o cover.pdf -sDEVICE=pdfwrite -dLastPage=1 -g6120x7920 -c "<</PageOffset [0 0]>> setpagedevice" -f input.pdf

Next, since I didn't want to merge with pdftk using manual page input, I split the left and right sections into separate PDFs in a new directory.

mkdir input_file
copy cover.pdf input_file\0000.pdf
pdftk left-sections.pdf burst output input_file\%04d_A.pdf
pdftk right-sections.pdf burst output input_file\%04d_B.pdf

Then I join the PDFs in that directory, alphabetically (and luckily that means they're sorted in the right order!) and I also run the result through ghostscript again to fix "Warning: Generation number out of 0..65535 range, assuming 0." errors produced by pdftk which ghostscript called "itext-paulo-155 (itextpdf.sf.net-lawagie.com)" -- it also happened to cut file size in half in my usage. With a 4.5MB original, pdftk's result was 6.7MB and gswin32c's reprocessing reduced that to 3.2 MB.

pdftk input_file\*.pdf cat output input_temp.pdf
gswin32c -o final_output.pdf -sDEVICE=pdfwrite -f input_temp.pdf

And we're done! Feel free to delete the input_file folder, cover.pdf, input_temp.pdf, right-sections.pdf and left-sections.pdf. ;-)

Louis

Posted 2010-09-22T05:14:14.697

Reputation: 147

1

If you just need to output the left-hand-side pages all in one document, and the right-hand-side pages all in another, then the following script based on Kurt Pfeifle's answer will do the trick (works for any height and width):

$ cat split.sh
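#!/bin/bash

# Read the page size in points from pdfinfo, e.g. "Page size:      842 x 595 pts".
# The field numbers below depend on pdfinfo's column padding.
dims=$(pdfinfo "$1" | grep -i "page size:" | cut -d ":" -f2)
width=$(echo "$dims" | cut -d " " -f7)
height=$(echo "$dims" | cut -d " " -f9)
# Half the width in points (for PageOffset), truncated to an integer
half_width=$(echo "$width * 0.5" | bc -l | cut -d "." -f1)
# Half width and full height in 720 dpi pixels (for -g): points * 10
half_widthtt=$(echo "$width * 5" | bc -l | cut -d "." -f1)
heighttt=$(echo "$height * 10" | bc -l | cut -d "." -f1)

echo "pdf $1 has height $height and width $width"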

gs -o "left-$1" -sDEVICE=pdfwrite -g"$half_widthtt"x"$heighttt" -c "<</PageOffset [0 0]>> setpagedevice" -f "$1"
gs -o "right-$1" -sDEVICE=pdfwrite -g"$half_widthtt"x"$heighttt" -c "<</PageOffset [-$half_width 0]>> setpagedevice" -f "$1"

then run it like so:

$ ./split.sh thepdftosplit.pdf
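The cut-based parsing in split.sh depends on the exact number of spaces pdfinfo puts after "Page size:". A possibly more forgiving variant splits on whitespace with awk instead. This is a sketch that swaps in a different technique, not the script's own method: the function name page_dims is made up, and it assumes pdfinfo reports the size in the form "Page size: W x H pts".

```shell
#!/bin/sh
# Hypothetical helper: read pdfinfo-style output on stdin and print
# "width height" in points. Assumes a line like
#   Page size:      842 x 595 pts (A4)
# and splits on runs of whitespace, so the padding does not matter.
page_dims() {
    awk '/^Page size:/ { print $3, $5; exit }'
}

# Sample input standing in for the output of: pdfinfo some.pdf
printf 'Pages:          2\nPage size:      842 x 595 pts (A4)\n' | page_dims
```

In the script this would be used as pdfinfo "$1" | page_dims, with the two values then read into width and height.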

mulllhausen

Posted 2010-09-22T05:14:14.697

Reputation: 460