Extract first page from multiple pdfs

18

9

I have about 500 PDFs and need to extract the first page of each. The pages then go through a time-consuming conversion process, so I was hoping to save some time with a batch process that extracts just the first page from each of the 500 PDFs and places it in a new PDF. I have had a poke around Acrobat but can find no real method of doing this for multiple files. Does anyone know of any other programs or methods that could achieve this? Free and open source are obviously preferable :)

EDIT: I have actually had some success using Ghostscript to extract just one page. I am now looking at how to batch that over a list of files.

Tim Alexander

Posted 2010-11-05T12:19:58.963

Reputation: 1 798

What do the other steps in the conversion process involve? – Ignacio Vazquez-Abrams – 2010-11-05T15:50:45.730

About your edit, see my edit. – frabjous – 2010-11-05T16:07:23.913

Answers

30

Using pdftk...

On Mac and Linux, from the command line:

for file in *.pdf ; do pdftk "$file" cat 1 output "${file%.pdf}-page1.pdf" ; done
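
For anyone puzzled by the ${file%.pdf} part: it is standard shell parameter expansion that strips a trailing .pdf from the name, so each output lands next to its input with -page1 appended. A minimal sketch of just that naming step (the page1_name helper is made up for illustration and is not part of pdftk):

```shell
# Hypothetical helper: prints the output name the loop above would use
# for a single input file name.
page1_name() {
  # ${1%.pdf} removes one trailing ".pdf" if present; other names pass through.
  printf '%s\n' "${1%.pdf}-page1.pdf"
}

page1_name "report.pdf"     # prints: report-page1.pdf
page1_name "My File.pdf"    # prints: My File-page1.pdf
```

The quotes around "$file" in the loop are what keep names with spaces in one piece.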

On Windows, you could create a batch file. Open up Notepad, paste this inside:

for %%I in (*.pdf) do "pdftk.exe" "%%I" cat 1 output "%%~nI-page1.pdf"

You may need to replace "pdftk.exe" with the full path to pdftk, e.g., "C:\Program Files\pdftk\pdftk.exe" or wherever it is installed. (I don't use Windows, so I don't know.)

Save it with an extension ending in .bat, drop it in the folder with the PDFs and double click.

You can do the same thing with Ghostscript, yes.

Let's see. For Mac and Linux (all one line):

for file in *.pdf ; do gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="${file%.pdf}-page1.pdf" -dFirstPage=1 -dLastPage=1 "$file" ; done
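
With 500 files it may help to be able to re-run the loop and to see which inputs failed. A rough sketch, not a tested production script: the function name extract_first_pages_gs is my own, and it assumes gs exits with a non-zero status when it cannot process a file, which may not hold for every damaged PDF.

```shell
# Extracts page 1 of each PDF in the current directory with Ghostscript.
# Skips outputs from an earlier run (*-page1.pdf) and reports inputs
# for which gs returned a non-zero exit status.
extract_first_pages_gs() {
  failed=0
  for file in *.pdf; do
    [ -e "$file" ] || continue                      # no PDFs: glob stayed literal
    case "$file" in *-page1.pdf) continue ;; esac   # already an extracted page
    if ! gs -q -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
         -sOutputFile="${file%.pdf}-page1.pdf" \
         -dFirstPage=1 -dLastPage=1 "$file"; then
      printf 'failed: %s\n' "$file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]    # success only if every file went through
}
```

Run it from the folder holding the PDFs, e.g. cd to that folder and call extract_first_pages_gs; the names of failed inputs go to standard error.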

I'm not exactly sure what the corresponding command would be for a Windows batch file. My best guess (I don't have Windows, so I can't test it):

for %%I in (*.pdf) do "C:\Program Files\gs\gs9.00\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#pdfwrite -sOutputFile#"%%~nI-page1.pdf" -dFirstPage#1 -dLastPage#1 "%%I"

Double-check that the path to your Ghostscript executable is right. Again, I haven't tested this, since I don't use Windows.


EDIT: OK, I just realized you probably don't want 500 1-page PDFs, but a single PDF that combines them all. Just run the above, and that will leave you with 500 1-page PDFs. To combine them using pdftk... on mac and linux:

pdftk *-page1.pdf cat output combined.pdf

I think it's probably the same on Windows, except maybe needing the full path to pdftk, as above. You could just add that line after the line above in your batch file.

With Ghostscript... on mac and linux:

gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="combined.pdf" *-page1.pdf

And it's probably the same on Windows, except replacing "gs" at the beginning with the full path to gswin32c.exe, as above.

There may be a way for Ghostscript to do both in one step, but I'm too lazy to figure it out right now.

If the order in which to combine them is important, then we'll need more information.
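
On ordering: the shell expands *-page1.pdf in sorted (collation) order, so scan10-page1.pdf comes before scan2-page1.pdf. If natural numeric order is what is wanted, something like the following might do. This is only a sketch: list_pages_in_order is a hypothetical helper, and it assumes a sort that supports -V (GNU coreutils does) and file names without newlines.

```shell
# Prints the extracted first pages found in directory $1, one per line,
# in natural numeric order (a2 before a10).
# Assumes `sort -V` is available and that no file name contains a newline.
list_pages_in_order() {
  for f in "$1"/*-page1.pdf; do
    if [ -e "$f" ]; then          # skip the literal pattern when nothing matches
      printf '%s\n' "$f"
    fi
  done | sort -V
}

# Example use (not run here), feeding the ordered names to pdftk;
# xargs -0 is assumed to be available (GNU and BSD xargs have it):
#   list_pages_in_order . | tr '\n' '\0' |
#     xargs -0 sh -c 'pdftk "$@" cat output combined.pdf' sh
```

If the required order is something else entirely (dates, a list in a spreadsheet), the same pipeline works with a different command in place of sort -V.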

frabjous

Posted 2010-11-05T12:19:58.963

Reputation: 9 044

That is the command I was looking for. I have spent the afternoon reading about if loops in Linux! Your initial command is the correct one, i.e. I need 500 single-page PDFs. I had managed to get all the first pages into a single PDF, but the conversion to Excel then makes it unmanageable. My users have some very specific requests and layout requirements, which is infuriating but challenging. Thanks for your help!! – Tim Alexander – 2010-11-05T16:35:12.037

2

Just had to do it today in Linux. It should work for Mac too. Execute the following command from your terminal.

lpr -o page-ranges="1-1" path/to/folder/*.pdf

lpr submits jobs to the printer.

Note the * character usage in the command. This would run the command for all your PDF files in the directory.

vivek_ganesan

Posted 2010-11-05T12:19:58.963

Reputation: 121

As you point out, this will submit jobs to the printer. That's not what OP is asking for. – Nick K9 – 2017-02-22T16:30:53.393

1

I think you could use a pdf virtual printer, like pdf-forge.

You just "print" the first page. I'm on a Mac now and can't try it, but I'm quite sure you can do more than one file at a time.

Good luck!!

Trufa

Posted 2010-11-05T12:19:58.963

Reputation: 207

thanks for the pointers on those. these have led me to GhostScript which looks like it might be able to do what I want. Thanks – Tim Alexander – 2010-11-05T15:07:56.873

@Tim Alexander, no problem at all!! – Trufa – 2010-11-05T18:17:09.173

0

On Linux

I wrote this one-liner:

tree -fai . | grep -P "\.pdf$" | xargs -L1 -I {} pdftk {} cat 1 output {}.firstpage.pdf

It does the job; I tested it, and it works with as many levels of folders as you have. Just make sure that you run it at the root of the folder structure. For every PDF file, its folder will get an additional PDF ending in .firstpage.pdf.

You need pdftk and tree for this; on Ubuntu Linux you can install them with apt:

sudo apt install pdftk tree
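
If tree is not available, the same recursive walk can be sketched with find instead; note this swaps in a different technique from the one-liner above. The function name is made up for illustration. It keeps the .firstpage.pdf naming, skips outputs from earlier runs, and copes with spaces in names, but still assumes no file name contains a newline.

```shell
# Runs pdftk on every .pdf below directory $1 (default: current directory),
# writing <name>.pdf.firstpage.pdf next to each input.
# Outputs from an earlier run are excluded so the command can be repeated.
extract_first_pages_recursive() {
  find "${1:-.}" -type f -name '*.pdf' ! -name '*.firstpage.pdf' |
  while IFS= read -r f; do
    # </dev/null keeps pdftk from swallowing the file list on stdin
    pdftk "$f" cat 1 output "$f.firstpage.pdf" </dev/null
  done
}
```

Call it as extract_first_pages_recursive path/to/folder, or with no argument from the root of the folder structure.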

Eduard Florinescu

Posted 2010-11-05T12:19:58.963

Reputation: 2 116

0

Or use cpdf (https://www.coherentpdf.com/ocaml-libraries.html):

cpdf -merge in1.pdf [<range>] in2.pdf [<range>] [<more names/ranges>]
     [-retain-numbering] [-remove-duplicate-fonts] -o out.pdf

cpdf -merge a.pdf 1 b.pdf 1 -o out.pdf
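
Typing "name 1" for 500 files by hand is not practical, so here is a sketch of building that argument list in the shell. It relies only on the cpdf -merge syntax shown above; merge_first_pages_cpdf is a hypothetical wrapper name, and I have not run it against a real set of 500 files.

```shell
# Usage: merge_first_pages_cpdf OUT.pdf IN1.pdf IN2.pdf ...
# Builds "IN1.pdf 1 IN2.pdf 1 ..." and hands it to cpdf -merge.
merge_first_pages_cpdf() {
  out=$1
  shift
  for f in "$@"; do        # the list is expanded once, before the loop body runs
    set -- "$@" "$f" 1     # append "name 1" to the argument list ...
    shift                  # ... and drop the plain name from the front
  done
  cpdf -merge "$@" -o "$out"
}

# Example use (not run here); the output is written outside the input folder
# so that *.pdf does not pick it up on a second run:
#   merge_first_pages_cpdf ../first-pages.pdf *.pdf
```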

Jerry T

Posted 2010-11-05T12:19:58.963

Reputation: 101

0

As for the Windows batch file (.bat) command (%% marks a variable inside a .bat file):

First-page extraction of each PDF as a JPG with reduced resolution/size:

for %%I in (*.pdf) do "C:\Program Files (x86)\gs\gs9.14\bin\gswin32c.exe" -dSAFER -dNOPAUSE -dBATCH -sDEVICE#jpeg -r20 -sOutputFile#"%%~nI.jpg" -dFirstPage#1 -dLastPage#1 "%%I"

(In the answer above, -sOutputFile was originally written with the wrong capitalization; the path here is the one used by the standard x86 Ghostscript 9.14 install.)

(Also look at "Using Ghostscript to convert multi-page PDF into single JPG?")

ebricca

Posted 2010-11-05T12:19:58.963

Reputation: 99