
I want to download the PDFs from website X and then combine them all into one file, so that I can easily view them all at once.

What I did:

  1. Get the PDFs from the website

    wget -r -l1 -A.pdf --no-parent http://linktoX
    
  2. Combine the PDFs into one

    gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=Combined_`date +%F`.pdf -dBATCH file1.pdf file2.pdf file3.pdf
    

My question: I would like to automate all of this in one script so that I don't have to do it by hand every day. New PDFs are added to X daily.

So, how can I do step 2 above without giving the full list of PDFs? I tried file*.pdf in step 2, but it combined the PDFs in random order.

A further problem is that the number of file*.pdf files is not the same every day: sometimes 5 PDFs, sometimes 10. The nice thing is that they are named in order: file1.pdf, file2.pdf, ...

So I need some help completing step 2 so that all PDFs are combined in order and I don't have to name each PDF explicitly.

Thanks.

UPDATE: This solved the problem:

    pdftk `ls -rt kanti*.pdf` cat output Kanti.pdf

I used ls -rt because file1.pdf was downloaded first, then file2.pdf, and so on. Plain ls -t put file20.pdf first and file1.pdf last.
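
For the "one script" part of the question, the two steps can be wrapped together roughly as follows. This is only a sketch under assumptions: the function names `list_oldest_first` and `fetch_and_merge` and their arguments are made up for illustration, file names are assumed to contain no whitespace (parsing `ls` output breaks otherwise), and the directory where `wget -r` stores the files has to be passed in by the caller.

```shell
#!/bin/sh
# Sketch only. Function names and arguments are illustrative assumptions.

# Print the PDFs in directory $1 whose names start with prefix $2,
# oldest modification time first. Assumes names contain no whitespace.
list_oldest_first() {
    ( cd "$1" && ls -rt "$2"*.pdf )
}

# $1: URL to fetch, $2: directory where wget stored the PDFs, $3: name prefix
fetch_and_merge() {
    wget -r -l1 -A.pdf --no-parent "$1" || return 1
    out="$PWD/Combined_$(date +%F).pdf"
    # Unquoted $(...) on purpose: the shell splits the list into file names.
    ( cd "$2" && pdftk $(list_oldest_first . "$3") cat output "$out" )
}

# Example call; the URL and directory are placeholders:
# fetch_and_merge http://linktoX linktoX kanti
```

Ordering by modification time only matches the download order if the timestamps reflect it, so it is worth checking the output of `ls -rt` once before relying on it in a daily job.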

ErikE
seg.server.fault

2 Answers


Try pdftk and sort the file list like this:

    pdftk `ls files*pdf | sort` cat output joined.pdf
chmeee
  • Sorting outputs something like this: kanti10.pdf kanti12.pdf ... kanti19.pdf kanti1.pdf kanti20.pdf kanti2.pdf kanti3.pdf ... kanti8.pdf kanti9.pdf, which is not what I want. Is there any way to sort by modified time? That might solve the problem. – seg.server.fault Aug 09 '09 at 19:54
  • I want to combine the files in this order: file1.pdf ... file9.pdf file10.pdf file11.pdf and so on. – seg.server.fault Aug 09 '09 at 19:55
  • The part between the backticks `ls ...` just needs to output the files in the order that you want. The -t option to ls sorts by modified time and -r reverses it, so use `ls -rt files*pdf` – bmb Aug 09 '09 at 20:05
  • Thanks bmb, it worked. I used ls -rt because file1.pdf should come first and it was the one that was modified/downloaded first. – seg.server.fault Aug 09 '09 at 20:14
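
The lexical ordering described in the comments (kanti10.pdf before kanti2.pdf) can also be avoided without relying on modification times. A minimal sketch, assuming GNU `sort` with its `-V` (version sort) option and file names without whitespace; `list_natural` is a hypothetical helper name, not part of pdftk:

```shell
# Print the PDFs whose names start with prefix $1 in natural (version) order,
# so that kanti2.pdf comes before kanti10.pdf.
# Assumes GNU sort (-V) and file names without whitespace.
list_natural() {
    printf '%s\n' "$1"*.pdf | sort -V
}

# Usage (word splitting of the list is intentional):
# pdftk $(list_natural kanti) cat output Kanti.pdf
```

Unlike `ls -rt`, this depends only on the numbers in the file names, so it still works if the files are copied or re-downloaded out of order.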

Instead of file*.pdf, you can generate the list of files in the order you want with another command inside backticks, e.g. `ls ...` as in chmeee's answer. This works with your original ghostscript command too.

This will sort the files by modification date, oldest first:

    gs [...] `ls -rt file*pdf`

This will sort them numerically, starting with the 5th character of the name (the first character after the prefix "file"):

    gs [...] `ls file*pdf | sort --key=1.5 -g`
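
To see what the sort key does, here is a small sketch run on a fixed list of names (the names and the helper `sort_by_number` are made up for illustration). `--key=1.5` starts the key at character 5 of the first field, i.e. just past the 4-letter prefix "file", and `-g` compares that key as a number:

```shell
# Sort names of the form fileN.pdf by N.
# --key=1.5 : key starts at character 5 of field 1 (just past "file")
# -g        : general numeric comparison of the key
sort_by_number() {
    sort --key=1.5 -g
}

printf '%s\n' file10.pdf file2.pdf file1.pdf | sort_by_number
# prints:
# file1.pdf
# file2.pdf
# file10.pdf
```

The character offset depends on the prefix length, so for names like kanti1.pdf the key would have to start at 1.6 instead.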
bmb