How to download all links in a PDF

1

1

I have many PDFs and need to download everything their hyperlinks point to. I have tried opening each PDF in Firefox and using Download Them All, but it often misses some of the links. How can I accomplish this?

darmi

Posted 2015-05-14T03:11:53.827

Reputation: 33

Most of the hyperlinks I need point to an HTTPS website. I have the certificates and credentials to access it; I just need to find out how to apply them with Wget. – darmi – 2015-05-15T05:02:48.200

Answers

1

Interesting question! I'm partial to command-line utilities whenever available, so in this case I'm using the following two tools: PDFtk and Wget.

Both are portable (PDFtk is available only as an installer but you can copy pdftk.exe and libiconv2.dll elsewhere and uninstall it if you want). You can of course substitute Wget with cURL or whatever you like.

The following, executed from the Windows command line, will download all documents/pages linked to from a PDF (inside a batch file, double each percent sign, e.g. %%l instead of %l):

for /f "tokens=2" %l in ('pdftk Test.pdf dump_data_annots ^| find "AnnotActionURI"') do wget "%l"
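For anyone who prefers a script to the one-liner, here is a minimal Python sketch of the same idea: take the text that pdftk prints for dump_data_annots and keep whatever follows "AnnotActionURI:". The function names and the SAMPLE text are my own illustration, and the sample only assumes the "Key: value" line layout that the find/tokens=2 filter above relies on. Unlike tokens=2, it keeps the whole rest of the line, so a URL containing a space is not cut short.

```python
import subprocess


def extract_uris(dump_text):
    """Return the unique link targets found in `pdftk ... dump_data_annots` output."""
    prefix = "AnnotActionURI:"
    uris = []
    for line in dump_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            uri = line[len(prefix):].strip()
            # Keep the first occurrence only; PDFs often repeat the same link.
            if uri and uri not in uris:
                uris.append(uri)
    return uris


def dump_annotations(pdf_path):
    """Run pdftk (assumed to be on PATH) and return its annotation dump as text."""
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data_annots"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical sample in the assumed dump layout (not taken from a real PDF).
SAMPLE = """\
AnnotSubtype: Link
AnnotActionSubtype: URI
AnnotActionURI: https://example.com/a.pdf
---
AnnotActionURI: https://example.com/b.pdf
AnnotActionURI: https://example.com/a.pdf
"""

if __name__ == "__main__":
    print(extract_uris(SAMPLE))
```

With pdftk installed, extract_uris(dump_annotations("Test.pdf")) should give the list of URLs that the loop above hands to wget one by one.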

Use the following command for multiple PDFs:

for %f in (*.pdf) do for /f "tokens=2" %l in ('pdftk "%~f" dump_data_annots ^| find "AnnotActionURI"') do wget "%l"
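The nested loop can also be sketched in Python if you want to see which links came from which PDF before downloading anything. This is only an illustration under assumptions: links_in_folder and pdftk_dump are names I made up, pdftk is assumed to be on PATH, and the dump is assumed to contain one "AnnotActionURI: <url>" line per link, as the find filter above implies.

```python
import subprocess
from pathlib import Path


def pdftk_dump(pdf_path):
    """Return the text printed by `pdftk <file> dump_data_annots`."""
    result = subprocess.run(
        ["pdftk", str(pdf_path), "dump_data_annots"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def links_in_folder(folder, dump=pdftk_dump):
    """Map each PDF file name in `folder` to the link targets found in it."""
    links = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        links[pdf.name] = [
            # Split on the first colon only, so the URL's own colons survive.
            line.split(":", 1)[1].strip()
            for line in dump(pdf).splitlines()
            if line.startswith("AnnotActionURI:")
        ]
    return links
```

The dump parameter exists so the function can be tried without pdftk by passing in a stand-in that returns canned text.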

Karan

Posted 2015-05-14T03:11:53.827

Reputation: 51 857

This looks promising, but I am hitting this error: "ERROR: cannot verify ...'s certificate, issued by ... Unable to locally verify the issuer's authority. Unable to establish SSL connection" – darmi – 2015-05-15T04:53:27.437

Try providing a certificate. If that doesn't work, try --no-check-certificate. If that doesn't help either, try to locate a more recent version of Wget for Windows (the GNU version is unfortunately quite outdated now), compile it yourself, or use an alternative such as cURL, as I mentioned above.

– Karan – 2015-05-15T18:14:46.453
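To make the certificate suggestions concrete, here is a small Python sketch that assembles the Wget command line. The options themselves (--ca-certificate, --certificate, --private-key, --no-check-certificate) are documented Wget options; the function name, its parameters and the file names in the example are hypothetical. Whether your site needs only the issuing CA or also a client certificate and key is something I can't tell from the error message.

```python
def build_wget_command(url, ca_cert=None, client_cert=None,
                       client_key=None, insecure=False):
    """Build an argument list for wget with optional TLS settings."""
    cmd = ["wget"]
    if ca_cert:
        # CA bundle used to verify the server's certificate.
        cmd.append(f"--ca-certificate={ca_cert}")
    if client_cert:
        # Client certificate, for sites that require one.
        cmd.append(f"--certificate={client_cert}")
    if client_key:
        # Private key that belongs to the client certificate.
        cmd.append(f"--private-key={client_key}")
    if insecure:
        # Last resort: skips server verification entirely.
        cmd.append("--no-check-certificate")
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Hypothetical file name and URL, for illustration only.
    print(build_wget_command("https://example.com/doc.pdf", ca_cert="ca.pem"))
```

The resulting list can be passed to subprocess.run as-is, which avoids quoting problems with URLs that contain & or %.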

Thanks for the help. Due to work/time constraints, I am now running pdftk blah.pdf dump_data_annots output test.html, opening the resulting file in Firefox and using Download Them All on it, with better success. Next I'll look into moving that command into a context menu if I can. – darmi – 2015-05-18T06:00:16.090