Download all PDF links in a web page?

Do you know of any good software for downloading all the PDFs linked from a web page?

Operating system is Windows 7.

iAsk

Posted 2011-03-20T20:20:42.313

Reputation: 689

Answers

You can use wget and run a command like this:

wget --recursive --level=1 --no-directories --no-host-directories --accept pdf http://example.com

Or with the short options:

wget -r -l 1 -nd -nH -A pdf http://example.com

UPDATE: Since your update says you are running Windows 7: use wget for Windows from a cmd prompt.

UPDATE 2: For a graphical solution there is DownThemAll, though it may be overkill since it gets other files too.

Kevin Worthington

Posted 2011-03-20T20:20:42.313

Reputation: 1 554

This rejects even the initial .html page. Has it ever been tested? – dan3 – 2015-01-21T18:28:15.317

The question asks about downloading all PDF links, so yes, the initial .html page will be ignored. – Kevin Worthington – 2015-01-21T21:48:36.373

Is there a possibility to do the same thing in Windows 7 using PowerShell? – Benedikt Buchert – 2015-07-04T11:28:36.700

I would also suggest throwing in a delay of at least a few seconds between file downloads so as to be nice and not overwhelm the remote server, e.g. for wget, add the flag -w 5 – KJH – 2016-01-21T15:21:04.423

thank you kevin for your advice, wget looks good, anyway i would prefer a 'graphic' software, non command line. :) – iAsk – 2011-03-20T21:09:37.017

In your browser, press CTRL+SHIFT+J, and enter

var pdflinks = [];
// Collect each distinct link whose href ends in ".pdf"
Array.prototype.forEach.call(document.querySelectorAll("a[href$=\".pdf\"]"), function (e) {
  if (pdflinks.indexOf(e.href) == -1) { pdflinks.push(e.href); }
});
console.log(pdflinks.join(" "));

This prints the links in the console, for example:

"https://superuser.com/questions/tagged/somepdf1.pdf" "https://superuser.com/questions/tagged/somepdf2.pdf" "https://superuser.com/questions/tagged/somepdf3.pdf"

Now pass these URLs to wget on the command line: wget url1 url2 ...

Copy the output, open a console, type wget followed by a space, press the right mouse button to paste your clipboard content, and press Enter.

To use a download file instead, join the links with "\n", save them to a file, and pass that file with the -i parameter: wget -i mydownload.txt
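The download-file variant could be sketched roughly as follows. This is only an illustration, not the answer author's own code: toDownloadList is a hypothetical helper name, and it assumes the links are already in an array such as pdflinks from the snippet above. It drops duplicates and puts one URL per line, which is the layout wget -i reads.

```javascript
// Hypothetical helper (not from the original answer): turn an array of
// URLs into the text of a download list, one URL per line.
function toDownloadList(urls) {
  // A Set keeps the first occurrence of each URL and drops repeats.
  return Array.from(new Set(urls)).join("\n");
}

// In the browser console, after collecting the links:
//   console.log(toDownloadList(pdflinks));
// Outside a browser, any array of strings works:
const example = toDownloadList([
  "http://example.com/a.pdf",
  "http://example.com/b.pdf",
  "http://example.com/a.pdf",
]);
console.log(example);
```

The printed lines can then be saved as mydownload.txt.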

Note that most other (GUI) download programs also accept a space-separated list of URLs.

Hope this helps. This is how I generally do it. It is faster and more flexible than any extension with a graphical UI that I would have to learn and remain familiar with.
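One caveat worth sketching: the selector a[href$=".pdf"] compares the end of the href attribute as written, so it can miss links such as file.PDF or file.pdf?download=1. A possible workaround, shown here as a minimal sketch, is to test the path part of each resolved URL instead. isPdfLink is a hypothetical helper name and the example.com URLs are placeholders; neither comes from the answer above.

```javascript
// Hypothetical helper (not from the original answer): decide whether a
// link points at a PDF by looking only at the path part of the URL.
function isPdfLink(href) {
  try {
    // URL parsing separates the path from any ?query or #fragment.
    return new URL(href).pathname.toLowerCase().endsWith(".pdf");
  } catch (err) {
    // new URL() throws on strings that are not absolute URLs.
    return false;
  }
}

// In the browser console, collect every link and keep the PDF ones:
//   var pdflinks = Array.from(document.links, function (a) { return a.href; }).filter(isPdfLink);
const sample = [
  "http://example.com/report.PDF",
  "http://example.com/paper.pdf?download=1",
  "http://example.com/index.html",
].filter(isPdfLink);
console.log(sample.join(" "));
```

This only inspects the URL text, so a PDF served from a path without a .pdf extension would still be skipped.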

Lorenz Lo Sauer

Posted 2011-03-20T20:20:42.313

Reputation: 758

Better yet, console.log('"' + pdflinks.join('" "') + '"') -- otherwise you don't actually get quoted URLs – dan3 – 2015-01-21T18:29:53.857

If you want to stay in the browser, I've written a web extension for exactly this purpose - I'm working on adding the ability to save scholarly article PDFs with properly formatted titles but if you just want to download 'em all it's perfect for this.

It's called Tab Save and is available on the Chrome Web Store. You don't even have to input the list of URLs if you just open them all in tabs (but for large numbers of files this might slow a computer down, so I added the option to enter your own list).

Louis Maddox

Posted 2011-03-20T20:20:42.313

Reputation: 576

I recently used uGet (on Windows) for this. It has a GUI, and you can filter the files you intend to download.

Saves trying to remember all those

Cogitative

Posted 2011-03-20T20:20:42.313

Reputation: 1

On Google Chrome, it's possible to use extensions such as:

Download Master

With this extension you can download all images, videos, pdf, doc and any other file linked on the web page you are visiting.

kenorb

Posted 2011-03-20T20:20:42.313

Reputation: 16 795

Google

There are a few Python tools that allow downloading PDF links from a website based on Google search results.

E.g.

google_dl script (recommended).

Usage:

./google_dl -s http://www.example.com/ -f pdf ""

gsrchDwn script (based on neo's script).

Usage:

./gsrchDwn.py --query "site:http://www.example.com/" --ftype pdf

Note: I'm the maintainer of both scripts mentioned.

Both of them use the xgoogle Python library. My fork of this library is based on the pkrumins/xgoogle version.

kenorb

Posted 2011-03-20T20:20:42.313

Reputation: 16 795