24
21
Do you know of good software to download all the PDF links on a web page?
The operating system is Windows 7.
37
You can use wget and run a command like this:
wget --recursive --level=1 --no-directories --no-host-directories --accept pdf http://example.com
Or with the short options:
wget -r -l 1 -nd -nH -A pdf http://example.com
UPDATE: Since your update says you are running Windows 7, use wget for Windows from a cmd prompt.
UPDATE 2: For a graphical solution, try DownThemAll, though it may be overkill since it downloads other files too.
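If you would rather script it than install wget, the same idea can be sketched in a few lines of Python using only the standard library. This is just a rough sketch, not part of the answer above; the page URL and the "pdfs" output folder are placeholders, and it does not recurse the way wget -r does:

import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE = "http://example.com/"   # placeholder: the page whose PDF links you want
OUT = "pdfs"                   # placeholder: output folder

class PdfLinkParser(HTMLParser):
    # Collect the href of every <a> tag that ends in .pdf
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".pdf"):
                self.links.append(urljoin(PAGE, href))

html = urllib.request.urlopen(PAGE).read().decode("utf-8", errors="replace")
parser = PdfLinkParser()
parser.feed(html)

os.makedirs(OUT, exist_ok=True)
for url in sorted(set(parser.links)):
    name = url.rsplit("/", 1)[-1] or "unnamed.pdf"
    print("downloading", url)
    urllib.request.urlretrieve(url, os.path.join(OUT, name))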
6
In your browser, press CTRL+SHIFT+J, and enter
var pdflinks = []; Array.prototype.map.call(document.querySelectorAll("a[href$=\".pdf\"]"), function(e) { if (pdflinks.indexOf(e.href) == -1) { pdflinks.push(e.href); } }); console.log(pdflinks.join(" "));
This will return in the console:
"https://superuser.com/questions/tagged/somepdf1.pdf" "https://superuser.com/questions/tagged/somepdf2.pdf" "https://superuser.com/questions/tagged/somepdf3.pdf"
Now use wget with the command-line syntax wget url1 url2 ...
Copy the list, open a console, type wget followed by a space,
press the right mouse button to paste your clipboard contents, and press Enter.
To use a download file instead, put one URL per line (i.e. join them with "\n") and pass the file with wget -i mydownload.txt
Note that most other (GUI) download programs also accept a space-separated list of URLs.
Hope this helps. This is how I generally do it; it is faster and more flexible than any graphical extension that I would have to learn and stay familiar with.
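If wget isn't installed, a tiny stand-in for the wget url1 url2 ... step can be sketched in Python. The script name fetch_pdfs.py is made up here; it simply downloads every URL passed on the command line, e.g. the space-separated list copied from the console:

import sys
import urllib.request

# Download each URL given on the command line into the current directory.
for url in sys.argv[1:]:
    name = url.rstrip("/").rsplit("/", 1)[-1] or "download.pdf"
    print("fetching", url, "->", name)
    urllib.request.urlretrieve(url, name)

Run it as: python fetch_pdfs.py url1 url2 ...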
1Better yet, console.log('"' + pdflinks.join('" "') + '"')
-- otherwise you don't actually get quoted URLs – dan3 – 2015-01-21T18:29:53.857
1
If you want to stay in the browser, I've written a web extension for exactly this purpose. I'm working on adding the ability to save scholarly article PDFs with properly formatted titles, but if you just want to download them all, it's perfect for this.
It's called Tab Save and is on the Chrome web store here. You don't even have to input the list of URLs if you just open them all in tabs (but for large numbers of files this might slow a computer down, so I added the option to add your own).
0
I recently used uGet (on Windows) for this. It has a GUI, and you can filter the files you intend to download.
Saves trying to remember all those command-line options.
0
On Google Chrome, it's possible to use extensions such as:
With this extension you can download all images, videos, PDF, DOC, and any other files linked on the web page you are visiting.
0
There are a few Python tools which allow downloading PDF links from a website based on Google search results.
E.g. the google_dl script (recommended).
Usage:
./google_dl -s http://www.example.com/ -f pdf ""
Or the gsrchDwn script (based on neo's script).
Usage:
./gsrchDwn.py --query "site:http://www.example.com/" --ftype pdf
Note: I'm the maintainer of both mentioned scripts.
Both of them use the xgoogle Python library. My fork of this library is based on the pkrumins/xgoogle version.
Related: A web search from the Linux command line.
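I'm not reproducing the scripts here, but the underlying idea is small enough to sketch: collect the PDF URLs from the search results (however you obtain them) and fetch each one. In this rough illustration, the file results.txt is a made-up placeholder holding one result URL per line:

import urllib.request

# Keep only the .pdf URLs from a plain-text list of search-result URLs.
with open("results.txt") as f:
    urls = [line.strip() for line in f if line.strip().lower().endswith(".pdf")]

for url in urls:
    name = url.rsplit("/", 1)[-1]
    print("fetching", url)
    urllib.request.urlretrieve(url, name)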
2This rejects even the initial .html page. Has it ever been tested? – dan3 – 2015-01-21T18:28:15.317
The question asks about downloading all PDF links, so yes, the initial .html page will be ignored. – Kevin Worthington – 2015-01-21T21:48:36.373
Is there a possibility to do the same thing in Windows 7 using PowerShell? – Benedikt Buchert – 2015-07-04T11:28:36.700
1I would also suggest throwing in a delay of at least a few seconds between file downloads, so as to be nice and not overwhelm the remote server. E.g., for wget, add in a flag of
-w 5
– KJH – 2016-01-21T15:21:04.423
Thank you Kevin for your advice, wget looks good; anyway I would prefer 'graphic' software, not command line. :) – iAsk – 2011-03-20T21:09:37.017