Why is the entire website not downloaded?

1

0

I tried to make a copy of the site wiredhealthresources.net using the command:

wget -rpkl inf wiredhealthresources.net

But the command only downloaded 54 files! Most of the pages are missing, e.g. /topics-cardiology.html, despite being linked from /index.html.

What did I do wrong? Why is wget not downloading the whole site?

Zaz

Posted 2016-10-27T14:35:50.810

Reputation: 1 843

While I can't answer the question itself, I would suggest giving HTTrack a try, as I have had more success with that.

– Sam3000 – 2016-10-27T14:42:08.910

Answers

4

If you look at the page source, you won't see any topics-cardiology.html link because the sidebar is generated by JavaScript. You will need a headless browser that executes JavaScript, such as CasperJS, to make a complete mirror.
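A minimal CasperJS sketch of that approach might look like the following (the file name mirror-links.js, the selector, and the idea of dumping the rendered links for wget to fetch afterwards are assumptions for illustration; only the site URL comes from the question):

// mirror-links.js -- run with: casperjs mirror-links.js
// Requires CasperJS (which runs on top of PhantomJS).
var casper = require('casper').create();

casper.start('http://wiredhealthresources.net/index.html');

// Wait until the JavaScript-built sidebar has inserted its links.
casper.waitForSelector('a[href]', function () {
    // Collect every href on the rendered page and print one per line.
    var links = this.getElementsAttribute('a[href]', 'href');
    this.echo(links.join('\n'));
});

casper.run();

The printed list could then be saved to a file (resolving any relative paths against the site root) and handed to wget via its -i/--input-file option, e.g. wget -p -k -i links.txt, so the pages the sidebar points to are actually fetched.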

Nathan

Posted 2016-10-27T14:35:50.810

Reputation: 216

Ahh! Makes sense. I should have checked the source. Thank you! – Zaz – 2016-11-02T22:03:46.370

Do you know of a good CasperJS script to mirror a website? I'm struggling to find one. – Zaz – 2016-11-15T22:22:40.267

-1

I'm reasonably sure you can't use the inf value to modify the depth, only the number of tries or the quota. Have you tried using -m instead of -r and -l? It sounds like you want to mirror the site, and that's what -m is for.

Warley

Posted 2016-10-27T14:35:50.810

Reputation: 124

Both using -l 99 and wget -pkm yield the same result: only 54 files downloaded. The man page says -m is equivalent to -r -N -l inf --no-remove-listing, which is where I got the -l inf from. – Zaz – 2016-10-27T16:44:39.257