I got a book that came with a pass to access digital versions of hi-res scans of much of the artwork in the book. Amazing! Unfortunately, these are presented as 177 pages of 8 images each, with links to zip files of JPGs. It is extremely tedious to browse, and I would love to be able to get all the files at once rather than sitting and clicking through each one separately.
The pages run from archive_bookname/index.1.htm to archive_bookname/index.177.htm, and each of those pages has 8 links to files such as <snip>/downloads/_Q6Q9265.jpg.zip, <snip>/downloads/_Q6Q7069.jpg.zip, and <snip>/downloads/_Q6Q5354.jpg.zip. The file names don't quite go in order, and I cannot get a directory listing of the parent /downloads/ folder.
Also, the files are behind a login wall, so using a non-browser tool might be difficult without knowing how to recreate the session info.
I've looked into wget a little but I'm pretty confused and have no idea if it will help me with this. Any advice on how to tackle this? Can wget do this for me automatically?
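For reference, the layout described above (index.1.htm through index.177.htm, each linking to a handful of .jpg.zip files) could be walked with a short script. This is only a rough sketch using the Python standard library. The host `https://example.com/` is a placeholder, since the real one is snipped, and the helper names `index_page_urls`, `ZipLinkParser`, and `zip_links` are made up for illustration. It assumes the zip links are ordinary `<a href="...">` tags in the page HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Placeholder base URL: the real host is elided ("<snip>") in the question.
BASE_URL = "https://example.com/archive_bookname/"


def index_page_urls(base_url=BASE_URL, first=1, last=177):
    """Return the URLs index.<first>.htm ... index.<last>.htm under base_url."""
    return [urljoin(base_url, f"index.{n}.htm") for n in range(first, last + 1)]


class ZipLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose target ends in .zip."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            self.links.append(href)


def zip_links(html, page_url):
    """Return absolute URLs for every .zip link found in one index page."""
    parser = ZipLinkParser()
    parser.feed(html)
    # Resolve relative or root-relative hrefs against the page they came from.
    return [urljoin(page_url, href) for href in parser.links]


if __name__ == "__main__":
    # Tiny stand-in for one index page; the real markup is unknown.
    sample = ('<a href="/downloads/_Q6Q9265.jpg.zip">hi-res</a> '
              '<a href="index.2.htm">next</a>')
    for url in zip_links(sample, BASE_URL + "index.1.htm"):
        print(url)
```

Because the links are read out of each page rather than guessed, it would not matter that the file names are out of order or that /downloads/ has no directory listing. Fetching the pages themselves still needs a logged-in session.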
I definitely have legal access to all the files on it, I know that much. I contacted them to mention that I wished there was an easier way to access the files and never got a response – Damon – 2012-04-30T15:17:45.920
why code-format that?! – Chris2048 – 2012-05-01T21:52:03.510
Cause I'm much more accustomed to python than wget. Was waiting for someone to post a wget solution. :-) – Bibhas – 2012-05-02T04:08:57.897
@Bibhas sorry, I didn't mean there is anything wrong with your answer, just why did you put "I'm assuming scraping the website is legal" in code formatting? – Chris2048 – 2012-05-02T08:21:14.683
@Chris2048 Oh! That's not a code tag. That's a blockquote. I wanted to highlight that line. That's why. – Bibhas – 2012-05-02T19:50:35.120
I have to login to access the files. Will that affect this method? (yah ages later.. these solutions were all pretty confusing and I haven't bothered yet) – Damon – 2012-09-06T19:59:12.407
Then you're out of luck. If there is no absolute URL for these files that you can auto-generate, then it's not possible. If the files are behind some authentication check, this won't work. – Bibhas – 2012-09-07T10:34:13.880
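One thing that might be worth trying for the login wall, untested against this site: log in with the browser as usual, export the cookies to a Netscape-format cookies.txt file (this normally needs a browser extension), and have the script send those cookies with each request. The sketch below assumes the site keeps the session in ordinary cookies, which may not be the case. The helper names `opener_from_cookie_file` and `download` are hypothetical.

```python
import urllib.request
from http.cookiejar import MozillaCookieJar


def opener_from_cookie_file(cookie_path):
    """Build a urllib opener that sends the cookies stored in a cookies.txt file."""
    jar = MozillaCookieJar(cookie_path)
    # Keep session cookies as well; a login is often stored as one.
    jar.load(ignore_discard=True, ignore_expires=True)
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download(opener, url, dest_path):
    """Fetch url through the cookie-carrying opener and save the body to dest_path."""
    with opener.open(url) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Usage would be along the lines of `opener = opener_from_cookie_file("cookies.txt")` followed by `download(opener, url, "file.jpg.zip")` for each link. If the site ties the session to something other than cookies, this will not be enough.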