Suppose I want to save a copy of a single page to my hard drive for permanent keeping. I'm not looking for a deep recursive get, just the one page plus any resources that page loads.
Example: https://www.tumblr.com/
Expect:
- The index.html
- Any loaded images
- Any loaded JS files
- Any loaded CSS files
- Any images loaded in the CSS file
- Links to the page resources rewritten to point at the downloaded copies (no web dependency)
I'm interested to know if you can help me find the best wget syntax or other tool that will do this. The tools I have tried usually fail to get the images loaded by CSS, so the page never looks right when loaded locally. Thank you!
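The missing CSS images come from `url(...)` references inside the stylesheets, which a saver has to parse separately from the HTML. Here is a minimal Python sketch of that one step, assuming the stylesheet text has already been fetched. The function name `css_resource_urls` and the example.com URLs are hypothetical, and the regex is a simplification that does not cover every CSS escaping rule.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
_CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_resource_urls(css_text, stylesheet_url):
    """Return absolute URLs referenced via url(...) in css_text.

    Relative references are resolved against the stylesheet's own URL,
    not the page URL, which is how browsers resolve them.
    """
    found = []
    for match in _CSS_URL.finditer(css_text):
        target = match.group(2).strip()
        # Skip empty targets and inline data: URIs, which need no download.
        if not target or target.lower().startswith("data:"):
            continue
        absolute = urljoin(stylesheet_url, target)
        if absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    # Hypothetical stylesheet content and location, for illustration only.
    css = 'body { background: url("../img/bg.png"); } input { background: url(field.png) }'
    for url in css_resource_urls(css, "https://example.com/css/site.css"):
        print(url)
```

A complete saver would also have to download each URL and rewrite the `url(...)` targets to local paths. This sketch only covers finding them.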
Tangent Solution
I found a way to do this using Firefox. The default save is broken, and there is an addon called "Save Complete" that apparently does a good job with this. However, you can't download it because it is reported as not supported in the current Firefox version. The reason is that it was rolled into another addon, "Mozilla Archive Format". Install that, and File > "Save Page As..." gains a new option called "Web Page, complete", which is essentially the old addon and fixes the stock Firefox implementation (which is terrible). This isn't a wget solution, but it is a workable one.
EDIT: Another ridiculous issue, for anyone who follows this question in the future and tries to do this. To get the addon to work properly, go to Tools > Mozilla Archive Format and change the (terrible) default setting of "take a faithful snapshot of the page" to "preserve scripts and source using Save Complete". Otherwise the addon will empty all your script files and replace them with the text "/* Script removed by snapshot save */".
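To check whether a saved copy was hit by the script-stripping default, you can scan the saved folder for that placeholder text. This is a rough Python sketch. The function name `stripped_scripts` is hypothetical, and it assumes the saved scripts keep a `.js` extension, which may not hold for every page.

```python
from pathlib import Path

# Placeholder text quoted in the post above.
PLACEHOLDER = "/* Script removed by snapshot save */"


def stripped_scripts(saved_dir):
    """Return sorted paths of .js files under saved_dir that contain the placeholder."""
    hits = []
    # Assumption: saved scripts end in .js; adjust the pattern if they do not.
    for path in Path(saved_dir).rglob("*.js"):
        text = path.read_text(encoding="utf-8", errors="replace")
        if PLACEHOLDER in text:
            hits.append(path)
    return sorted(hits)
```

An empty result suggests the scripts were preserved. Any path returned points at a file that needs to be re-saved with the other setting.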
File > Save As in Firefox or another browser will download all images, JS and CSS files – user31113 – 2011-10-01T02:34:31.263
Do you actually want the files, or do you just want a correctly rendered version of the page? – None – 2011-10-01T02:36:32.070
I want the files, they would be required to correctly render the page anyway. If you didn't have them it would look different. File > Save As does not work in Firefox. If you do this, you don't get the css images. Try it at https://www.tumblr.com/login. Background image missing, bg image for input fields missing. – None – 2011-10-01T02:43:41.463
None of the wget solutions worked for me. My Tangent Solution is the best method to achieve this kind of site saving. However, I have seen it fail on very complicated pages like http://www.apple.com, presumably because a lot of the resource paths are dynamically generated by executing javascript, some not right away but during some kind of ajax execution. – Lana Miller – 2011-12-16T11:13:13.323