How to mirror a web page, including content generated with JavaScript?

6

1

As an example, I tried saving this web page using Firefox's "Save Page As > Web Page, complete" but it did not save the comments section (see the saved files). I assume this is because the comments are generated dynamically via JS.

Is there a browser extension, or something else already implemented (e.g., a web service), that could pull the entire DOM out in its current state and save it?

Printing the page as PDF does save the content seen in the browser, but the layout is all messed up, plus I really want to save the page's source and media (HTML, JS, CSS, GIFs, PNGs, etc.).

amh

Posted 2011-09-17T20:04:38.140

Reputation: 307

Answers

1

The comments actually are saved (open the HTML file from the archive in a text editor to check). They are just not displayed, because JavaScript on that page hides them on page load.

To see them, you can disable JavaScript temporarily before loading the saved page, for instance using the Web Developer add-on. After installing it, you will have a new toolbar; choose the first tab, "Disable" -> "Disable JavaScript".

However, when you load the original page without JavaScript, the comments are not displayed. So this would mean that Firefox handles this kind of situation gracefully. I never thought about it before because I rarely save pages to disk.
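If you would rather not toggle JavaScript in the browser each time, a different route to the same result is to strip the `<script>` elements out of the saved copy, so nothing can hide the comments when the file loads. The following is only a minimal sketch using Python's standard `html.parser`; the names `ScriptStripper` and `strip_scripts` are made up for illustration, and it assumes the hiding is done by scripts in the saved HTML rather than by CSS. It does not handle processing instructions or CDATA sections.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML as written, except that <script> elements are dropped."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        else:
            # get_starttag_text() returns the raw tag, attributes included.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "script":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Script bodies arrive here as plain data; skip them.
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_scripts(html: str) -> str:
    """Return the given HTML text with all <script> elements removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

To use it, read the saved HTML file into a string, pass it through `strip_scripts`, and write the result to a new file next to the saved media folder so relative links keep working.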

jakub.g

Posted 2011-09-17T20:04:38.140

Reputation: 4 332

-1

From the client side, if you're looking to save the state as it's being displayed, your best (and probably only) bet is to take a screenshot. Especially with dynamic, auto-generated content, such as what might come from JavaScript or Flash, there are no real practical alternatives (though, depending on the page you're looking at, you might sometimes get close with some tools).

Scanning and saving the DOM state is what "Save As" is already doing. Since there's no requirement for the data to have any intelligible or saveable state included anywhere in the DOM, you'll miss details no matter what you try if you're depending on the DOM alone. You'd have to do funky things, like pause and save the state of the JavaScript VM in your browser, and then restore it somehow. Never mind the complication of state that updates regularly based on information and data stored elsewhere on the internet.

I'm not entirely sure why you also want the page's client-side media, but you're already getting that with the "Save Page As" command.

blueberryfields

Posted 2011-09-17T20:04:38.140

Reputation: 784