How do I completely download a web page, while preserving its functionality?


I've been trying to save this webpage using all of the methods that I know, but none of them have worked so far. The website itself has some great functionality: it is able to render MathJax in real time, without any noticeable lag. I want to be able to use it offline, so I wanted to save it, but I haven't been very successful. I'm on macOS. Here is what I have tried so far:

  1. Save As in Safari as a Web Archive (.webarchive) – doesn't preserve the page's functionality
  2. Save As in Safari as Page Source (.html) – completely messes the page up
  3. HTTrack – doesn't preserve the webpage's functionality
  4. Save As in Chrome as Webpage, Complete (.html) – messes up layout and functionality
  5. WebDumper – gives me a "Forbidden" error
  6. itsucks – messes the webpage up
  7. SiteSucker – messes the webpage up
  8. ScrapBook (Firefox) – messes the page up
  9. A couple of other things that I can't remember anymore.

I just want to save the website and be able to use it offline. I noticed something interesting, however. When I'm in Safari and I go offline, the webpage performs fine. This means that the webpage can run offline with no problem – I just need a way to save it properly. I suppose I could create a virtual machine, load up the site on it, save it as a snapshot, and use it offline whenever I want, but that seems like quite a disproportionate solution for such a seemingly simple problem.

On a side note: would it be possible to save a webpage like this (iPhone 6S page) with all of the scrolling animations, embedded pictures and videos and all the rest? I've only tried creating a Web Archive using Safari, but it only saved the nice scrolling animation – not the embedded pictures and such.

Skeleton Bow

Posted 2016-04-24T18:21:51.627

Reputation: 237

It's near impossible due to all the code that runs on any given page: code that pulls images and resources from hundreds of other locations not on their web server. I use Chrome and save as (single file) MHTML; it does not always get everything, but it seems to be the best option for me. – Moab – 2016-04-24T18:35:24.247

You could try wget from a command prompt. It will download whole websites, but you can tune it to download only what you want. – sdjuan – 2016-04-24T18:40:13.777

The problem that ultimately cannot be addressed is code running on the server. Most web application server runtimes execute code on HTTP GET and POST, and that code is never transmitted to the client; only its output is. What you want to do is only possible if the site is written to execute entirely client-side (usually via JavaScript) and consumes no external data. – Frank Thomas – 2016-04-24T18:42:22.207

@FrankThomas thanks for the insight. The only thing is, because the website is able to run perfectly when I disconnect my Internet, shouldn't it be entirely possible to run it without an Internet connection and save it? That's what I keep thinking. – Skeleton Bow – 2016-04-24T18:51:07.267

@sdjuan unfortunately wget doesn't seem to work. Does it work for you? – Skeleton Bow – 2016-04-24T18:57:44.533

Sure works for me. "doesn't seem to work" doesn't give a whole lot to go on. Here's the start of the man page from a terminal on my Mac:

    WGET(1)                    GNU Wget                    WGET(1)

    NAME
        Wget - The non-interactive network downloader.

    SYNOPSIS
        wget [option]... [URL]...

    DESCRIPTION
        GNU Wget is a free utility for non-interactive download of files from the Web.
        It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

– sdjuan – 2016-04-24T19:10:00.467
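
For reference, a typical recursive download with wget looks something like the line below, with example.com standing in as a placeholder for the real address; the exact options depend on the site and may need tuning:

    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.com/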

@sdjuan sorry about that. I am actually very inexperienced with the terminal, so I decided to Google "wget" online. The first website that came up let you input the address and then output the HTML text of the webpage. I saved this as an HTML file and then proceeded to open it in Safari, and it was pretty much messed up. Do you suggest I use the terminal instead? I'm not sure if I can do it because I know so little about it. However, hearing that you actually succeeded with this really makes me want to try somehow… – Skeleton Bow – 2016-04-24T19:12:31.487

With wget you may have to script it to download things that are just links in a page. Not all content is local to the website. Be careful what you ask for, as one link can lead to another and pretty soon you may have to download the whole Internet: http://www.w3schools.com/downloadwww.htm – sdjuan – 2016-04-24T19:17:43.843

@SkeletonBow Since you are not proficient with terminal, I withdraw my suggestion since wget may be way too hard for you. Sorry I can't give a canned way for you to do what you want. Good luck – sdjuan – 2016-04-24T19:20:36.373

Let us continue this discussion in chat. – Skeleton Bow – 2016-04-24T19:52:08.923

A similar question: http://superuser.com/questions/577102/save-website-containing-javascript-after-it-was-interpreted – That Brazilian Guy – 2016-04-24T20:19:41.103

@ThatBrazilianGuy thanks for the link. I read the answer but it didn't seem to work :(. This has me pulling my hair out! – Skeleton Bow – 2016-04-24T21:03:03.913

I use Firefox MAFF to save pages that will display perfectly, but I've never needed to use it to save a page that runs client-side scripts, so it might not be what you need. Anyway, it's an amazing tool and nice to have :) – That Brazilian Guy – 2016-04-25T01:37:57.937

Answers

1

It's not possible to do this with many websites these days. And for sites where it does seem possible, it would still require some JavaScript experience for reverse-engineering and "fixing" the scripts that are saved to your computer. There is no single method that works for all websites; you have to work through each unique problem for every site you try to save.


A lot of websites are no longer just static files that are sent from the server to your computer. They have become 2-way interactive applications, where the web browser is running code that continuously interacts with the web server from the same page.

When you load a website in a browser, you are seeing the "front end" of the entire system that makes up the website. This "front end" (the HTML, images, CSS, and JavaScript) can even be dynamically generated by code on their end, which means there is code executing on the server side that is never sent to your web browser, and that code may be critical to supporting the code that is sent to your web browser.

There is simply no way to "download" that server-side code, which is why many websites don't work properly when you save them.
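
To illustrate the point, here is a toy sketch in Python (purely illustrative, not the back-end of any real site): the HTML the browser receives is produced per request by code that stays on the server, so saving the page only captures one particular output of that code.

    # Toy server: the page is generated per request by code that never
    # leaves this machine, so "Save As" in a browser only captures the output.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import datetime

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # There is no .html file to download; the markup is built on the fly.
            body = "<html><body>Generated at %s</body></html>" % datetime.datetime.now()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode("utf-8"))

    HTTPServer(("localhost", 8000), Handler).serve_forever()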

The most common problem causing things to break is that websites use JavaScript to load content after the initial page response is sent to your browser. The HostMath site you are trying to save offline definitely uses a back-end to retrieve JavaScript files that are critical to the site's functionality. In Firefox I get this error for several different JavaScript files when I try to open the site locally:

    Loading failed for the <script> with source “file:///D:/Home/Downloads/hostmath/HostMath%20-%20Online%20LaTeX%20formula%20editor%20and%20browser-based%20math%20equation%20editor_files/extensions/asciimath2jax.js?rev=2.6.0”

See that ?rev=2.6.0 after the filename? That is a parameter that is passed to the back-end (web server) to determine which asciimath2jax.js file should be sent to your web browser. My D: drive isn't a web server, so when Firefox tries to load a file with a URL parameter, it fails.

You could try downloading the file from HostMath manually and saving it in the right location without the ?rev=2.6.0, though. Then you would need to change the site's scripts and HTML to load the file from your drive without a URL parameter. This would have to be done for every script that failed to load.
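
As a rough sketch of that rewrite step (assuming, hypothetically, that the saved page is a file named saved_page.html; the name and pattern would have to match whatever your download tool actually produced), something along these lines could strip the query strings from local script references:

    import re
    from pathlib import Path

    # Hypothetical file name; use whatever your "Save As" / HTTrack run produced.
    page = Path("saved_page.html")
    html = page.read_text(encoding="utf-8")

    # Drop "?rev=..." style query strings from src="..." references so the
    # browser looks for plain files on disk instead of URLs with parameters.
    cleaned = re.sub(r'(src="[^"?]*)\?[^"]*"', r'\1"', html)

    page.with_name("saved_page_offline.html").write_text(cleaned, encoding="utf-8")

The downloaded script files themselves would also need to be saved on disk without the ?rev=... suffix, as described above.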

You will hit a dead end if there is any JavaScript that makes requests to a web service (an API) on the host website, though. That would be done to off-load computation for something that the site doesn't compute locally in the web browser, which means the back-end is essential to running the front-end.

Romen

Posted 2016-04-24T18:21:51.627

Reputation: 972

0

Open the website that you want to save. Any web browser can quickly save the site you are currently visiting. ... Open the "Save web page as" window. ... Give the saved page a name. ... Select a location to store the page. ... Select whether you want the entire web page or just the HTML. ... Open the saved webpage.

mandar shewale

Posted 2016-04-24T18:21:51.627

Reputation: 1

The OP already mentioned that this doesn't work. – RalfFriedl – 2019-09-29T10:21:34.110

-1

1. Clear the cache memory of the browser you are using.
2. Open the browser and go to the site you want to download.
3. Open the cache folder.
4. Download all files into the same folder where the index.html/xml/etc. is.
5. Go offline and test the downloaded page.

Alex Shandor

Posted 2016-04-24T18:21:51.627

Reputation: 1