Mirroring a web site behind a login form

9

6

Short version:

  • I'd like to snapshot an offline copy of a URL, including the full HTML+CSS+JS+images, saved locally with the structure and file content of the original site preserved.
  • I'm having trouble with the tools I can find (e.g. "Save Complete" Firefox extension, HTTrack, wget, Teleport Pro) partly because the URL is behind a login form.

Longer version:

When working on my app I often want to snap an offline full HTML+CSS+JS+images version to send to the designer I work with, who makes modifications and sends it back. I then apply the changes to the app.

This has turned out to be much more efficient than having him/her navigate our code with a live app, but there's one snag - I can't find a mirroring app that's convenient.

Firefox extensions like "Save Complete" have the login cookie already so don't care that they're behind a login form, but they mangle the locally-saved files making it impossible to work with them.

Mirroring tools like wget or Teleport Pro don't support our login form.

HTTrack, though, is supposed to be able to run in proxy mode to detect the login info, but I could never get it to work. As a fallback it can accept cookies that I hard-wire into its cookies.txt file, but it always takes me hours to do this reliably.

Any tools, browser extensions, etc. that could do this? Open source, commercial - anything. If I've been misusing HTTrack and it's actually trivial to do — that's a great answer as well.

orip

Posted 2010-06-27T13:40:57.090

Reputation: 295

Answers

7

With HTTrack you can have it use a cookies.txt file when downloading. I've used it to successfully mirror a Moodle site.
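
For reference, here is a minimal Python sketch of producing such a cookies.txt by hand instead of exporting it from the browser. It assumes the file is expected in the Netscape cookie format (which is what browser cookie-export extensions typically produce). The function name `write_cookies_txt`, the domain, and the cookie name and value are placeholders for illustration, not anything taken from HTTrack or Moodle.

```python
import http.cookiejar
import time


def write_cookies_txt(path, domain, name, value, days_valid=30):
    """Write one session cookie to a Netscape-format cookies.txt file.

    All arguments are hypothetical placeholders: use the domain of your
    own site and the session cookie your browser received after login.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    cookie = http.cookiejar.Cookie(
        version=0,
        name=name,
        value=value,
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/",
        path_specified=True,
        secure=False,
        # Give the cookie an explicit expiry so it is not treated as
        # a throwaway session cookie when the file is read back.
        expires=int(time.time()) + days_valid * 86400,
        discard=False,
        comment=None,
        comment_url=None,
        rest={},
    )
    jar.set_cookie(cookie)
    jar.save(ignore_discard=True, ignore_expires=True)
    return path
```

Something like `write_cookies_txt("cookies.txt", "example.com", "MoodleSession", "abc123")` would then give you a file to drop next to the mirror project. If the mirror still comes back logged out, the cookie has probably expired or the crawler hit the logout link.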

TheLQ

Posted 2010-06-27T13:40:57.090

Reputation: 2 738

Thanks - I've done that before with HTTrack, but for some reason it always takes me several tries to get it to work, although I can't see a reason why. Did you ever get the built-in forms authentication support to work? It never worked for my site. http://httrack.kauler.com/help/CatchURL_tutorial

– orip – 2010-06-28T08:56:21.753

What I did was log in with my browser, export the cookies.txt file, add the logout page to the blacklist, and let it run. It took me a few tries due to Moodle's stupidity, but I got it to work – TheLQ – 2010-06-28T18:20:21.463

5

I've done this successfully with WinHTTrack. You can follow the normal procedure for capturing a website, with two minor settings tweaks:

  1. In Chrome, open Dev Tools, then log in to the website you need to capture. In the Network tab, click on the HTML page you requested to find your session cookie (its name will differ depending on the back-end framework used). Place this into HTTrack under "Additional HTTP Headers".

  2. Also ensure your user agent string matches, as sometimes sessions are blocked if the user agent string is changed.

    [Screenshot: session cookie entered into HTTrack]

  3. Start downloading the site. The result should be just as if you're logged in.
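
Before kicking off a long mirror, it can be worth checking that the cookie and user agent you copied actually produce a logged-in page. The following is a rough Python sketch of that check, using only the standard library. The helper names, URL, cookie name, and user agent string are all hypothetical placeholders; substitute the values you copied from Dev Tools.

```python
import urllib.request


def cookie_header_line(name, value):
    """Format the header line to paste under "Additional HTTP Headers"."""
    return f"Cookie: {name}={value}"


def build_request(url, name, value, user_agent):
    """Build a request that carries the same session cookie and
    user agent string as the browser session that logged in."""
    return urllib.request.Request(
        url,
        headers={
            "Cookie": f"{name}={value}",
            "User-Agent": user_agent,
        },
    )


def fetch_page(url, name, value, user_agent):
    """Fetch the page with the copied session; returns the HTML text."""
    request = build_request(url, name, value, user_agent)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the HTML returned by `fetch_page` is the login form rather than the page you expected, the session cookie is stale or the user agent doesn't match, and the mirror will most likely capture the login page too.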

Simon East

Posted 2010-06-27T13:40:57.090

Reputation: 2 414

3

Have you tried Offline Explorer?

As I remember, it lets you log in, saves the cookies for subsequent requests, and does the rest. I'm not 100% sure, as I last used it a long time ago.

Pablo

Posted 2010-06-27T13:40:57.090

Reputation: 4 093

Awesome, it seems the Pro version supports POSTing into forms. I'll check it out – orip – 2010-06-27T15:23:10.713

It took me a while to figure it out, and the documentation was sparse, but I realized that I could use the embedded IE-based browser to log into the form and then choose "Add the next clicked link as a project". The "autosave" feature was nice as well, although it messed up my form post occasionally. Gonna eval it for 30 days, could be what I'm looking for. – orip – 2010-06-27T15:48:30.210

2

Teleport Pro allows for a login and password to be used.

As you start a New Project Wizard you'll come to a point where it gives you that option (I think it's in the 3rd screen of options).

And even if you miss it you can access that option again.

In the main window (after having gone through the Project Wizard), right-click your project (the little folder icon displaying the URL you're trying to download, in the left pane) and choose the last option, Starting Address Properties. You're then presented with an options screen where you can specify a user login and password to be used for that site.

Helper

Posted 2010-06-27T13:40:57.090

Reputation: 21

This is an ancient question, but Teleport Pro supports HTTP auth, not entering data into POST forms. – Fake Name – 2016-08-05T04:56:10.007
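
To illustrate the difference that comment is pointing at, here is a small Python sketch of the two mechanisms. With HTTP Basic auth the credentials travel in an `Authorization` header on every request. With a login form they are POSTed once, and the server answers with a session cookie that later requests must carry. The login URL and the form field names (`username`, `password`) are assumptions for illustration; a real site will use its own.

```python
import base64
import http.cookiejar
import urllib.parse
import urllib.request


def basic_auth_header(user, password):
    """HTTP Basic auth: base64 of "user:password", sent on every request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def form_login_body(user, password):
    """Form login: credentials are URL-encoded into a POST body.
    The field names here are hypothetical."""
    fields = {"username": user, "password": password}
    return urllib.parse.urlencode(fields).encode("ascii")


def login_via_form(login_url, user, password):
    """POST the login form and return (opener, jar).

    The jar collects whatever session cookie the server sets, and the
    opener sends it back automatically on later requests.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    with opener.open(login_url, data=form_login_body(user, password), timeout=30):
        pass  # Only the Set-Cookie side effect matters here.
    return opener, jar
```

A tool that only knows how to send the header produced by `basic_auth_header` has no way to get past a form-based login, which is why exporting the session cookie from a browser keeps coming up as the workaround.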