Website downloader (cache?) to view sites offline

9

Is there a portable way to download or cache all pages of a website for viewing offline? I have a cross-country flight tomorrow, and I'd like to be able to cache a few web pages, particularly the Python docs (http://docs.python.org/) and the PyQt reference (http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/pyqt4ref.html).

Ideally I'd like a Firefox add-on or something like that, but anything will work fine as long as I can run it on Linux.

Falmarri

Posted 2010-11-24T03:56:11.977

Reputation: 530

You can try this offline website downloader.

– Menelaos Vergis – 2014-03-14T09:20:24.953

Answers

15

I use HTTrack.

It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.
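HTTrack also comes with a command-line client. A minimal invocation might look like this (the output directory and the scan filter are illustrative assumptions, not part of the original answer):

httrack "http://docs.python.org/" -O "/home/user/python-docs" "+docs.python.org/*" -v

Here -O sets the local mirror directory, the "+..." rule keeps the crawl on the docs host, and -v turns on verbose output.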

Edgar

Posted 2010-11-24T03:56:11.977

Reputation: 680

-1 It doesn't build "all directories" and it doesn't get all "images and other files"; it only gets what's linked to. – barlop – 2016-05-20T17:17:25.380

I've used this one in the past, nice free solution. – MaQleod – 2010-11-24T05:48:29.743

HTTrack is the best for both Linux and Windows, and there is a huge list of options to configure the downloading process. I love it – eslambasha – 2010-11-24T10:24:43.840

I've used it too; they now offer an Android version as well. – gparyani – 2014-01-07T19:19:22.387

6

I use wget with these options to mirror a site for offline use:

wget -cmkE -np http://example.com/a/section/i/like

where

-m turns on mirroring options (recursion, timestamping, infinite depth) for copying a site locally.

-c continues a previous download, in case I have already downloaded some pages.

-k converts links in the downloaded pages to point to the local copies, for offline viewing.

-E ensures files get an .html extension after download.

-np ("no parent") only downloads files below /a/section/i/ rather than the whole site.

For example, I wanted to download the South documentation, but not the South tickets and other parts of the site:

wget -cmkE -np http://south.aeracode.org/docs/index.html
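Likewise, for the PyQt reference from the question (URL copied from the question), the same options should work:

wget -cmkE -np http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/pyqt4ref.html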

I use Windows and run wget under Cygwin, but there is also a native Windows port of wget.

In your case, though, you can simply download the offline Python docs from the download section of the Python documentation site.

Meitham

Posted 2010-11-24T03:56:11.977

Reputation: 201

1

Try http://www.downthemall.net/, a Firefox plugin. I've used it to download 250 pages of PDFs across 20+ separate files. It is extremely powerful, with a wildcard/query syntax that lets you surgically get only the files that you want and none of the irrelevant ones you don't.
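For example, to grab only the PDFs linked from a page, a filter mask like *.pdf does the trick; DownThemAll also accepts regular-expression filters wrapped in slashes (e.g. /\.pdf$/). The regex form shown here is my assumption about its filter syntax, not something stated in the answer.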

Rolnik

Posted 2010-11-24T03:56:11.977

Reputation: 1 457

1

Some Firefox extensions that I know of:

  • ScrapBook

    helps you to save Web pages and easily manage collections. Key features are lightness, speed, accuracy and multi-language support. Major features are:

    • Save Web page
    • Save snippet of Web page
    • Save Web site
    • Organize the collection in the same way as Bookmarks
    • Full text search and quick filtering search of the collection
    • Editing of the collected Web page
    • Text/HTML edit feature resembling Opera's Notes
  • ScrapBook Plus

    Differences between ScrapBook Plus and ScrapBook:

    • faster sorting
    • faster import and export
    • open the window to manage your collection directly from the sidebar
    • simplified the handling of the "Combine Wizard"
    • new features for "Capture Multiple URLs" (filter to exclude links, use the title of the web site or the title of the link as the new title for the sidebar entry, specify a waiting time between two downloads from 0 to 3 seconds, use UTF-8 or ISO-8859-1 as the character set)
    • new "Capture" window (download needs to be started manually, automated scrolling turned off)
    • 6 highlighters in the editor
  • UnMHT

    allows you to view MHT (MHTML) web archive format files, and save complete web pages, including text and graphics, into a single MHT file

  • Pocket (not an extension; a built-in Firefox feature)

    lets you save web pages and videos to Pocket in just one click. Pocket strips away clutter, saves the page in a clean, distraction-free view, and lets you access it on the go through the Pocket app.

    Note that:

    Saving to and accessing your Pocket list on Firefox requires an Internet connection. Offline viewing is possible on other devices with the Pocket app.

galacticninja

Posted 2010-11-24T03:56:11.977

Reputation: 5 348

0

You can download a whole website or part of a website with wget.

wget -r http://docs.python.org/

Check the wget manual for other options you may want to pass, e.g. to limit your bandwidth usage, to control recursion depth, to set up exclusion lists, etc.
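For example, a politer run than plain -r might look like this (the option values are illustrative, not prescriptive): recursion capped at 3 levels, a one-second pause between requests, bandwidth limited to 200 KB/s, links rewritten for local viewing, and page requisites such as CSS and images included:

wget -r -l 3 --wait=1 --limit-rate=200k -k -p -np http://docs.python.org/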

Another approach to offline browsing is to use a caching proxy. Wwwoffle is one that has a lot of features to facilitate retention for offline browsing, such as overriding server-specified expiration dates and a recursive pre-fetching capability. (I've been using wwwoffle since my dial-up days.)
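A typical wwwoffle cycle looks roughly like this, using its control commands (a sketch; exact behaviour depends on your configuration):

wwwoffle -online    # connected: pages you browse through the proxy get cached
wwwoffle -fetch     # pre-fetch pages that were requested while offline
wwwoffle -offline   # disconnected: requests are served from the cache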

Gilles 'SO- stop being evil'

Posted 2010-11-24T03:56:11.977

Reputation: 58 319