Mirroring a web site whose pages use simple JavaScript

1

There are great, simple web site download tools that can create a local mirror of a simple site with no JavaScript infrastructure behind it. However, there are cases when a site looks simple (and is even old enough to be a real "web 1.0" site), yet none of these tools can mirror it. I've tried to make a local mirror of this fan site (it has, for example, some transcriptions that cannot be found elsewhere, and some other rare material) using HTTrack Site Copier, Free Download Manager, and wget. All of them produced mirrors that lacked a lot of content. I saw some scripts on those pages, and that is probably what prevents mirroring.

How can I mirror a web site that relies on simple JavaScript? Perhaps there is a browser-driven solution that supports JS out of the box (a browser extension, etc.)?

Thanks in advance.

Lyubomyr Shaydariv

Posted 2014-04-04T07:57:24.447

Reputation: 665

Question was closed 2014-05-28T04:30:54.260

Apologies, but what do you mean by mirror a website? – Dave – 2014-04-04T10:20:49.527

1

@c0dev I'm not sure the possible duplicate helps, because that solution relies on wget, and I can't make wget do what I need. However, I saw http://wget.addictivecode.org/FeatureSpecifications/JavaScript in that question, and I'll check it later. – Lyubomyr Shaydariv – 2014-04-04T10:25:21.593

@DaveRook I mean "to make an exact (more or less) copy of a web site". Sorry if my English is not great. – Lyubomyr Shaydariv – 2014-04-04T10:26:49.613

Ah, I see. No, your English is great, I just wasn't sure. Why not just try saving the site to your computer? In IE you can save an entire site. – Dave – 2014-04-04T10:30:06.573

@DaveRook thank you. :) As far as I know, all major browsers can save only single web pages, not entire web sites. The problem with that site is that, as far as I can see, it uses some JavaScript to load content dynamically. I just want to make a recursive copy of the site (like web crawlers do), but unfortunately none of those tools (at least the ones I tried) can do that in this case. – Lyubomyr Shaydariv – 2014-04-04T10:36:20.530

@DaveRook I have IE 10 installed, and the most similar feature is "Web Archive, single file". It saved a 28 KB file. From the mirroring point of view it's almost "nothing". Or, perhaps, I'm missing something. – Lyubomyr Shaydariv – 2014-04-04T10:41:01.153

Answers

0

In this particular case I ended up with the following bash script:

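#!/bin/bash

# wget options: -m mirror, -p fetch page requisites, -E add .html extensions,
# -np never ascend to the parent directory, -k convert links for local viewing.
DOWNLOAD=(wget -m -p -E -np -k)
SITE="http://homepage.tinet.ie/~themma"

# Pages and directories, each requested explicitly.
"${DOWNLOAD[@]}" "$SITE/"
"${DOWNLOAD[@]}" "$SITE/songs/"
"${DOWNLOAD[@]}" "$SITE/songs/songs.html"
"${DOWNLOAD[@]}" "$SITE/songs/disco.html"
"${DOWNLOAD[@]}" "$SITE/links/"
"${DOWNLOAD[@]}" "$SITE/other/"
"${DOWNLOAD[@]}" "$SITE/tour/"

# Navigation bar images (normal and rollover variants).
for i in {1..8}; do
    "${DOWNLOAD[@]}" "$SITE/images/bar_0$i.gif"
    "${DOWNLOAD[@]}" "$SITE/images/bar_0$i-over.gif"
    "${DOWNLOAD[@]}" "$SITE/images/bar_0$i-bar_03_over.gif"
done

# One image per year.
for i in {1989..2003}; do
    "${DOWNLOAD[@]}" "$SITE/images/$i.gif"
done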

I couldn't simulate the JavaScript behavior, of course, but since the site is extremely simple, it didn't take much analysis to write a wget-based shell script. It's a little tricky, but it works. Thanks, everyone, for the suggestions.

I'm not marking this answer as the best one, because it covers only this particular case. Any ideas for the more general case are very welcome (any "intelligent" command-line tools, browser extensions, etc.).
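Part of the manual analysis described above could perhaps be automated. The following is only a rough sketch, not a general solution: it assumes the JavaScript on a page refers to other pages and images as plain quoted strings (for example in rollover code), and it does not execute any JavaScript. The function names `extract_js_urls` and `mirror_from_page` are hypothetical helpers invented for this illustration, and the list of file extensions is a guess.

```shell
#!/bin/bash

# Hypothetical helper: read HTML or JavaScript on stdin and print every
# quoted string that ends in a common page or image extension.
# Assumption: the scripts mention their targets as literal quoted strings.
extract_js_urls() {
    grep -oE "[\"'][^\"' <>]+\.(html?|gif|jpe?g|png)[\"']" \
        | tr -d "\"'" \
        | sort -u
}

# Hypothetical helper: fetch one page, extract the candidate paths, and pass
# each one to wget with the same options as the script above.
# Assumption: every extracted path is relative to the base URL given in $1.
mirror_from_page() {
    local base="$1" page="$2" path
    curl -s "$base/$page" | extract_js_urls | while read -r path; do
        wget -m -p -E -np -k "$base/$path"
    done
}
```

It could be called as `mirror_from_page "http://homepage.tinet.ie/~themma" "songs/songs.html"`. URLs that a script assembles at run time (for example by concatenating a year onto a prefix) would not be found this way, so loops like the ones above may still be needed.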

Lyubomyr Shaydariv

Posted 2014-04-04T07:57:24.447

Reputation: 665