Scripts that extend wget --page-requisites?


I posted a very similar question today, but I would like to ask again with a wget/Linux focus. I hope that's all right.

I need to create offline copies of web pages programmatically on a LAMP stack, preferably using PHP. I need the HTML source, the embedded images, and the CSS style sheets.

I can run things on the command line, but not install new packages.

I can do a "wget --page-requisites" on the pages I want to archive. This downloads everything I need, but it does not modify the downloaded HTML and CSS files to point to the archived copies of those resources.
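For concreteness, this is the kind of invocation meant here (the URL is a placeholder):

```shell
# Download the page plus the assets it references
# (images, style sheets, scripts, etc.).
# http://example.com/page.html is a placeholder URL.
wget --page-requisites http://example.com/page.html
```

The files land in a directory tree mirroring the remote hosts, but the links inside the saved HTML still point at the original absolute URLs.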

I am looking for an extension, Perl script, or shell script that modifies the downloaded document(s) to point to the downloaded resources, or perhaps a different Linux-based solution that does this. I already checked: httrack does not seem to be installed on the server ("whereis httrack" returns nothing).

Pekka

Posted 2009-11-29T14:37:08.310


Answers


Try the --convert-links option:

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
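Combined with the flag already in use, the whole archive step becomes a single command (the URL is a placeholder):

```shell
# Fetch the page and all its requisites, then rewrite every link
# in the saved HTML/CSS so the copy works for offline viewing.
# http://example.com/page.html is a placeholder URL.
wget --page-requisites --convert-links http://example.com/page.html
```

Since the goal is to drive this from PHP, the same command line can be run via shell_exec(), with the target URL passed through escapeshellarg() first.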

Phil



How could I overlook that? I will try it out and report back. – Pekka – 2009-11-29T14:45:57.853