Make wget convert HTML links to relative after download if -k wasn't specified

The -k option (or --convert-links) converts the links in your downloaded pages to relative after the download finishes, as the man page says:

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
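
For context, a typical invocation that uses -k from the start might look like this (example.com, -r, and -p are placeholders, not from the original question):

    # -r: recurse into linked pages; -p: also fetch page requisites (images, CSS)
    # -k: convert links for local viewing once the download completes
    wget -r -p -k http://example.com/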

So, if I didn't specify -k, can I run wget again after the download to fix that, and if so, what would be the proper command? My guess is wget -c [previous options used] [url], run in the same working directory the files were downloaded to.

– Nathaniel

You could certainly post-process the files after download, but I don't know if wget does this. Your idea of trying it with -c is a good one. Time to experiment! – quack quixote – 2009-12-07T21:08:31.177

Have a utility handy to convert the links, by any chance? Running on Windows, by the way... – Nathaniel – 2009-12-07T21:14:49.963

Perl... no prewritten script, but if I wanted a DIY solution, that's what I'd use. – quack quixote – 2009-12-07T21:48:57.337
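
A minimal DIY sketch along those lines (a naive, hypothetical one-liner, assuming Perl is available and the downloaded pages sit in the current directory; example.com stands in for the mirrored site):

    # rewrite absolute links to the mirrored site into relative ones, in place;
    # -i.bak keeps a backup of each original file
    perl -pi.bak -e 's|https?://example\.com/|./|g' *.html

On Windows (as mentioned above), cmd.exe won't expand *.html, so the files would need to be listed explicitly.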

Okay, thanks. I don't have Perl installed and it would take too long to grab it. Fortunately, I found out how to make wget do the job; I posted an answer. – Nathaniel – 2009-12-07T21:52:25.817

By the way, ActivePerl is around as a Windows Perl port; it's a fairly small installer, and I'm pretty sure most CPAN modules work with it: http://www.activestate.com/activeperl/ – quack quixote – 2009-12-08T15:47:21.223

Answers

Yes, you can make wget do it. I'd say use wget -nc -k [previous options] [previous url]. -nc is --no-clobber. From the man page:

When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file.

And the -k option does the link conversion. So wget starts crawling the remote server again, sees all the files you already have, refuses to re-download them, and then converts the HTML links to relative when it's done. Nice.
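
Concretely, supposing the original run was a recursive mirror, the fix-up command might look like this (example.com, -r, and -p stand in for whatever URL and options were used the first time):

    # re-run with the original options plus -nc (skip files already on disk)
    # and -k (convert links once the crawl finishes)
    wget -nc -k -r -p http://example.com/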

– Nathaniel

In case anyone cares, I built a Docker image for wget 1.12: https://hub.docker.com/r/berezovskyi/wget1.12/ – berezovskyi – 2017-12-03T11:54:49.540

No, this doesn't work for me. It downloads the first file (e.g. index.html), sees that it is already downloaded, and stops. If you want wget to work recursively, you have to use the timestamping (-N) option, so that wget requests the headers for every file to check whether it is newer or not. – None – 2011-07-10T21:51:06.990

GNU Wget 1.13.3 built on darwin11.1.0: trying to use both options at the same time gives "Both --no-clobber and --convert-links were specified, only --convert-links will be used." – Ludovic Kuty – 2011-12-29T04:05:57.160

Didn't your question ask for a way to do this without -k? – barlop – 2012-01-21T01:34:12.413

Cf. @LudovicKuty's comment: as of wget 1.13, --no-clobber doesn't work with --convert-links. See http://savannah.gnu.org/bugs/?31781 for details. – David Moles – 2013-02-26T20:37:53.377
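
If you hit that limitation on wget 1.13 or later, one workaround implied by the comments above is to swap -nc for timestamping; a sketch, again with placeholder URL and options:

    # -N compares remote timestamps/sizes via headers, so files already on disk
    # are kept unless the server has newer copies, and -k still runs at the end
    wget -N -k -r -p http://example.com/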