Making wget's --convert-links respect http vs https


I'm using the following command to mirror an https website:

wget --directory-prefix=/tmp/mirror --mirror --no-host-directories \
     --regex-type pcre --reject-regex "$SKIP_REGEXP" \
     --convert-links --adjust-extension --header "Accept-Language: en-US,en" \
     --header "X-Build-Mirror: True" -o /tmp/mirror.log https://logic.ff.cuni.cz

(The command actually runs as a single line; I've split it across several lines for readability.)

Per the documentation of the --convert-links flag, links to downloaded files are converted to relative links for local viewing, and links to files that are not downloaded (e.g. because of --reject-regex) are converted to absolute links. However, although the site is served over https, every absolute link produced by the conversion is an http link!

Is this a bug in wget or is there some way to force it to respect the protocol type? (I know that I can use the --https-only flag, but that would prevent getting any http resource.)
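One possible workaround, independent of any wget option, is to post-process the mirror once wget has finished and rewrite the scheme of absolute links that point back at the mirrored host. The sketch below is only an illustration under stated assumptions: the function name fix_scheme is hypothetical, it relies on GNU sed's in-place -i option, and it assumes every http link to that host should become https, which may not hold for a site that deliberately serves some resources over plain http.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): rewrite http:// links that point at
# the mirrored host to https:// in every HTML file under the mirror directory.
# Assumes GNU sed, whose -i option edits files in place.
fix_scheme() {
    mirror_dir=$1   # e.g. /tmp/mirror
    host=$2         # e.g. logic.ff.cuni.cz

    # Escape dots so sed matches the host name literally.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')

    # Links to other hosts are left untouched.
    find "$mirror_dir" -type f \( -name '*.html' -o -name '*.htm' \) \
        -exec sed -i "s|http://$host_re|https://$host|g" {} +
}
```

With the command above, this would be called as fix_scheme /tmp/mirror logic.ff.cuni.cz after the wget run completes. It only papers over the symptom in the saved files; it does not change how wget itself converts links.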

jonathanverner

Posted 2018-08-14T10:49:48.950


This is almost certainly a bug in Wget. I'll open a bug report on your behalf. – darnir – 2018-08-16T11:30:09.530

No answers