I'm using the following command to mirror an https website:
wget --directory-prefix=/tmp/mirror --mirror --no-host-directories \
--regex-type pcre --reject-regex "$SKIP_REGEXP" \
--convert-links --adjust-extension --header "Accept-Language: en-US,en" \
--header "X-Build-Mirror: True" -o /tmp/mirror.log https://logic.ff.cuni.cz
(The command actually runs as a single line; I've broken it over several lines for readability.)
According to the documentation of the --convert-links flag, links to downloaded files are converted to relative links for local viewing, and links to files that were not downloaded (e.g. because of --reject-regex) are converted to absolute links. However, although the host URL is an https URL, all of these absolute links become http links!
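To confirm which mirrored files are affected, a quick check like the following can list every file that still contains an absolute http:// link to the mirrored host. This is a minimal sketch that assumes the mirror was written to /tmp/mirror (as in the command above); the helper name `find_http_links` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that contain absolute
# http:// links to the given host. Assumes the mirror is in /tmp/mirror.
MIRROR_DIR=${MIRROR_DIR:-/tmp/mirror}
HOST=logic.ff.cuni.cz

find_http_links() {
    # -r: recurse, -l: print only file names, -F: treat the pattern as a fixed
    # string, so the dots in the host name are not regex wildcards.
    grep -rlF "http://$2" "$1"
}

# Only search if the mirror directory actually exists.
if [ -d "$MIRROR_DIR" ]; then
    find_http_links "$MIRROR_DIR" "$HOST" || true
fi
```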
Is this a bug in wget, or is there some way to force it to respect the protocol type? (I know that I can use the --https-only flag, but that would prevent fetching any http resource.)
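Until wget itself preserves the scheme, one possible workaround is to post-process the mirror after wget finishes and rewrite the http:// links for the mirrored host back to https://. This is a sketch, not a wget feature: it assumes the mirror is in /tmp/mirror, that every page on the host is also served over https, and that GNU sed is available (`sed -i` without a suffix argument is GNU syntax). The helper name `fix_scheme` is hypothetical.

```shell
#!/bin/sh
# Hypothetical post-processing step (not part of wget): rewrite absolute
# http:// links to the mirrored host as https:// in HTML and CSS files.
# Assumes GNU sed and that the host serves all of its pages over https.
MIRROR_DIR=${MIRROR_DIR:-/tmp/mirror}
HOST=logic.ff.cuni.cz

fix_scheme() {
    # Escape the dots in the host name so sed matches them literally.
    host_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Only links to the mirrored host are changed; other http links are kept.
    find "$1" -type f \( -name '*.html' -o -name '*.css' \) \
        -exec sed -i "s#http://$host_re#https://$2#g" {} +
}

# Only rewrite if the mirror directory actually exists.
if [ -d "$MIRROR_DIR" ]; then
    fix_scheme "$MIRROR_DIR" "$HOST"
fi
```

Restricting the substitution to the mirrored host avoids "upgrading" links to third-party sites that may not support https.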
This is almost definitely a bug in Wget. I'll open a bug report on your behalf. – darnir – 2018-08-16T11:30:09.530