Wget overwrites files when mirroring a multilingual website


I would like to mirror a website using wget. The problem is that the website has several language variants, which are switched using a query-string parameter, and when wget starts downloading another language version it clobbers the previous one. For example, it starts with index.html, grabs part of the site, then encounters a link to index.html?lang=foo, starts downloading the new language variant and overwrites the previous index.html with the new one. What can I do if I want to keep all of them?

zoul

Posted 2009-12-15T14:20:40.040

Reputation: 113
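To illustrate why the files collide, here is a minimal Python sketch of how a mirroring tool could map URLs to local file names. The helper name `local_path` and the naming scheme are assumptions for illustration, not a description of what wget does internally. If only the URL path is used, `index.html` and `index.html?lang=foo` both become `index.html`, so the second download overwrites the first; folding the query string into the file name keeps each language variant in its own file.

```python
import posixpath
from urllib.parse import parse_qsl, urlencode, urlsplit


def local_path(url: str) -> str:
    """Map a URL to a local file path without dropping the query string.

    Hypothetical naming scheme: index.html?lang=foo -> index.lang=foo.html
    """
    parts = urlsplit(url)
    # An empty path ("http://host" or "http://host/") becomes index.html.
    path = parts.path.lstrip("/") or "index.html"
    if path.endswith("/"):
        path += "index.html"
    if not parts.query:
        return path
    # Sort the parameters so "?a=1&b=2" and "?b=2&a=1" map to the same file.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    root, ext = posixpath.splitext(path)
    # Put the query between the base name and the extension.
    return f"{root}.{query.replace('&', '_')}{ext}"
```

Sorting the parameters is a design choice: the same page reached through differently ordered query strings is stored once instead of twice.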

Answers


As the pages are actually the same (only the query string differs), I am not too sure how to handle this with wget...

You can try HTTrack, which is a very flexible website copier; you can configure rules such as excluding paths or pages with a certain query string. It may even be able to download all languages, but I am not 100% sure, as I have not run into this problem.

William Hilsum

Posted 2009-12-15T14:20:40.040

Reputation: 111 572

This helped, thank you. It looks like I will be able to download the site one language variant at a time and exclude the links that lead to the others. – zoul – 2009-12-15T14:53:07.443
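The one-variant-at-a-time approach boils down to a link filter: while mirroring language X, follow only links that stay in language X. Below is a minimal Python sketch of that rule. It assumes the site uses a `lang` query parameter and that a link without it points to a default language (`"en"` here); both are assumptions, and the function names are hypothetical, not part of HTTrack or wget.

```python
from urllib.parse import parse_qs, urlsplit


def link_language(url: str, default_lang: str, param: str = "lang") -> str:
    """Return the language a link points to (hypothetical ?lang= scheme)."""
    values = parse_qs(urlsplit(url).query).get(param)
    # Assumption: a link without the parameter serves the default language.
    return values[-1] if values else default_lang


def should_follow(url: str, wanted_lang: str, default_lang: str = "en") -> bool:
    """Follow a link only if it stays inside the language being mirrored."""
    return link_language(url, default_lang) == wanted_lang
```

Running one crawl per language, each into its own output directory and each with its own `wanted_lang`, keeps the variants from overwriting one another even when they share file names.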