How to get uncompressed content when using recursive wget?

I am downloading many single pages with all their static content (JS, CSS, images, ...) via recursive wget. It turned out that content served compressed (gzip) is stored by wget in its compressed form, but I want the uncompressed form. Writing another script that walks the directories recursively and tries to decompress whatever it can is not an appealing option. So is there any way to get the content uncompressed?

CMD:

wget -E -H -k -K -p https://some.example

Even --header='Accept-Encoding: ' (telling the server not to use gzip) did not help.
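For reference, the "walk the dirs and decompress" fallback I would like to avoid could be sketched roughly like this (a minimal sketch, assuming `file` and `gunzip` are available and that the mirror lives in a hypothetical `mirror/` directory):

```shell
# Walk the downloaded tree and decompress any file whose contents are
# actually gzip data, restoring the original filename afterwards.
# `mirror/` is a placeholder for the wget output directory.
find mirror/ -type f 2>/dev/null | while IFS= read -r f; do
  # `file -b` reports "gzip compressed data ..." for gzipped payloads
  if file -b "$f" | grep -q '^gzip compressed data'; then
    mv "$f" "$f.gz" && gunzip "$f.gz"   # gunzip restores "$f" decompressed
  fi
done
```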

Thank you for any advice :)

user3720773

Posted 2015-10-17T18:14:03.583

Reputation: 61

I have never experienced anything like what you are describing. Can you provide a specific example URL and an exact wget invocation that behaves this way? – a CVn – 2015-10-17T18:49:05.673

An example is https://www.divokekmeny.cz/, which produces a compressed file located at '..\dscs.innogamescdn.com\merged\index.css@39e9148320b8ea5332396a46c9c05ccd'. When you try to decompress it using gzip, it works.

– user3720773 – 2015-10-17T20:28:47.757

Answers

  1. Use httrack instead of wget.
  2. Set up a decompression proxy. Squid with some third-party plugin should be able to do that. I'm more familiar with Java, so I used LittleProxy, overrode the method getMaximumResponseBufferSizeInBytes(), and that was it. I wrote about the latter here.
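For the first option, a minimal httrack invocation might look like the following (the URL, filter pattern, and output directory are placeholders; httrack stores responses decompressed by default):

```shell
# Mirror the site with httrack instead of wget.
# -O sets the output directory; the +... filter also allows the CDN host,
# mirroring wget's -H span-hosts behavior for that domain.
httrack "https://some.example/" -O ./mirror "+*.innogamescdn.com/*" -v
```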

EDIT: Wget 1.19.2 adds gzip Content-Encoding decompression (and it works).
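Assuming wget 1.19.2 or newer, that decompression is exposed (to my knowledge) as a `--compression` option, which can be combined with the original command from the question:

```shell
# wget >= 1.19.2: request gzip and decompress responses on the fly.
# --compression=auto negotiates Accept-Encoding and gunzips what arrives;
# https://some.example is the placeholder site from the question.
wget --compression=auto -E -H -k -K -p https://some.example
```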

moneytoo

Posted 2015-10-17T18:14:03.583

Reputation: 119