
Can you give me an example of a short data string that, when decompressed using Zlib's DEFLATE method, expands to something much, much longer?

More precisely: what is the nastiest decompression bomb that one can build for Zlib's DEFLATE? The figure of merit here is the compression ratio. If the compressed file is n bytes long, and after decompression it yields something m bytes long, then the compression ratio is m/n. I am looking for something that maximizes the compression ratio, where hopefully the compressed data is very short. Can anyone give me an example of such a decompression bomb?


Related: This post claims that DEFLATE can asymptotically approach a compression ratio of 1032; is that the best one can do, or can a higher compression ratio be achieved with a carefully chosen sequence of compressed bytes? Libpng defends against decompression bombs by imposing resource limits, but it doesn't give a concrete example of a specific decompression bomb. See also zip bombs, the corresponding attack for the ZIP file format.

D.W.

2 Answers


The source of the "1032:1" figure is the zlib site, which states:

The limit comes from the fact that one length/distance pair can represent at most 258 output bytes. A length requires at least one bit and a distance requires at least one bit, so two bits in can give 258 bytes out, or eight bits in give 1032 bytes out. A dynamic block has no length restriction, so you could get arbitrarily close to the limit of 1032:1.

This is a quote from Mark Adler, one of the two designers of zlib. Having implemented a Deflate library myself, I can confirm that, given how the algorithm works, this asymptotic limit is real: you cannot go beyond it, but you can get as close as you wish. Deflate works by encoding "copies": each element is either a new byte, or a copy of a past sequence; copies can overlap (e.g. you can copy a sequence of length 30 at distance 4, yielding 7.5 repetitions of the previous 4-byte sequence); Huffman codes are then applied to the sequence of elements. To get a minimal-length "copy" you need it to overlap a lot, and each copy is worth at most 258 output bytes. So the data to be compressed will have to be a long string of identical bytes.
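
To see the limit in practice, here is a minimal sketch using Python's zlib bindings; the 10 MiB input size and the raw-DEFLATE wbits value are illustrative choices, not anything prescribed by zlib itself:

import zlib

# Deflate a long run of identical bytes and measure the m/n ratio from the question.
payload = b"\x00" * (10 * 1024 * 1024)             # 10 MiB of zero bytes (size is arbitrary)

compressor = zlib.compressobj(level=9, wbits=-15)  # wbits=-15 -> raw DEFLATE, no header
compressed = compressor.compress(payload) + compressor.flush()

print(f"{len(payload)} -> {len(compressed)} bytes, "
      f"ratio ~{len(payload) / len(compressed):.0f}:1")
# Prints a ratio on the order of 1000:1, creeping toward (but never past)
# the 1032:1 limit as the input grows.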

As @Steffen says, compressing a long sequence of zeros with gzip will yield a compression ratio of more than 1000:1. It does not have to be zeros; any sequence consisting of the same byte value repeated over and over will do the trick; but a Linux machine has a /dev/zero, not a /dev/fortytwo.

"Compression bombs" were already active quite a long time ago; I saw it employed to kill Netscape back in 1996 (at that time, Netscape handled background pictures, but Mosaic did not; a very small GIF file could encode a huge background, which Netscape allocated as a single big pixmap in the X11 server).

Thomas Pornin

zlib deflate is used by gzip. To create a roughly 10 kB bomb out of 10 MB of input you might use

dd if=/dev/zero  bs=1024 count=$((10*1024)) | gzip -9 > bomb.gz

And I think the post is right in claiming that you have a maximum ratio of about 1000:1. More information is available at http://www.aerasec.de/security/advisories/decompression-bomb-vulnerability.html (old), but http://blog.cyberis.co.uk/2013_08_01_archive.html shows that most modern browsers still have no protection against it.
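
For completeness, here is a minimal sketch of the kind of guard a consumer could add while decompressing, assuming Python's zlib module; the raw-DEFLATE wbits value and the 10 MiB cap are illustrative assumptions, not something the linked advisories specify:

import zlib

MAX_OUTPUT = 10 * 1024 * 1024          # arbitrary cap for this sketch: 10 MiB

def bounded_inflate(data, limit=MAX_OUTPUT):
    # Inflate a raw DEFLATE stream, but stop once `limit` output bytes exist.
    d = zlib.decompressobj(wbits=-15)  # raw DEFLATE, no zlib/gzip header
    out = d.decompress(data, limit)    # second argument caps the output size
    if d.unconsumed_tail:              # input left over: the cap was hit
        raise ValueError("output would exceed limit; possible decompression bomb")
    return out

The point is that the output is bounded during decompression, not checked after the fact, which is exactly the failure mode discussed in the comments below.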

Steffen Ullrich
  • Why *should* browsers protect against it? Once they start decompressing the file, they'll know if they're going to hit a resource limit, and cancel the operation if it's going to take too much memory. It's not clear to me that clicking on a link and getting a 10k file which expands to 10M is any worse than getting a 10M uncompressed file. – Dan Lenski Jan 03 '16 at 08:21
  • @DanLenski: if you read the links I have provided, you will see that some browsers don't cancel the decompression even if it takes too much memory. Protection after they have already extracted the 10 MB (or 1 GB for a bigger bomb) does not matter. They need to detect the problem while decompressing, not while processing the (decompressed) content. – Steffen Ullrich Jan 03 '16 at 08:41
  • Fair point. It's definitely a bug to only check the space requirement *before* decompression, rather than during it. – Dan Lenski Jan 03 '16 at 17:20