4

Zip files, gzip files, and likely other formats include information about the contained file, including its uncompressed size. However, when extracting, that number cannot be trusted: the actual data can be much larger (e.g. reported as 1 byte but actually 10 GB).

Since decompressing a large .zip file could DoS a system by exhausting its resources, you would need some checks in place to prevent extracting a file that could cripple the server. One method would be to limit the allowed size to X GB: if the stated size of a file is greater than that, exit early. However, if the stated size is wrong, you'd have to decompress up to that limit before exiting.

A potential alternative is to start decompressing and, as soon as the actual size grows larger than the stated size (perhaps with a margin of error), stop.
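For gzip, that check might look roughly like the following sketch (Python standard library; the "stated" size is taken from the ISIZE field in the gzip trailer, and the slack margin is arbitrary):

```python
import gzip
import struct

def stated_gzip_size(path):
    """Read ISIZE from the gzip trailer: the stated uncompressed size
    (mod 2**32, so it wraps for files over 4 GB)."""
    with open(path, "rb") as f:
        f.seek(-4, 2)                      # last 4 bytes of the file
        return struct.unpack("<I", f.read(4))[0]

def bounded_gunzip(src_path, dst_path, slack=1024 * 1024):
    """Decompress src_path to dst_path, aborting as soon as the output
    exceeds the stated size plus an arbitrary margin of error (slack)."""
    limit = stated_gzip_size(src_path) + slack
    written = 0
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(64 * 1024)
            if not chunk:
                break
            written += len(chunk)
            if written > limit:
                raise ValueError("output exceeds stated size; possible bomb")
            dst.write(chunk)
    return written
```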

Are there any legitimate cases where the stated size of a compressed file in an archive (zip, gzip, or others) would be wrong? Is there any popular software that produces such broken files?

Tarka
  • 141
  • 3
  • I can't answer your actual question, but an easy way to protect yourself against huge uncompressed files is to create a small partition (say, a few gigabytes) and extract the file in that partition. This will automatically produce a file system error in the decompression program (no space left on device) and abort it if a file uncompresses to a size larger than the space left on the partition; you can then delete the file to make room on the partition again without impacting the operation of the rest of the machine. You can use several such partitions if necessary. – Out of Band Feb 03 '17 at 20:04
  • 1
    The attack you are describing is called a zip bomb. You can protect against it by decompressing as a stream and stopping once the output gets too big. I think the headers should be correct in any zip that hasn't been manipulated, but there are formats that don't store the size in the header. – Julian Feb 04 '17 at 14:50

1 Answer

2

The uncompressed size of each file is stored in the file metadata for no reason other than letting an application know things like the compression ratio. It is not meant to be authoritative in any way. There are no legitimate uses I can think of for having the reported uncompressed size differ from the actual uncompressed size.
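For zip archives, that reported size is what Python's zipfile module exposes as ZipInfo.file_size; a quick sketch of reading it (the archive name here is just a placeholder):

```python
import zipfile

# List each member's stated uncompressed and compressed sizes,
# which is about all this metadata field is really for.
with zipfile.ZipFile("example.zip") as zf:
    for info in zf.infolist():
        ratio = info.compress_size / info.file_size if info.file_size else 0.0
        print(f"{info.filename}: {info.file_size} bytes stated, "
              f"{info.compress_size} bytes compressed ({ratio:.0%})")
```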

What you are describing is a zip bomb, also called a decompression bomb. They do not work by lying about the uncompressed file size, but by nesting many archives inside each other. A single file containing a gigabyte of zeros compresses down to almost nothing. Duplicating an archive many times and then compressing the copies together does not grow the final archive significantly, because of the high redundancy. A 20 byte zip bomb that extracts to a terabyte can be duplicated a hundred times, then archived together and compressed, resulting in a 25 byte zip bomb. This process can be repeated many times until the total size makes it impractical to extract.

This was a common technique used by malware in the past, as malware scanners would blindly peek into archives. When an archive contained a zip bomb, they would run out of memory and crash, hang, or give up and skip it, and so miss the malicious payload hidden somewhere inside. Nowadays, malware scanners often flag suspected zip bombs so the end user can take appropriate action.
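To put a rough number on the redundancy point above, here is a quick check with Python's zlib (the chunk size is arbitrary; the one-gigabyte figure is just the example from this answer):

```python
import zlib

# Feed 1 GiB of zeros through deflate in 1 MiB chunks; the result is on
# the order of a megabyte, i.e. roughly a 1000:1 compression ratio.
comp = zlib.compressobj(9)
compressed = 0
for _ in range(1024):
    compressed += len(comp.compress(b"\0" * 1024 * 1024))
compressed += len(comp.flush())
print(f"1 GiB of zeros -> {compressed:,} bytes after deflate")
```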

An easy way to mitigate these attacks is to impose a limit on how many levels deep a decompressor will go when extracting an archive. Attempting to verify that the reported and actual uncompressed sizes match is useless, because zip bombs do not lie about their size. The confusion stems from the fact that only the sizes of the files in that one archive are recorded; there is no way for a zip file to know that the 100 files inside it are themselves zip bombs.
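A rough sketch of such a depth limit, using Python's zipfile (the depth and total-size caps are arbitrary, and a real extractor would also handle other archive formats and on-disk output):

```python
import io
import zipfile

MAX_DEPTH = 3             # illustrative limit on nesting
MAX_TOTAL = 1 * 1024**3   # illustrative cap on total extracted bytes

def scan_zip(data, depth=0, total=0):
    """Walk a zip archive held in memory, recursing into nested zips.
    Aborts if nesting exceeds MAX_DEPTH or the running extracted total
    exceeds MAX_TOTAL."""
    if depth > MAX_DEPTH:
        raise ValueError("archive nested too deeply; possible zip bomb")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            chunks = []
            with zf.open(info) as member:
                while True:
                    chunk = member.read(64 * 1024)
                    if not chunk:
                        break
                    total += len(chunk)
                    if total > MAX_TOTAL:
                        raise ValueError("extracted size limit exceeded")
                    chunks.append(chunk)
            if info.filename.lower().endswith(".zip"):
                total = scan_zip(b"".join(chunks), depth + 1, total)
    return total
```

The depth check is the key point here; the running size total just keeps any single layer from flooding memory or disk.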

forest
  • 64,616
  • 20
  • 206
  • 257