10

Suppose I have an application that encrypts its traffic with SSL, that I cannot control which cipher suite is negotiated, and that I apply some custom compression to the data before encryption takes place. What would be the best compression mode to use? If I go ahead and use DEFLATE/GZIP, I am worried that I might expose myself to an attack in which an attacker recovers encrypted data through chosen-plaintext queries (something similar to the CRIME attack).

Cookies

1 Answer

26

Encryption hides data but leaks data size. That's a common property of all encryption systems: encrypted data has (more or less) the same size as the clear data.

Compression alters the data size but it does so by finding "redundancies" (in a loose sense) in the data. So compression success rate depends on the data contents. Thus, compression makes data size dependent on data content.

Take the two together, and this leads you to the unavoidable but grim conclusion: compression leaks information about the data contents, even through encryption. The only general rule that follows is: thou shalt not compress at all.
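To make the leak concrete, here is a minimal Python sketch using the standard `zlib` module (Deflate). The helper name `compressed_len` and the two sample inputs are illustrative assumptions. Both inputs have the same length, yet their compressed sizes differ widely, and an encryption layer that preserves length would pass that difference on to an eavesdropper.

```python
import os
import zlib

def compressed_len(data: bytes) -> int:
    """Return the size in bytes of the zlib (Deflate) compressed form of data."""
    return len(zlib.compress(data, 9))

# Two plaintexts of identical length but very different redundancy.
repetitive = b"A" * 1000          # highly redundant
random_like = os.urandom(1000)    # essentially incompressible

# Same input size, very different output size: the size now depends on content.
print(len(repetitive), compressed_len(repetitive))
print(len(random_like), compressed_len(random_like))
```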

The CRIME attack simply works with that notion, in a Web-specific setup where the attacker gets to choose part of the data which is compressed, and tries to obtain some other confidential data which is also part of the compressed stream. This is a chosen-plaintext setting, which is what makes the attack really efficient. However, in general, a leak is a leak: even in purely passive attack scenarios, compression will make your connection "less secure".
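The chosen-plaintext idea can be sketched in a few lines of Python. This is a simplified illustration, not the actual CRIME exploit: the cookie value, the request layout, and the function name `observed_length` are all made up for the example, and the compressed length stands in for the ciphertext length that an eavesdropper would see.

```python
import zlib

# Hypothetical secret that the victim's client adds to every request.
SECRET_COOKIE = b"sessionid=4f9a1c7e2b8d0356"

def observed_length(attacker_path: bytes) -> int:
    """Length an eavesdropper would see for one request.

    The request is compressed before (length-preserving) encryption, so the
    compressed length is used here in place of the ciphertext length.
    """
    request = (
        b"GET /" + attacker_path + b" HTTP/1.1\r\n"
        b"Cookie: " + SECRET_COOKIE + b"\r\n\r\n"
    )
    return len(zlib.compress(request, 9))

# Attacker-chosen data of equal length: one guess matches the secret, one does not.
right_guess = b"sessionid=4f9a1c7e2b8d0356"
wrong_guess = b"sessionid=ZQXJKVWYPGMHTBLN"

# The matching guess lets Deflate replace the cookie with a back-reference,
# so the request that contains it comes out shorter.
print(observed_length(right_guess), observed_length(wrong_guess))
```

A real attacker does not know the whole secret in advance; the usual refinement is to extend a known prefix one character at a time and keep the candidate that yields the shortest output.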

The principle is not specific to Deflate; changing it to a custom algorithm would not save you. Deflate is not "especially weak" or "especially strong" in that respect.


All of the above applies to all lossless compression algorithms. It does not apply to lossy compression algorithms which achieve a fixed data bandwidth. For instance, if you compress music into an MP3 at exactly 128 kbit/s (no variable bit rate), then the resulting data size will not depend on the music contents, only on the original music duration. But, of course, lossy compression algorithms are applicable only to data where loss can be tolerated, e.g. sound, not XML.
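As a quick arithmetic check of the constant-bit-rate case, the sketch below computes the stream size from bit rate and duration alone. The function name `cbr_size_bytes` is an assumption for illustration, and the figure ignores container overhead such as frame headers or tags.

```python
def cbr_size_bytes(bitrate_bits_per_s: int, duration_s: float) -> int:
    """Payload size of a constant-bit-rate stream, ignoring container overhead."""
    return int(bitrate_bits_per_s * duration_s / 8)

# Any three-minute track at 128 kbit/s yields the same size, whatever it contains.
print(cbr_size_bytes(128_000, 180))  # 2880000
```

The content of the music never enters the computation, which is why the size reveals nothing beyond the duration.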

(The thing about "lossy" is that you cannot guarantee a fixed compression rate without having some potential data loss, because of the pigeonhole principle, unless your "compression" is actually a lack of compression, with the input data being totally unchanged.)
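The counting behind the pigeonhole remark can be written out as follows. The symbols $n$ (input length in bits) and $m$ (fixed output length in bits) are notation introduced here for illustration.

```latex
\[
\#\{\text{inputs of } n \text{ bits}\} \;=\; 2^{n}
\;>\;
2^{m} \;=\; \#\{\text{outputs of } m \text{ bits}\}
\qquad \text{for } m < n .
\]
```

Since there are more possible inputs than outputs, at least two distinct inputs must map to the same output, so the mapping cannot be inverted on both of them and some information is necessarily lost.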

Tom Leek
  • 2
    Thanks for your explanation, Tom Leek; I appreciate it. But I would like to disagree with your statement "thou shalt not compress at all." How do protocols like SPDY handle header compression without leaking data length? I read somewhere that the next version of SPDY will handle this securely. – Cookies Aug 01 '13 at 16:04
  • 6
    SPDY _does_ leak data length. The next version will just skip the "sensitive" parts of the header (mostly, the cookies) and not compress those, so it will compress less, and leak less. – Tom Leek Aug 01 '13 at 16:47