
If we want both encryption and compression during transmission, which order is preferable?

  1. Encrypt then compress
  2. Compress then encrypt
Ali Ahmad
  • Do not forget integrity protection. In most settings it is as important as encryption. It can be done with MACs (for example [HMAC](http://en.wikipedia.org/wiki/Hmac)) in a symmetric setting or signatures in an asymmetric setting. – Perseids Sep 10 '12 at 19:52
  • That looks like a question from the coursera cryptography course's exams. – Zzz Sep 11 '12 at 04:09

5 Answers


You should compress before encrypting.

Encryption turns your data into high-entropy data, usually indistinguishable from a random stream. Compression relies on patterns in order to gain any size reduction. Since encryption destroys such patterns, the compression algorithm would be unable to give you much (if any) reduction in size if you apply it to encrypted data.
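A quick sanity check of this claim, sketched in Python; `os.urandom` stands in for ciphertext here, since a good cipher's output should be indistinguishable from random bytes:

```python
import os
import zlib

# os.urandom models ciphertext: a good cipher's output is indistinguishable
# from a random stream.
ciphertext_like = os.urandom(100_000)

# Ordinary plaintext, by contrast, is full of patterns.
plaintext = b"attack at dawn " * 6_000  # 90 KB of repeating text

compressed_random = zlib.compress(ciphertext_like, 9)
compressed_text = zlib.compress(plaintext, 9)

print(len(compressed_random) >= len(ciphertext_like))  # True: no reduction at all
print(len(compressed_text) < len(plaintext) // 100)    # True: over 100x smaller
```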

Compression before encryption also slightly increases your practical resistance against differential cryptanalysis (and certain other attacks) if the attacker can only control the uncompressed plaintext, since the resulting output may be difficult to deduce.

EDIT: I'm editing this years later because this advice is actually poor in an interactive case. You should not compress data before encrypting it in most cases. A side-channel attack method known as a "compression oracle" can be used to deduce plaintext data in cases where the attacker can interactively cause strings to be placed into an otherwise unknown plaintext datastream. Attacks on SSL/TLS such as CRIME and BREACH are examples of this.
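The oracle can be sketched in a few lines of Python. The secret and the attacker interface below are hypothetical stand-ins for what CRIME/BREACH exploit in real protocols, where the attacker injects data into a compressed-then-encrypted stream and watches the ciphertext length:

```python
import zlib

# Hypothetical secret embedded in a stream the attacker cannot read directly,
# but whose *compressed* length they can observe on the wire.
SECRET = b"secret=7f3a9c2e"

def observed_length(attacker_controlled: bytes) -> int:
    """Compressed length of attacker-controlled data followed by the secret."""
    return len(zlib.compress(attacker_controlled + SECRET, 9))

# A correct guess duplicates the secret, so DEFLATE replaces it with a short
# back-reference; a wrong guess must be emitted as extra literals. The length
# difference leaks whether the guess was right, without breaking the cipher.
right = observed_length(b"secret=7f3a9c2e")
wrong = observed_length(b"secret=xq81wz5v")

print(right < wrong)  # True: the correct guess compresses better
```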

Polynomial
  • However, compression might allow new attacks in some [contexts](http://security.stackexchange.com/a/19914/655). – Thomas Pornin Sep 10 '12 at 12:21
  • Wow, never even knew about that attack. Great writeup, too! – Polynomial Sep 10 '12 at 12:34
  • Attempting to compress encrypted data is a way of testing the diffusion property of the encryption algorithm, it's also a test for key material; neither should compress at all in a perfect world. – lynks Sep 10 '12 at 13:43
  • @lynks It is not, however, a definitive test of randomness. If the encrypted file does not compress, your cipher isn't a complete joke, but may still very well be insecure in the extreme. If the encrypted file does compress, all hope is lost and you may as well hand over the plaintext to the bad guys. – Thomas Sep 10 '12 at 15:15
  • Add to that that true randomness should contain coincidental patterns from time to time, so in a large enough sample one should be able to compress a few bytes here and there anyway. – ewanm89 Sep 10 '12 at 15:19
  • @ewanm89 Potentially. In reality, you need reasonable runs in order to compress, and most algorithms have some sort of metadata overhead per "block" of compressed text. As such, you're unlikely to even break even. – Polynomial Sep 10 '12 at 15:33
  • @ewanm89: The number of possible compressed messages of length *n* cannot be greater than the number of possible messages of length *n*. So, if we average over the set of all possible messages, the average compression ratio (compressed size divided by uncompressed size) cannot be less than 100%. Compression algorithms achieve real-world compression ratios of less than 100% by targeting common patterns at the *expense* of uncommon ones; so, a truly-randomly-generated message will usually have a compression ratio of greater than 100%. – ruakh Sep 10 '12 at 19:18
  • @ruakh Only when you count in the metadata, as Polynomial says; yes, it's unlikely that the pattern will repeat enough for the dictionary to be larger than the amount of compression in the actual data. Of course, a block cipher in ECB mode tends to be highly compressible, but then it tends not to give random output and is open to dictionary attacks. – ewanm89 Sep 10 '12 at 20:05

If you compress after encryption and the compression does any good (i.e. it really reduces the length by a non-negligible amount) then you can ditch the encryption, it is awfully weak. Encrypted text ought to be indistinguishable from randomness; even badly encrypted data cannot usually be compressed.

Therefore, compress before encryption. This is why protocols which deal with encryption usually include some support for compression, e.g. OpenPGP (section 5.6) and SSL/TLS. In some scenarios, compression can leak information about confidential data (because compression reduces length depending on the data, and encrypted length more or less matches plaintext length); this is the idea behind the new CRIME attack on SSL/TLS.
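Both effects can be seen with a short sketch. The XOR "cipher" below is a toy stand-in for a real stream cipher, chosen only so the example stays self-contained; do not use it for actual encryption:

```python
import hashlib
import itertools
import zlib

def toy_keystream(key: bytes, n: int) -> bytes:
    """SHA-256 in a counter construction; a toy keystream for illustration,
    NOT a vetted cipher."""
    out = b""
    for ctr in itertools.count():
        if len(out) >= n:
            return out[:n]
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, toy_keystream(key, len(data))))

key = b"demo key"
message = b"the quick brown fox " * 2_000  # 40 KB of redundant text

# Compress, then encrypt: the ciphertext is as small as the compressed data.
compress_then_encrypt = toy_encrypt(key, zlib.compress(message, 9))

# Encrypt, then compress: the keystream has destroyed the patterns,
# so DEFLATE achieves nothing.
encrypt_then_compress = zlib.compress(toy_encrypt(key, message), 9)

print(len(compress_then_encrypt) < len(message) // 50)  # True: large saving
print(len(encrypt_then_compress) >= len(message))       # True: no saving
```

If compressing the "ciphertext" here did shrink it, that would be direct evidence that the keystream was badly non-random, which is the point of the first paragraph above.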


Fringe exception: if you encrypt a message with OpenPGP and then "ASCII armor" the result, i.e. encode it in Base64, then this encoding enlarges the data by 34%: 3 bytes become 4 characters (plus the odd newline). Compression with DEFLATE will be effective at cancelling this enlargement (thanks to Huffman codes). That's a case of usefulness of compression after encryption -- but, really, that's more compression over Base64, rather than compression over encryption.
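A rough illustration of that fringe case, with random bytes again standing in for OpenPGP ciphertext:

```python
import base64
import os
import zlib

ciphertext = os.urandom(30_000)  # random bytes stand in for OpenPGP output

armored = base64.b64encode(ciphertext)  # 3 bytes -> 4 characters
deflated = zlib.compress(armored, 9)    # Huffman coding packs the 64-symbol
                                        # alphabet back into ~6 bits per char

print(len(armored))                  # 40000: a third larger than the input
print(len(deflated) < len(armored))  # True: DEFLATE cancels most of the bloat
```

The compressed size lands close to the original 30,000 bytes again, because DEFLATE's Huffman stage is exactly suited to undoing a fixed 64-character encoding; it gains nothing beyond that, since the underlying bytes are still incompressible.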

Thomas Pornin

I would recommend compressing the data first and then encrypting it, for two reasons:

  1. The compression algorithm might benefit from knowledge of the data structure, and that structure would be disguised by the encryption. An example would be MP3, which can only compress sound data.

  2. You would have to encrypt less data, whereas if you first encrypt and then compress, you gain no speedup at all.

Raphael Ahrens

Neither: Compress during encryption with an encryption tool designed to do both securely, such as GPG/OpenPGP.

This is basically Thomas Pornin's answer, just more direct, so readers in a hurry don't misunderstand the subtleties he explains in his answer. The question expresses a false dichotomy. If the OP (and the reader) is thinking of the first and second steps as the execution of two different tools, like gzip and gpg:

  1. If you encrypt first, compression won't do much, besides squeeze out the Base64 34% inflation of "ASCII armor" that @ThomasPornin mentioned.

  2. If you compress first, the encryption is less secure, vulnerable to attacks like the ones that @ThomasPornin mentioned.

hobs

Compression after encryption does not really compress anything, because encryption leaves little redundancy for the compressor to remove. Encryption after compression does reduce the size, but it can undermine the encryption, because attacks like CRIME become possible.

As an example, web request headers contain secret cookies, so compressing the headers before encryption can reveal that secret information to the outside.

Therefore it is wise to use selective compression: compress only the non-secret data in the page, and then encrypt. That way compression still pays off, and the extraction of secret information is prevented.
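A minimal sketch of that idea, using a hypothetical length-prefixed framing (not any real protocol): the secret header travels uncompressed, so the observable length depends on the public body only.

```python
import zlib

# Hypothetical response: a secret header plus a public body. Only the public
# part is compressed, so the compressed length cannot depend on the secret.
secret_header = b"Set-Cookie: session=9c2ef7a1\r\n"
public_body = b"<html><body>" + b"hello world " * 1_000 + b"</body></html>"

def package(header: bytes, body: bytes) -> bytes:
    # Length-prefix the uncompressed header so the receiver can split the two.
    return len(header).to_bytes(2, "big") + header + zlib.compress(body, 9)

wire = package(secret_header, public_body)

# The secret contributes a fixed, data-independent number of bytes, while the
# redundant public body still compresses well.
print(len(wire) < len(secret_header) + len(public_body))  # True
```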

Dinithi
  • Depends a bit on the data really. Compression certainly can leak information, but it depends on the application / protocol if this kind of information is sensitive. Note that even uncompressed data leaks information about the size of the plaintext message. However, compression can leak information about the contents of the plaintext message as well, as some data is easier to compress than other data. – Maarten Bodewes May 25 '20 at 15:45