Compress and then encrypt, or vice-versa?

88

13

I am writing a VPN system which encrypts (AES256) its traffic across the net (Why write my own when there are 1,000,001 others already out there? Well, mine is a special one for a specific task that none of the others fit).

Basically I want to run my thinking past you to make sure I'm doing this in the right order.

At the moment packets are just encrypted before being sent out, but I want to add some level of compression to them to optimize the transfer of data a little. Not heavy compression - I don't want to max out the CPU all the time, but I want to make sure the compression is going to be as efficient as possible.

So, my thinking is, I should compress the packets before encrypting as an unencrypted packet will compress better than an encrypted one? Or the other way around?

I will probably be using zlib for the compression.

Majenko

Posted 2011-03-15T09:37:59.350

Reputation: 29 007

@JeffFerland, http://crypto.stackexchange.com

– Pacerier – 2015-05-18T18:02:46.483

@Pacerier: Crypto.SE didn't exist at the time this question was asked. – Jeff Ferland – 2015-05-18T19:41:02.100

4Writing as "programming"? Would be better suited for Stack Overflow then. – Suma – 2011-03-15T14:07:15.737

4If I were asking about the programming of it, yes, but I'm not. This is a general compress then encrypt or encrypt then compress question which could apply to just working with plain files if you wanted. The programming side is just context for why I am asking the question. – Majenko – 2011-03-15T14:08:31.667

See also: http://stackoverflow.com/questions/4676095 http://stackoverflow.com/questions/4399812

– BlueRaja - Danny Pflughoeft – 2011-03-15T19:56:22.893

Probably a question best meant for http://security.stackexchange.com/

– Jeff Ferland – 2011-03-16T14:32:56.443

1They know about compression there do they? – Majenko – 2011-03-16T14:59:15.350

@Majenko - They know about encryption, and most of them would know the answer is compress then encrypt. Of course they'd ask the question why you're using a block cipher instead of a stream cipher and point out that this will come at a price of speed (and that you should reconsider unless you already thought about it), and that maybe an elliptic curve cipher (http://eprints.usm.my/9413/1/ECSC-128_New_Stream_Cipher_Based_on_Elliptic_Curve_Discrete_Logarithm_Problem.pdf) would better suit. But I digress.

– Everett – 2012-10-09T04:35:00.953

Answers

177

If the encryption is done properly then the result is basically random data. Most compression schemes work by finding patterns in your data that can be in some way factored out, and thanks to the encryption now there are none; the data is completely incompressible.

Compress before you encrypt.
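
A quick way to check this claim is to compare how zlib handles redundant plaintext against random bytes of the same length. This is only a sketch: the sample payload is made up, and `os.urandom` is used as a stand-in for real AES output (an assumption, justified only in that good ciphertext should be indistinguishable from random data).

```python
import os
import zlib

# Hypothetical, highly redundant payload standing in for typical packet data.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 64

# Stand-in for ciphertext (assumption): random bytes of the same length,
# so no AES library is needed for the comparison.
ciphertext_like = os.urandom(len(plaintext))

# Level 1 is zlib's fastest setting, in line with "not heavy compression".
compressed_plain = zlib.compress(plaintext, 1)
compressed_random = zlib.compress(ciphertext_like, 1)

print("original bytes:          ", len(plaintext))
print("compressed plaintext:    ", len(compressed_plain))
print("compressed 'ciphertext': ", len(compressed_random))
```

The redundant plaintext should shrink to a small fraction of its size, while the random input should come out no smaller than it went in, since zlib still has to add its own framing.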

Mr Alpha

Posted 2011-03-15T09:37:59.350

Reputation: 6 391

4

@Olli, Your orange comment there is going to mislead a lot of people. It's better to delete it.

– Pacerier – 2015-05-18T18:09:14.047

@Olli, replace "entropy" with "obfuscation" and you may have something :). – user1172173 – 2017-10-17T13:20:50.593

41More important: compression adds entropy. Adding entropy is good for your encryption (harder to break with known-plaintext attacks). – Olli – 2011-03-15T10:52:00.920

8Also, encrypting costs resources, encrypting a smaller file will take less resources. So compress before encrypt. – GAThrawn – 2011-03-15T16:23:33.590

Aren't, conceptually, encryption and compression the same thing? Or rather, if encryption is done properly, (and compression is impossible) then you've really ended up compressing the data. (I guess it depends on one's definition of 'properly') – Mitch – 2011-03-15T16:38:33.250

1No. Compression reduces the file size and can be undone by anyone with the decompression program. Encryption changes the content so that it can only be read by someone with the decryption key - the file size may stay the same, or maybe grow or shrink. – Majenko – 2011-03-15T17:17:53.637

9@Olli - not necessarily if the compression scheme adds known text. In the worst case imagine if it put a known 512-byte header on the front of the data and you were using a block mode encryption. – Martin Beckett – 2011-03-15T17:25:07.500

@Martin: yes, that's true, it's not always a good idea, I assumed "when doing it properly". – Olli – 2011-03-15T17:29:51.223

26I'm not sure why @Olli's comment would get upvoted, as it is incorrect; not only is it significantly less important, for any half-decent encryption it should be not important at all. That is, the strength of the encryption should be completely unrelated to the entropy of the message. – BlueRaja - Danny Pflughoeft – 2011-03-15T19:51:25.300

8If you compress at all, it can only really be done before encrypting the message, but bear in mind, this may leak information about the 'compressibility' of the original message, so you'll want to consider if there are any consequences to this side channel. Consider a fixed-size file that is either all 0s or a message. The all-0 file will result in a smaller payload under any reasonable compression scheme. Not likely an issue in this particular use case though. – Edward KMETT – 2011-03-15T20:00:02.017

4@Olli: Compression doesn't add entropy. But it does reduce non-entropy. – user46971 – 2011-03-16T00:10:21.267

22

Compress before encryption. Compressed data can vary considerably for small changes in the source data, therefore making it very difficult to perform differential cryptanalysis.

Also, as Mr.Alpha points out, if you encrypt first, the result is very difficult to compress.

Juancho

Posted 2011-03-15T09:37:59.350

Reputation: 2 187

12

Well, this is correct, but was posted 2 hours before you posted... Entropy

– Konerak – 2011-03-15T16:43:20.810

3

Even though it depends on the specific use case, I would advise Encrypt-then-Compress. Otherwise an attacker could leak information from the number of encrypted blocks.

Assume a user is sending a message to the server, and an attacker is able to append text to the user's message before it is sent (via JavaScript, for example). The user wants to send some sensitive data to the server, and the attacker wants to obtain it. The attacker can try appending different strings to the data the user sends. The user's side then compresses the message together with the attacker's appended text. Assume DEFLATE (LZ77) compression, which replaces repeated data with a pointer to its first appearance. If the attacker manages to reproduce the whole plaintext, the compression function reduces the combined plaintext to roughly the original size plus a pointer. After encryption, the attacker can count the number of cipher blocks and so see whether the appended data matched the data the user sent. Even if this case sounds a little contrived, it is a serious security issue in TLS: an attack called CRIME uses this idea to leak cookies from a TLS connection and steal sessions.

source: http://www.ekoparty.org/archive/2012/CRIME_ekoparty2012.pdf
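
The length leak described above can be reproduced with zlib alone. This is a minimal sketch, not the actual CRIME exploit: the secret value and both guesses are hypothetical, and the compressed length is used as a stand-in for the ciphertext length, which assumes a cipher mode that preserves length (with a block mode the attacker would see sizes rounded up to whole blocks).

```python
import zlib

# Hypothetical secret that the victim's request always contains.
SECRET = b"Cookie: session=7f3a9c"


def observed_length(attacker_text: bytes) -> int:
    """Return the size an eavesdropper would see for one request.

    Assumption: encryption preserves length, so the compressed size
    stands in for the ciphertext size.
    """
    return len(zlib.compress(SECRET + b"\r\n" + attacker_text, 9))


# The attacker appends guesses and watches only the resulting sizes.
correct_guess = observed_length(b"Cookie: session=7f3a9c")
wrong_guess = observed_length(b"Cookie: session=QZ!xW#")

print("size with correct guess:", correct_guess)
print("size with wrong guess:  ", wrong_guess)
```

The correct guess is absorbed into a single back-reference to the secret, so that request comes out shorter than the one carrying the wrong guess. The attacker never decrypts anything; the size difference alone confirms the guess.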

Tobias Braun

Posted 2011-03-15T09:37:59.350

Reputation: 31

2

My view is that when you compress a message you project it to a lower dimension, and therefore there are fewer bits, which means that the compressed message (assuming lossless compression) has the same information in fewer bits (the ones you got rid of were redundant!). So you have more information per bit and consequently more entropy per bit, but the same total entropy as you had before when the message was not compressed. Now, randomness is another matter, and that is where the patterns in compression can throw a monkey wrench.
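
The "more entropy per bit" point can be illustrated with a rough measurement. The sketch below is an illustration under assumptions: the word list and the seeded random message are invented for the example, and the byte-frequency (zero-order) Shannon entropy used here is only a crude estimate that ignores longer-range structure.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Zero-order Shannon entropy estimate from byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical message: 2000 words drawn from a small vocabulary.
words = ["compress", "encrypt", "packet", "cipher", "block", "stream",
         "entropy", "pattern", "random", "key", "zlib", "data",
         "header", "payload", "tunnel", "traffic"]
rng = random.Random(0)  # fixed seed so the run is repeatable
message = " ".join(rng.choice(words) for _ in range(2000)).encode()

compressed = zlib.compress(message, 6)

before = entropy_bits_per_byte(message)
after = entropy_bits_per_byte(compressed)

print(f"message:    {len(message)} bytes, {before:.2f} bits/byte")
print(f"compressed: {len(compressed)} bytes, {after:.2f} bits/byte")
```

The compressed output is shorter but each of its bytes carries more of the information, which is the sense in which compression raises entropy per bit without adding information to the message.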

Prof

Posted 2011-03-15T09:37:59.350

Reputation: 21

1

Compression should be done before encryption. A user doesn't want to spend time waiting for the data transfer; he/she needs it done immediately, without wasting any time.

sqlchild

Posted 2011-03-15T09:37:59.350

Reputation: 129

1

Compression before encryption, as has been pointed out earlier. Compression looks for structure it can compress. Encryption scrambles the data so as to avoid structure being detected. By compressing first you're much more likely to have a smaller file and thus less payload to transfer. Encryption is going to do its job regardless of whether the data is compressed or not and, again as pointed out earlier, differential cryptanalysis is likely to be more difficult to perform on a compressed file.

Always Learning

Posted 2011-03-15T09:37:59.350

Reputation: 11

This appears to be a repeat of the accepted and second answers. Each answer should contribute a substantively new solution to the question. – fixer1234 – 2015-06-19T20:24:59.830

0

Compression reduces information entropy. Maximum compression makes entropy minimum. For perfectly encrypted data (noise), maximum and minimum entropy are the same.

AbiusX

Posted 2011-03-15T09:37:59.350

Reputation: 127

2Wait, don't you have that backwards? I thought entropy increased as redundancy decreased. Therefore compression should increase entropy. – Zan Lynx – 2011-03-16T20:06:16.470

Nope, less entropy = more patterns. Randomness has the most entropy. – AbiusX – 2011-03-16T20:11:35.060

1But it is information entropy so it is all about meaning. Randomness doesn't mean anything so it doesn't apply. An English sentence can have letters changed and still mean the same thing so it has low entropy. A compressed English sentence might be unreadable if a single bit changes so it has the most. Or so I think. – Zan Lynx – 2011-03-16T20:30:03.553

Entropy is not about sense and ability to read or understand, it's all about patterns. Compressed files are full of patterns. – AbiusX – 2011-03-16T22:45:45.787

1@AbiusX: Right. Patterns. And the fewer patterns, the more entropy. Which means that compression which replaces all repeated patterns with a single copy increases entropy. – Zan Lynx – 2011-03-16T23:58:08.583

No, it's not about quantity. Lots of patterns is not good. Quantity increases entropy. It's all about quality. – AbiusX – 2011-03-17T00:41:30.370