Which is more effective and 'secure': Compression+Encryption, or only Encryption?

Question

This came up in an IRC discussion on freenode, and now I'm curious.

The idea is that someone wants to determine whether encryption alone or compression+encryption (with gpg/pgp being the primary encryption system) provides better 'security' for the files.

Since I am not an expert in cryptography or cryptographical security, I'm wondering whether there's anything to back up either option over the other.

(Because nothing is truly 'secure', I'm using single quotes here.)

General question to all answerers: Assuming we take [the answer here](http://security.stackexchange.com/questions/19969/encryption-and-compression-of-data) as implied, does CRIME/BREACH or compression add any vulnerability? At that point, is encryption alone or encryption+compression superior? — Thomas Ward, Feb 16 '15 at 16:48
I will give this a couple days for more answers before accepting one, thank you all! — Thomas Ward, Feb 16 '15 at 16:53
As per my answer, I feel that unless your compression algorithm is unique (IE only ever used by you) then there is very little to no improvement in overall security. — Matthew Peters, Feb 16 '15 at 16:54
CRIME/BREACH are active attacks: the attacker must be able to add things to the message before encrypting, then observe the ciphertext length. In this scenario, encryption alone is superior, but you've got much bigger problems if someone can tamper with your messages. Tiny summary of CRIME: you know someone's going to send either ORANGE or YELLOW. You stick ORANGE on the end so the message is either ORANGEORANGE which compresses to "2xORANGE" (8 characters) or YELLOWORANGE which does not compress (12 characters). — , Feb 16 '15 at 19:49

score 6 · Accepted Answer · 2015-02-16T19:42:47.730

There are several factors that come into play. The first is that encryption should hide patterns: if you have a file that contains the same content repeated several times over, encryption should hide this fact. If you have a weak encryption scheme that leaks some information about patterns in the plaintext in its ciphertexts, then compression can help reduce the amount of information that leaks through. At least, that's the classical theory. Any encryption scheme that's good enough to be used nowadays should take care of patterns all by itself though, compression or no compression (related tip: never use ECB mode directly). So the classical argument for compression before encryption is no longer that relevant (and compression after encryption is pointless in just about every case).
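To make the "patterns" point concrete, here is a rough sketch. It does not use a real cipher: `toy_ecb` is a hypothetical stand-in built from HMAC that maps each 16-byte block independently and deterministically, which is the property that makes ECB leak. It is not decryptable, and the key and message are made up for illustration.

```python
import hashlib
import hmac
import zlib
from typing import List

BLOCK = 16  # block size in bytes, chosen for illustration

def toy_ecb(key: bytes, data: bytes) -> List[bytes]:
    """ECB-style toy: every block is transformed on its own, deterministically.

    NOT a real cipher (it cannot be decrypted); it only mimics the
    'same block in, same block out' behaviour that leaks patterns.
    """
    data = data.ljust(-(-len(data) // BLOCK) * BLOCK, b"\x00")  # zero-pad to a block multiple
    return [
        hmac.new(key, data[i:i + BLOCK], hashlib.sha256).digest()[:BLOCK]
        for i in range(0, len(data), BLOCK)
    ]

key = b"illustrative key"
plaintext = b"SIXTEEN BYTE MSG" * 64  # the same 16-byte block, 64 times

raw_blocks = toy_ecb(key, plaintext)
packed_blocks = toy_ecb(key, zlib.compress(plaintext))

print("uncompressed:", len(raw_blocks), "blocks,", len(set(raw_blocks)), "distinct")
print("compressed:  ", len(packed_blocks), "blocks,", len(set(packed_blocks)), "distinct")
```

Without compression, all 64 output blocks are identical, so the repetition is visible to anyone looking at the output. Compressing first removes most of the repetition before the weak transform sees it, which is the classical argument in a nutshell. A modern mode with a random IV or nonce hides the repetition either way.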

Another "classical" argument is that the chances of breaking a cipher increase with the amount of ciphertext you have to work with. So if you compress your messages before sending, you give an attacker less material to work with. Again, for any cipher secure under modern notions you should not have to worry about this. You should update your algorithms and key sizes every now and then according to the latest recommendations, rotate keys etc. - but the impact of compression on this will be minimal. Also if you encrypt each message under a per-message key and then transport this key by encrypting it under a key-encrypting-key, which is a pretty standard thing to do (especially when public-key crypto is involved), you're controlling how much ciphertext gets sent out under any one key far better than you could by fiddling with compression.

The next point is that you might want encryption to hide the length of your ciphertext to some extent too. It's not too hard to gain some "metadata" about what someone is doing on a TLS connection by observing packet lengths and frequencies. Compression before encryption can help reduce the strength of the "signal" in some cases, but if you're worried about traffic analysis then you probably should be doing other things like padding to a fixed length - again, compression can make a bad situation a bit less bad, but it's not going to turn a bad situation into a good one.
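As a minimal sketch of what "padding to a fixed length" could look like before encryption: the helper names `pad_to_bucket` and `unpad`, the 4-byte length prefix, and the 256-byte bucket size are all assumptions made for this illustration, not part of any particular tool.

```python
import struct

def pad_to_bucket(message: bytes, bucket: int = 256) -> bytes:
    """Prefix the true length, then zero-pad up to the next multiple of `bucket`."""
    framed = struct.pack(">I", len(message)) + message  # 4-byte big-endian length
    padded_len = -(-len(framed) // bucket) * bucket      # round up to a bucket multiple
    return framed.ljust(padded_len, b"\x00")

def unpad(padded: bytes) -> bytes:
    """Recover the original message using the stored length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]

for msg in (b"yes", b"no", b"a somewhat longer reply"):
    padded = pad_to_bucket(msg)
    print(len(msg), "->", len(padded), unpad(padded) == msg)
```

Every short message ends up the same size, so an observer of the ciphertext length learns only which bucket the message fell into. The padded bytes would then be handed to the encryption step.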

Finally, you may have heard that compression within the TLS protocol will be abolished and banned in v1.3, or at least that was the case when I last read the draft. This is not to do with compression in general but because the specific way encryption, padding, MACs and compression interacted in TLS 1.2 and earlier standards had some vulnerabilities (CRIME, BEAST etc.). That doesn't mean that you can't compress at a higher (application) level or that compression weakens encryption in general, just that it has no place in the actual encryption module, at least not in the way it's done in TLS at the moment.

EDIT

In response to comments asking further questions, here are two examples of how compression can help or hurt you.

Example 1 You're applying for a job and you agree to send your best friend an encrypted message telling him whether you got it or not. Because you work in IT, a plain "yes" or "no" will not do - it has to be an animated GIF, so you prepare two files yes.gif and no.gif, and because you're worried about someone just looking at the filesize of your ciphertext you make them both exactly 64KB. But yes.gif is full of colours and ponies and rainbows and is still 60KB when compressed, while no.gif is a more demure slow black/white animation and compresses down to 4KB. Your encryption program automatically applies compression before encryption, so the size of the ciphertext now gives the answer away. In this case, compression ruins your security completely.
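The effect in Example 1 is easy to reproduce. The sketch below does not use real GIFs: as an assumption for illustration, pseudo-random bytes stand in for the "busy" file and zero bytes for the "plain" one, both exactly 64KB, and zlib stands in for whatever compressor the encryption program applies.

```python
import random
import zlib

SIZE = 64 * 1024
rng = random.Random(0)  # fixed seed so the run is repeatable

yes_file = bytes(rng.getrandbits(8) for _ in range(SIZE))  # high entropy, barely compressible
no_file = bytes(SIZE)                                      # all zeros, highly compressible

yes_packed = len(zlib.compress(yes_file))
no_packed = len(zlib.compress(no_file))

print("before compression:", len(yes_file), len(no_file))
print("after compression: ", yes_packed, no_packed)
```

The two inputs are the same size, but the compressed sizes differ enormously, and since encryption does not hide length, that difference survives into the ciphertext.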

Example 2 You've given up on the animated GIFs and agree that next time you want to meet your friend, you'll send him an encrypted e-mail simply saying "Let's meet next [day of week]." Your encryption program encrypts in 8-character (64 bit) blocks. If your message is "Let's meet next Monday." it fits nicely into 3 blocks; if it's "Let's meet next Wednesday." that's 4 blocks. Failed again. In this case, "mentally compressing" all days into their first three letters ("Let's meet next WED.") would have saved you.
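The block counts in Example 2 can be checked with a few lines. This sketch assumes one byte per character and simply rounds the length up to whole 8-byte blocks; a real padding scheme may add bytes of its own, which is ignored here. `blocks_needed` is a name made up for the illustration.

```python
BLOCK_BYTES = 8  # 64-bit blocks, as in the example

def blocks_needed(message: str) -> int:
    """Number of 8-byte blocks the message occupies, rounding up."""
    return -(-len(message.encode("ascii")) // BLOCK_BYTES)

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

for day in days:
    full = f"Let's meet next {day}."
    short = f"Let's meet next {day[:3].upper()}."
    print(f"{day:<10} full: {blocks_needed(full)} blocks   short: {blocks_needed(short)} blocks")
```

With full day names the block count splits the week into two groups, which already leaks something; with three-letter abbreviations every message has the same length and the count tells the observer nothing.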

In summary - compression does not generically help or harm encryption; if you typically have to deal with messages of similar lengths but varying levels of entropy/compressibility then it might well do more harm than good.

The CRIME attack works because of a set of very specific circumstances that involve compression as one ingredient, but should not be applicable to file storage.

I see you do end up discussing TLS and such, but the idea here is OpenPGP-based encryption with some compression of files. TLS is about encryption over the network; does any of this affect encryption on the system itself (WITHOUT network transmission)? — Thomas Ward, Feb 16 '15 at 16:42
There are ciphers that I'd trust with or without compression and there are ciphers that I wouldn't trust in either case; gpg (with sensible algorithms and key lengths) falls into the former category. For the scenario of file encryption, if you're not worried about someone measuring the time it takes to decrypt/decompress files and using that to attack you, compression should really be irrelevant. — , Feb 16 '15 at 16:49
It just occurred to me that I never accepted an answer. Oops. Yours is the most comprehensive answer in my opinion, so I've marked your answer as the accepted one. — Thomas Ward, Aug 13 '18 at 16:03

score 3 · Answer 2 · answered Feb 16 '15 at 16:49

Compression is not security at all. You may be confusing compression with steganography (data hiding), but regardless, neither provides true security; they merely hide things. Encryption, of course, does secure your data.

From Wiki:

Brute-force attacks can be made less effective by obfuscating the data to be encoded, something that makes it more difficult for an attacker to recognize when he/she has cracked the code. One of the measures of the strength of an encryption system is how long it would theoretically take an attacker to mount a successful brute-force attack against it.

This may be where the confusion starts. You might think that 'oh hey when I use winzip and look at the hex data it all looks different thus I have created a tiny layer of obscurity, right?' Wrong! Unless you are using a unique method of obscurity, all you are doing is slowing down the process of checking each brute-force attempt by an insanely small amount of time.

This is because nearly every file format has what is called a file signature (so that software can make sense of the data). So unless you either create your own unique file signature or hide your data within some other signature, nothing is gained by just swapping one signature for another.
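To illustrate the signature point, here is a small sketch using only Python's standard-library compressors. The payload is made up; the interesting part is that the leading bytes of each compressed stream are fixed by the format, regardless of what was compressed.

```python
import bz2
import gzip
import lzma
import zlib

payload = b"attack at dawn " * 20  # arbitrary sample data

headers = {
    "gzip": gzip.compress(payload)[:2],  # gzip magic number
    "bz2": bz2.compress(payload)[:3],    # "BZh"
    "xz": lzma.compress(payload)[:6],    # xz container magic
    "zlib": zlib.compress(payload)[:1],  # first zlib header byte (default window size)
}

for name, magic in headers.items():
    print(f"{name:<5} starts with {magic.hex()}")
```

Someone brute-forcing a key can test each candidate decryption against these few known bytes, so a standard compressed container is about as recognizable as plain English text.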

Matthew Peters


Indeed, Matthew, I know compression is not security, but let's include the answer as if I'm a totally ignorant individual. This answer gets a +1 from me because it adds insight into where the other side of the argument was coming from. – Thomas Ward Feb 16 '15 at 16:51
@ThomasW. I mean no offense, but I always answer people on the assumption that everything needs to be spelled out. If nothing else, it helps confirm my own thoughts. – Matthew Peters Feb 16 '15 at 16:52
Indeed, I'm not taking offense, sorry if it read as such. This gives me insight into the non-security community, though, so +1 for pointing out the confusion that some users have – Thomas Ward Feb 16 '15 at 16:53
@ThomasW. Something you may be interested in is [file system data forensics](https://www.google.com/search?q=data+forensics&ie=utf-8&oe=utf-8#q=file+system+forensic+analysis) The third link is a link to a pdf textbook that I used to help learn and it has worked really well! – Matthew Peters Feb 16 '15 at 16:57
Strange coincidence: I'm taking a Computer Forensics course right now. xD (Forensics tools are expensive >.<) – Thomas Ward Feb 16 '15 at 17:07
@ThomasW. Awesome! Ask tons of questions and learn a lot! As for the tools, yes they mostly are expensive but FTK Imager is used by a lot of local law enforcement agencies and is free! Also, go to a few conferences (if you get the chance) and talk to vendors, they love giving out 'evaluation' packages which are typically free trials that just last longer than your basic 30 day trial. Once you use them, lobby your school to get a copy for lab use and use it! If you are in America, community colleges are your best bet bro, good luck! – Matthew Peters Feb 16 '15 at 17:44
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/21163/discussion-between-thomas-w-and-matthew-peters). – Thomas Ward Feb 16 '15 at 17:46

score 2 · Answer 3 · answered Feb 16 '15 at 16:43

Multiple attacks with very evocative names have shown that compression can introduce a vulnerability by leaking information about the plaintext:

CRIME (Compression Ratio Info-leak Made Easy)
BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext)

Basically, if you can control some part of the plaintext, you can guess other parts of the plaintext by looking at the size of the ciphertext (because compression will be more effective when you match existing patterns in the plaintext).
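A toy version of that length oracle, using the ORANGE/YELLOW example from the comments on the question: the sketch assumes the attacker can append data to the secret before compression and that encryption preserves length, so the compressed length stands in for the ciphertext length. `observed_length` and `guess_secret` are hypothetical names for this illustration, and zlib stands in for the compressor.

```python
import zlib

def observed_length(secret: bytes, injected: bytes) -> int:
    """Length an eavesdropper would see, assuming length-preserving encryption."""
    return len(zlib.compress(secret + injected))

def guess_secret(secret: bytes, candidates: list) -> bytes:
    """Inject each candidate in turn and pick the one giving the shortest output."""
    return min(candidates, key=lambda c: observed_length(secret, c))

candidates = [b"ORANGE", b"YELLOW"]

for secret in candidates:
    lengths = {c.decode(): observed_length(secret, c) for c in candidates}
    print(secret.decode(), lengths, "->", guess_secret(secret, candidates).decode())
```

The attacker never decrypts anything: the injected guess that matches the secret compresses better, and the shorter output gives it away. This only works when the attacker can both inject chosen data and observe the resulting length, which is why it is a concern for things like TLS and much less so for files encrypted at rest.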

score -1 · Answer 4 · answered Feb 16 '15 at 16:30

Imagine that someone is trying to brute force decrypt a file, and when a certain password is tried, an obvious English text file pops out.

Or suppose that the same exercise is tried, but all the bytes in the text file were shifted forward 79 places before encryption.

The second approach is stronger, even though obfuscation (shifting bytes) is NOT encryption.

The same may be true for compressing a file before encrypting it.

refulgent144


This should not be the case with any modern cipher designed to resist known-plaintext attacks. – Dillinur Feb 16 '15 at 16:45
Moreover, your compression scheme will also add known patterns in the plaintext that might be as easy to look at as the pattern in the uncompressed data. – Dillinur Feb 17 '15 at 13:02
