Is it safe for GPG to compress all messages prior to encryption by default?

Question

By default, GPG compresses text during encryption.

Additionally, RFC 4880 says:

2.3. Compression

OpenPGP implementations SHOULD compress the message after applying the signature but before encryption.

We know that encryption does not attempt to hide the data length, and that in certain circumstances, this property makes it possible for an attacker who can execute a limited chosen plaintext attack to learn more about the full plaintext (see CRIME, BREACH, etc.).

To quote Thomas Pornin:

The one-line executive summary is that thou shalt not compress.

Is it safe behavior for OpenPGP implementations to compress messages by default? Does it make sense to disable this behavior?

score 14 · Accepted Answer · edited Oct 07 '21 at 06:58

Encryption leaks data length: for any given input message, the encrypted output will have a length which will be close to that of the input message. If using a padding-less encryption mode (e.g. CFB, as used in the OpenPGP format that PGP implements), then the cleartext data length can be recovered exactly. This is a property common to all encryption systems, and it is the reason why traffic analysis is a powerful attack tool.

Compression does not qualitatively change this issue, but it can worsen it. Compression makes data length dependent on data contents, so it can leak extra information. For instance, consider a payment system for some Web site: the customer sends his credit card information to the site, which then forwards the credit card number, along with the transfer amount and the current date, to a partner bank. Suppose that each such message is to be encrypted with OpenPGP, and has a fixed-length format (16 characters for the credit card number, and so on). The fixed length is meant to deter traffic analysis. However, with compression active, this is not so: the message will be shorter if the amount or the current date contains digit patterns which are also found in the credit card number. The attacker knows the current date, and can infer the transfer amount (it must match the price of one of the items which are sold on the site), so the leak can be exploited to gain some partial information on the credit card number.
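To make the fixed-format example concrete, here is a minimal sketch in Python. It uses zlib's DEFLATE as a stand-in for the compression step, and it treats the compressed size as the value an eavesdropper would observe, since encryption preserves length closely. The record layout, card number, amounts and the `compressed_len` helper are all made up for illustration; this is not what GPG does internally.

```python
import zlib

def compressed_len(card: str, amount: str, date: str) -> int:
    """Size of the compressed record, used here as a proxy for ciphertext length."""
    record = f"card={card};amount={amount};date={date}".encode("ascii")
    return len(zlib.compress(record, 9))

CARD = "4556737586899855"  # made-up card number
DATE = "2013-10-06"        # known to the attacker

# Both amounts are 8 digits, so the two plaintext records have the same length.
len_overlap = compressed_len(CARD, "73758689", DATE)   # repeats 8 digits of the card
len_distinct = compressed_len(CARD, "10293847", DATE)  # shares no digit run with the card

print("amount overlaps card digits:", len_overlap)
print("amount unrelated to card   :", len_distinct)
```

The record whose amount repeats part of the card number should come out a few bytes shorter, even though both plaintexts have identical length. That difference is the extra leak described above.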

It is all a matter of context. The Web and HTTPS offer some characteristics which make exploitation quite effective:

The victim's browser includes the target secret (a cookie) in each request it sends, at a predictable place, and always the same value.
The attacker can inject hostile code (Javascript) which can arrange for triggering requests at will, hundreds of them, without anything showing up on the user's screen.
The hostile code from the attacker gets to add a lot of data of its own choosing along with the secret value. This is a chosen-plaintext attack.
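The three properties above can be sketched as a toy length oracle in Python. This is a simplified illustration, not the actual CRIME attack: the real attack guesses the secret a byte at a time over many requests, while this sketch only tests a handful of complete guesses. The cookie value, request layout and `observed_length` helper are hypothetical, and zlib stands in for TLS-level compression.

```python
import zlib

SECRET_COOKIE = "sessionid=7f3a9c1e5b2d4086"  # hypothetical secret, same in every request

def observed_length(attacker_data: str) -> int:
    """Size of the compressed request, i.e. what a passive observer can measure."""
    request = (
        f"GET /?q={attacker_data} HTTP/1.1\r\n"
        f"Cookie: {SECRET_COOKIE}\r\n\r\n"
    )
    return len(zlib.compress(request.encode("ascii"), 9))

# The attacker injects each guess next to the secret and watches the length.
candidates = [
    "sessionid=0123456789abcdef",
    "sessionid=fedcba9876543210",
    "sessionid=7f3a9c1e5b2d4086",
    "sessionid=a1b2c3d4e5f60718",
]

for guess in candidates:
    print(guess, observed_length(guess))

# The guess that duplicates the secret compresses best.
best_guess = min(candidates, key=observed_length)
print("shortest request for:", best_guess)
```

The correct guess yields the shortest request because the compressor replaces the whole cookie with a back-reference to the attacker-supplied copy.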

Thanks to these characteristics, CRIME works (well, not anymore since browsers don't support TLS-level compression, but it used to work). But they don't apply to usual OpenPGP usage contexts. PGP was meant for emails, with a human painfully typing each message, and another human reading them at the other end. No hostile code, no or very little chosen plaintext, and the fuzziness of human language moving around secret values.

Hence, we can say that the normal usage scenario of PGP is such that compression does not substantially degrade security, which justifies its use by default. Although, to be fair, PGP does not compress by default because it is usually safe; it compresses by default because that's what it did twenty years ago, and old habits die hard. In the beginning, the compression was there to make up for the overhead of encryption, mostly the extra few hundred bytes for the asymmetric cryptography, and the +33% size increase implied by Base64 encoding. With recent hardware and networks, this is hardly relevant; PGP compresses out of Tradition more than anything else.

To conclude, an enlightening story. In early 1942, the US Navy was trying to prepare for the next Japanese attack after Pearl Harbor and the battle of the Coral Sea. The target was known under a code name in the Japanese communications. To confirm its identification, the Americans deliberately sent fake messages among the normal transit of routine messages, in order to see that information show up in the Japanese stream and correlate it. In this specific case, they claimed that their base at Midway had a broken distiller. The Japanese encoded transmissions reported that the target for the next attack was short on water. This crucial information was instrumental to the Japanese defeat, after which it all went downhill for Japan until the end of World War II. (Even Wikipedia has pointers on this anecdote, so it is not only a nice story, but it might even be true.)

This is a chosen-plaintext attack all right, coupled with traffic analysis. The important point, for the present discussion, is that routine messages are computer-like: though it was in 1942, military behaviour can be considered to emulate a mindless computer. The chosen-plaintext attack could be pulled off because it took place in a context where the victim (here, the Japanese communication network) could reliably be made to relay chosen information, in its own coding system, with a predictable format.

The same applies to PGP. If you use PGP in the traditional way, for handcrafted text messages, to communicate from a human brain to another, then compression is safe. If you use PGP in an automated way, to send lots of messages in a predictable format with contents that can mostly be guessed or even chosen by attackers, then compression may substantially increase your security issues.

I read Budiansky's "Battle of Wits" last week, which carries the Midway-water-distillation tale - with a spin. In short, the target of the water "leak" was to find out where the Japanese were targeting under code name "AF" - for the purposes of settling an internecine dispute between the Hawaii code-breakers and their DC counterparts. DC had been arguing that Hawaii was "AF". Hawaii quietly cabled Midway asking them to broadcast the water request in the clear... then let another station decrypt related Jap signals and report upon this confirmation that Midway was "AF". Excellent book, btw. — gowenfawr, Oct 07 '13 at 14:08

David · Answer 2 · 2013-10-06T06:30:46.350

If an attacker can trick you into repeatedly encrypting (and consequently compressing) nearly the same plaintext over and over again, then I suspect that such an attack might be possible, but getting into those circumstances seems unlikely. (You'd need plaintext that combines something secret to the attacker and something attacker-controlled as well.)

BEAST and CRIME both rely on properties of web browsers and TLS specifically. BEAST has nothing to do with compression, but with CBC mode encryption and guessing the first block of the plaintext, which is doable because HTTP requests are rather predictable, and because TLS 1.0 uses a predictable IV. (GPG, on the other hand, uses properly randomized IVs.) CRIME is based on using JavaScript to repeatedly make similar requests and see how the compression changes the output (as you've indicated) but requires a large number of requests and a passive attacker observing the compressed requests.

So, if you're not using GPG in an automated fashion, I don't see how attacks like CRIME could possibly be extended to GPG. Even if they could be, it's only a plaintext recovery attack, so you'd need a circumstance where an attacker can automatically inject their payload into otherwise secret plaintext and have you encrypt it.

Sorry, meant BREACH not BEAST. These acronyms are getting ridiculous. — Tom Marthenal, Oct 06 '13 at 06:43

score 1 · Answer 3 · answered Oct 06 '13 at 08:48

The compression ratio of messages can be inferred quite consistently for a given probable file format, which allows an attacker to guess how large the plaintext is. This size information can be used to infer other things - for example, whether confidential medical test results were positive (lots of counseling, secondary graphs, and 'what next' pages included) or negative (smaller file).

Padding solves this problem, but as far as I can tell GPG/PGP doesn't include an automatic pseudo-random byte padding option. So you need to choose between padding and no compression, or compression and no padding, as compression of zero-byte padding collapses the padding.
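A quick way to see why zero-byte padding collapses under compression while random padding survives is the Python sketch below. It uses zlib rather than GPG itself, and the message text and padding size are arbitrary choices for illustration.

```python
import os
import zlib

message = b"Test result: negative. No follow-up required.\n"
PAD = 1024  # arbitrary padding size for the demonstration

zero_padded = message + b"\x00" * PAD
random_padded = message + os.urandom(PAD)

len_plain = len(zlib.compress(message))
len_zero = len(zlib.compress(zero_padded))
len_random = len(zlib.compress(random_padded))

print("no padding     :", len_plain)
print("zero padding   :", len_zero)    # barely larger than the unpadded message
print("random padding :", len_random)  # the padding survives compression
```

The zero padding shrinks to a handful of bytes, so it hides nothing about the real message size. Random padding is incompressible and stays, but the recipient then needs some convention (a length prefix, for instance) to strip it again.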

The probable security origin of the idea to compress PGP plaintext was to make known-plaintext attacks much harder to perform - although if your cipher algorithm can't directly protect against known-plaintext attacks then you have a problem that the '90s hack of compressing plaintext won't solve.

The non-security reason is simple enough - you can't compress ciphertext to reduce bandwidth and storage since it is too random.

So to answer 'Is it safe?': Yes, if you don't care about inferred file size or can pad the plaintext with a RNG beforehand.

score 1 · Answer 4 · edited Oct 07 '21 at 06:58

Besides giving away information about the message size, compressed formats usually have fixed headers that could help an analyst. For example, the first three bytes of the Gzip file format described in RFC 1952 [1] are constant and the following seven (flags and timestamp) could be easily guessed in some cases. As another example, the first 14 bytes of the bzip2 format described in [2] have very low entropy (2+6 bytes with constant value, 4 byte CRC and the remaining 2 bytes can assume at most 18 values).

[1] https://www.rfc-editor.org/rfc/rfc1952

[2] http://en.wikipedia.org/wiki/Bzip2#File_format
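The predictable header bytes can be inspected with Python's standard `gzip` and `bz2` modules, as in the small sketch below. The payload and the `mtime` value are placeholders; the point is only that the leading bytes do not depend on the message. Note that these are the standalone file formats; how OpenPGP wraps its compressed data packets is not shown here.

```python
import bz2
import gzip
import struct

payload = b"any secret message at all"
mtime = 1381041046  # placeholder timestamp an observer might estimate

gz = gzip.compress(payload, mtime=mtime)
bz = bz2.compress(payload)

# gzip: magic 1f 8b, method 08 (deflate), flags, then 4-byte little-endian mtime.
print("gzip header :", gz[:10].hex())
print("gzip mtime  :", struct.unpack("<I", gz[4:8])[0])

# bzip2: 'BZ', 'h', block-size digit, then the constant 6-byte block magic.
print("bzip2 header:", bz[:10].hex())
```

Changing `payload` leaves the gzip magic and method bytes and the bzip2 `BZh` prefix and block magic untouched, which is the known-plaintext foothold the answer refers to.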
