Why is it necessary to minimize redundancy in the ciphertext of a stream cipher?

Question

I am utterly confused about this. I understand why you would want to minimize redundancy if you're using a substitution cipher, but why is this necessary when using a stream cipher such as RC4? Since the attacker does not have the key, how would they draw any conclusions about the contents of the plaintext based on the relative frequencies of bytes in the ciphertext? For example, RC4 was found to have a weakness in that zero bytes are more frequent. But a zero byte can correspond to potentially any byte in the plaintext, since you're XORing each byte of the plaintext with a different value. So how would this be exploited? I hope this question makes sense.

score 3 · Answer 1 · answered Jul 05 '17 at 10:53

The main problem with the weakness you are describing is that 0 appears more frequently than it should in certain bits of the output stream. In the most common attack scenario, an attacker can make a victim create millions of connections to a server while observing the network traffic.

To give an example, let's change 'bit' to 'byte', exaggerate the bias, and see what happens.

Let's say:

I have a secret cookie with the value 'a';
I have some kick-ass stream cipher that outputs random bytes;
an attacker can make me perform millions of connections to a server;
on every request, the first character of my plaintext is my cookie ('a'), followed by some other plaintext;
encryption is an xor of the plaintext with the bytes generated by the stream cipher (like with RC4);
the attacker can monitor my network traffic.

With every new connection I make to the server, a new random stream is generated. This means my cookie ('a') will be xor'ed with a different random byte every time. Because that byte is uniformly random, our attacker can't deduce any information about the cookie: every possible value of the first ciphertext byte shows up with probability 1/256, whatever the cookie is.

Let's now introduce a flaw to our stream cipher. For the first byte, it outputs 'b' with a 10% chance, and any other character with the remaining 90% chance. Now, on every connection the attacker has me create, there is a 1/10 chance that 'b' xor 'a' appears as the first byte of ciphertext, instead of the previous 1/256. So after 100 connections, 'b' xor 'a' should have appeared about 10 times, while every other value shows up far less often. The attacker knows the cipher is biased towards 'b', so they take the most frequent first ciphertext byte, xor it with 'b', and get my cookie 'a'.
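
The toy scenario above can be simulated in a few lines. This is only a sketch of the made-up cipher from this answer, not of RC4 itself: the function names, the seed and the number of connections are all assumptions chosen for illustration.

```python
import random
from collections import Counter

BIASED_BYTE = ord("b")  # keystream value the flawed toy cipher favours
BIAS = 0.10             # chance that the first keystream byte is 'b'


def first_keystream_byte(rng):
    """Toy flawed cipher: 'b' 10% of the time, otherwise any other byte."""
    if rng.random() < BIAS:
        return BIASED_BYTE
    other = rng.randrange(255)  # pick among the 255 values that are not 'b'
    return other if other < BIASED_BYTE else other + 1


def observe(cookie, connections, rng):
    """What the attacker sees: first ciphertext byte of every connection."""
    return Counter(cookie ^ first_keystream_byte(rng) for _ in range(connections))


def recover_cookie(observed):
    """Assume the most frequent ciphertext byte was encrypted with 'b'."""
    most_common_byte, _count = observed.most_common(1)[0]
    return most_common_byte ^ BIASED_BYTE


rng = random.Random(2017)  # fixed seed so the run is repeatable
observed = observe(ord("a"), 2000, rng)
guess = recover_cookie(observed)
print("attacker's guess for the cookie:", chr(guess))
```

With a bias this strong a couple of thousand connections are plenty. A real bias is much smaller, which is why the practical attacks need millions of connections.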

Keep in mind that there are also other types of attacks on RC4. The summary of attacks on Wikipedia is actually very nice:

https://en.wikipedia.org/wiki/RC4#Security

score 1 · Answer 2 · answered Mar 06 '17 at 18:13

The primary reason to remove redundancy from the cleartext stream is that cleartext can be compressed, but cyphertext cannot, because good cyphertext looks random. If you're looking to reduce the size of the data, compressing before encrypting is your only option.

A security reason to remove redundancy would be if you're using a block cypher in Electronic Code Book mode. Redundant data is visible as duplicate cyphertext blocks. However, the better approach there is to not use ECB, as it is inherently vulnerable.

Cryptographically, redundancy between messages has led to "cribs". If the cryptanalyst sees messages that he suspects all begin with "Good Morning, Mr. Phelps", he can use that to help decode the messages. This is how Turing's Bombes were used to break German Enigma traffic in WWII.

However, removing redundancy within a single message (by compressing the cleartext) has led to some novel attacks. The CRIME security exploit used differences in compression sizes to recover bytes of plaintext.

John Deters


Lots of stuff I've never heard of. I have a lot of reading to do. – Legend of Overfiend Mar 06 '17 at 18:33
Doesn't ECB only apply to block ciphers and therefore have little to no relevance in the context of the question? Also, while you're correct to point out that repetition (though mostly repetition of a three-letter key) in part enabled Enigma cracking, again isn't it kind of a large leap from the Enigma to a modern stream cipher? – Out of Band Mar 06 '17 at 21:09
Yes, ECB is a block cipher mode, so it's not important unless he's using a block cipher in a streaming mode. Enigma is a stream cipher, and yes, modern stream ciphers don't share the same weakness. However, history plays a large part in why people say "do this, don't do that!", especially in cryptography. – John Deters Mar 06 '17 at 21:13
As a historian by profession, I guess I'll have to agree to that :-) – Out of Band Mar 06 '17 at 21:20

Out of Band · Answer 3 · 2017-03-06T22:20:40.637

I think you need to differentiate between one-time pads, where redundancy really doesn't matter because it can never be detected, and conventional stream ciphers, where redundancy in the plaintext can lead to detectable patterns in the ciphertext if the stream cipher has weaknesses.

A very simple example would be a stream cipher that has a very short period (shorter than the plaintext to encrypt). Then redundancies in the plaintext would show up in the ciphertext.
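
As a rough sketch of the short-period case, assume a made-up cipher whose keystream repeats every 4 bytes. The keystream bytes, the plaintext and the function name below are hypothetical and only serve the illustration.

```python
from itertools import cycle


def short_period_encrypt(plaintext: bytes, keystream_period: bytes) -> bytes:
    """XOR the plaintext with a keystream that simply repeats."""
    return bytes(p ^ k for p, k in zip(plaintext, cycle(keystream_period)))


keystream = bytes([0x3A, 0xC5, 0x71, 0x0E])  # hypothetical 4-byte period
plaintext = b"GET /a  GET /b  GET /c  "      # redundant: "GET " every 8 bytes
ciphertext = short_period_encrypt(plaintext, keystream)
period = len(keystream)

# Bytes one period apart were encrypted with the same keystream byte,
# so XORing them cancels the keystream and leaves plaintext XOR plaintext.
leak = bytes(ciphertext[i] ^ ciphertext[i + period]
             for i in range(len(ciphertext) - period))

# The repeated "GET " shows up as identical ciphertext chunks.
print(ciphertext[0:4] == ciphertext[8:12] == ciphertext[16:20])
```

The attacker never needs the key here: equal ciphertext chunks reveal that the plaintext repeats, and `leak` equals the XOR of plaintext bytes one period apart.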

Redundancies in the ciphertext often tell you something about the plaintext, or at the very least something about its structure and what part of it you're looking at.
