What is ("ASCII" redundancy) which is said to be required to retrieve the original plain text in 2 times pad?

Question

Sorry for the long title :$

In a video speaking about attacks on the one time pad which uses psudorandom generators (PRG), It's said that Eve can retrieve the plain text of 2 messages that have been encrypted using the same key by XORing the two resulting ciphers .

The lecturer said : "since ASCII has enough redundancy it is easy to retrieve m1 and m2 from the string (m1 XOR m2) " .

What's the meaning of enough redundancy ?

This is from week 1 of the "Cryptography 1" course. https://www.coursera.org/learn/crypto/ — John, Jan 08 '17 at 19:38

score 4 · Answer 1 · answered Oct 02 '13 at 11:15

It is not really the redundancy of ASCII, but the redundancy of that which is encoded with ASCII. Namely, you use ASCII because you encode sequences of characters, and you use characters because you want the data to be perceptible by human eyes and brains. So the data is text or something close enough (e.g. XML). Some characters will be more common than others. Some patterns will be used.

The whole idea of a reused one-time pad is that it transports information from one plaintext to the other: if you know 'A xor P' and 'B xor P' (A and B being the plaintext bytes, P the pad), then you can compute 'A xor B' by xoring the two values together. Then every bit of information you learn about A yields the corresponding information on B. For instance, if A is some XML text and you have already ascertained that some plaintext bytes are "<config" then you can guess that the next byte of A will probably be a ">", a space, or a "u". These are three possible plaintext bytes, from which you can immediately deduce the only three possible plaintext bytes at the same position for B. Depending on what you know of B, you can prune out values which "make no sense", and thus know the byte value for A, and so on.

When breaking the infamous two-times pad, you just keep on propagating information from one plaintext to the other, and back.

ASCII already says quite a lot, because valid ASCII values are in the 0 to 127 range, but in "normal" text, a lot won't appear: in the 0 to 31 range, only 9 (tab), 10 (lf) and 13 (cr) will be encountered in practice.

score 3 · Answer 2 · answered Oct 02 '13 at 08:07

Consider a one digit OTP of '9'; a Message A of '2'; and a Message B of '4'.

OTP+A=1
OTP+B=3

If you know both messages then OTP = (OTP+B) - A = 9; or OTP = (OTP+A) - B = 9. Since OTP encryption tends to be simple bitwise XOR, if you know the Nth bit of both A & B then you know the Nth bit of the OTP.

I think your lecturer is referring to the fact that most ASCII characters will never manifest in a normal plaintext message, so you can infer a lot of the Nth bits right off the bat. This makes guessing the plaintext of given byte in a given message a lot easier as the range of possible value is a fraction of what it should be. And if you know one side, you know the other.

A better or more concise answer would require a link to the actual lecturer video or slide so I could confirm what the lecturer means by redundancy in this context. — LateralFractal, Oct 02 '13 at 08:07

What is ("ASCII" redundancy) which is said to be required to retrieve the original plain text in 2 times pad?

2 Answers2