27

I was reading PGP docs and came upon a part written by Phil Zimmermann (PGP's creator) that piqued my curiosity:

When I was in college in the early 70s, I devised what I believed was a brilliant encryption scheme. A simple pseudorandom number stream was added to the plaintext stream to create ciphertext. This would seemingly thwart any frequency analysis of the ciphertext, and would be uncrackable even to the most resourceful government intelligence agencies. I felt so smug about my achievement.

Years later, I discovered this same scheme in several introductory cryptography texts and tutorial papers. How nice. Other cryptographers had thought of the same scheme. Unfortunately, the scheme was presented as a simple homework assignment on how to use elementary cryptanalytic techniques to trivially crack it. So much for my brilliant scheme. From this humbling experience I learned how easy it is to fall into a false sense of security when devising an encryption algorithm.

What techniques would be able to trivially decrypt text encoded in this way? It seems nearly equivalent to a one-time pad (which is unbreakable without the pad), provided that the pseudo-RNG is complicated enough (period much longer than encrypted text; mean size added to each character significantly larger than size of chars) and a suitably complicated seed (so you can't brute force every seed).

E.g., using a Mersenne-Twister (with a period of 2^19937 -1 ~ 4.3x10^6001 ) and a passphrase that generates a random 256 bit seed; it seems uncrackable without being given the seed.

Or did they generate simple random number generator with a period of 2^32 - 1 ~ 4.3 billion (it was the 70s; the Mersenne Twister wasn't even invented until the mid-1990s); where you could brute force try each of the 4.3 billion random seeds with a quick check of the cipher text to see if dictionary words appear or simple frequency analysis (lots of spaces and e)?

unor
  • 1,769
  • 1
  • 19
  • 38
dr jimbob
  • 38,768
  • 8
  • 92
  • 161
  • 1
    The period length of a PRNG is not an indication about its security - if already some small part of the sequence is enough to recreate the state of the generator. (This is the case with most simple LFSRs, for example.) – Paŭlo Ebermann Aug 31 '11 at 18:39
  • 1
    @Paŭlo Ebermann - I agree large periodicity does not indicate cryptographic strength; but small periodicity seems to guarantee weakness. If only 2^32 unique seeds exist; you could brute force by trying all seeds on your stream. Or if you know the periodicity is 2^32-1 and you have ~1 TB of encoded data, you could recover the original by traditional crypto-attacks (e.g., frequency analysis on each group of ~256 characters spaced 2^32-1 bytes apart). – dr jimbob Aug 31 '11 at 19:51
  • 1
    Yes, you are right, my wording was not optimal. I should have said "large period length does not imply that a PRNG is cryptographically secure". Of course, small period lengths are even worse. The point is that the output should not give enough information about the state. – Paŭlo Ebermann Aug 31 '11 at 19:58
  • 1
    @Paulo - But I do see your point; with MT19937 it seems you just need to see 624 random numbers to find the current seed and predict future values. (At least according to wikipedia's MT page & discussion. It makes sense from `ceil(19937/32)=624`; and it seems that the MT algorithm is bijective.) You would still need to guess/deduce 624 consecutive bytes of the original message to recover the random numbers from the seem--but this seems much more feasible to attack now. – dr jimbob Aug 31 '11 at 20:28
  • 1
    Just having to have seen 624 of the random words (19937 bits) would be bad enough. I suspect that if you know that the MT19937 state had been initialised from a 256-bit seed just prior to use then it could reduce the difficulty even further. – Misha Aug 31 '11 at 22:22

4 Answers4

12

A PRNG being "good" (having strong statistical randomness guarantees, say, plus having a long period) says nothing about its security. See e.g. discussion in this thread.

The thread discusses the difference between:

  • one time pads (unbreakable in principle as long as they are neither leaked nor re-used, but usually impractical)
  • stream ciphers (which can be made as secure as necessary, and can be quite practical)
  • PRNGs (that weren't designed to be cryptographically secure) used as stream ciphers (typically easily broken)

What Phil should have used was a stream cipher not just any old PRNG. MT (and earlier PRNGs) are not suitable for use as a stream cipher. Salsa20/ChaCha (by Dan Bernstein) and ISAAC are two specific stream ciphers. ISAAC is used by shred. Salsa20 is part of the EU eSTREAM/ECRYPT programme. Of course, Phil can be forgiven for not using a stream cipher: RC4 (which is considered broken -- its weakness are part of what makes WEP insecure -- but which is the basis for ISAAC) was only invented in 1987.

The cryptographic weaknesses of normal PRNGs (including MT and Wichmann-Hill) has lead to vulnerabilities in e.g. TCP sequence number attacks. Those vulnerabilities are sometimes addressed using a different sort of CSPRNG, which gathers entropy "as it goes" (e.g. from mouse/timing jitter). To be suitable for use as a stream cipher, a CSPRNGs must have all the input entropy available at the start, rather than gathering it as it goes. See the wikipedia pages on CSPRNGs and on /dev/[u]random.

Misha
  • 2,699
  • 2
  • 19
  • 17
  • 2
    *"Salsa20/ChaCha (by Dan Bernstein) and ISAAC are the strongest unbroken stream ciphers"* - I think there is no rational basis for this statement. I recommend you remove the unfounded claim about the "strongest" stream ciphers. The rest of the answer is excellent. – D.W. Sep 01 '11 at 19:56
  • Done. The statement was a too-aggressive interpretation of min("computational complexity of the best known attack", "effective key length") from the columns in the table on the wikipedia page (and removing VEST based on eSTREAM phase 2 finding a practical attack). But you are right, there are too many explicit and implicit question marks over those figures for them to be reliable, and one of the eSTREAM ciphers (MICKEY) isn't even listed. I kept in the references to the specific ciphers to keep the concreteness of the answer, but e.g. HC256, Pike, SEAL and SNOW2 are also contenders. – Misha Sep 01 '11 at 21:14
  • I think it's the short IVs that make WEP insecure, not just its use of RC4. – forest May 21 '18 at 01:10
4

I have no idea what the method Phil Zimmerman originally used for his encryption, so I can't really say anything about that.

However, Mersenne-Twister can be made in to a "secure" stream cipher, for example CryptMT. CryptMT was was subsequently broken, though: Distinguishing Attack on CryptMT. Reading that paper probably gives a pretty good idea on how to attack Mersenne-Twister and its ilk.


Actually, I did some more investigation. First of all, the paper I quoted has been subsequently redacted by the authors, see discussion here, and it was against CryptMTv1, not CryptMTv3 that is the current version. There are no known attacks against CryptMTv3. The closest to an attack I've found is On the Security of Stream Cipher CryptMT v3, which explicitly says:

However, we have not found any non-randomness about the keystream output.

Also, the eSTREAM final report for CryptMT says:

CryptMT v3. The cipher CryptMT has a very unusual design which delivers very reasonable performance. While there have been no negative cryptanalytic results against the cipher in the last phase of eSTREAM, we are somewhat concerned that the security of the cipher, in particular the non-linear filter component, might not yet be as well-understood as some of the other finalists. We anticipate that elements of CryptMT will continue to be of interest to the cryptographic community, and we hope that the full advantages of the approach embodied in CryptMT v3 can be evaluated. However, we are currently not sufficiently confident in the design and security of this algorithm for us to include it in the final portfolio.

Hardly a negative merit!

Also, looking at the "very reasonable performance" mentioned above at eBASH, it seems that CryptMTv3 offers amazing performance for long messages (for example, 1.82 cycles per byte for long messages), often only bested by Salsa20/8, where as Salsa20/8 has already been broken (barely, and Salsa20/12 is still very secure).

So I would say CryptMT is definitely a contender in stream ciphers even if it hasn't been analyzed enough yet!

Nakedible
  • 4,501
  • 4
  • 25
  • 22
  • It doesn't seem that CryptMT was broken by the paper you linked to -- merely you can do a distinguishing attack to determine that a CryptMT stream is not random bits, but likely CryptMT encrypted data. (And to do that "distinguishing attack" you still need 2^50 bits ~ 128 TB of encrypted data). – dr jimbob Aug 31 '11 at 18:45
  • You are right, there is no attack except a distinguisher. However, getting a distinguisher with just 2^50 bits for a cipher that is possibly meant to have 2^256 bits of security is so obviously weak that I don't think anyone will bother to cryptanalyze CryptMT long enough to "break" it in the cryptanalytic sense - eg. to get an attack that succeeds in less than 2^256 operations. Like in the case of AES, 2^254.4 is enough to qualify as broken. This is obviously still a long way from being concretely insecure - but like Bruce Schneier says, attacks only get better. – Nakedible Aug 31 '11 at 19:37
  • @Dr Jimbob, A distinguishing attack *does* mean that the cipher is broken. – D.W. Sep 01 '11 at 19:57
  • *"Mersenne-Twister can be made in to a "secure" stream cipher"* - Well, with a lead brick and $100 million you can make a lead brick into an supersonic airplane, but don't forget the $100 million. I don't think Mersenne Twister is a terribly useful basis for building a stream cipher. – D.W. Sep 01 '11 at 20:00
  • 1
    I expanded my answer quite a bit. I, too, would've assumed that M-T is not a terribly useful basis for building a stream cipher - but I've changed my mind after reading more! – Nakedible Sep 01 '11 at 21:03
  • @D.W. - You find a disk array with a 128 TB of data on it (that happens to have been encrypted with CryptMT). You run your distinguishing attack on the the 2^50 bits and find out it was probably encoded CryptMT data. You have no idea what data was encoded or anything other than that now you can say with good confidence that it was encoded by CryptMT and aren't any closer to being able to decrypt it. By that logic, ascii-armored PGP is broken since the distinguishing attack of reading the armor headers tells you it was a PGP encoded message. – dr jimbob Sep 01 '11 at 21:05
  • 3
    @Dr Jimbob, You are missing the point. "Broken" has a very specific cryptographic meaning. Claiming that CryptMT is insecure because of a distinguisher is a matter of opinion, but claiming that it is not broken, as cryptographers define broken, is not. – Nakedible Sep 01 '11 at 21:11
  • @drjimbob If you can perform a distinguishing attack, then the output is non-random, which means the algorithm will fail the [next-bit test](http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator#Requirements), and is therefore broken. – Jon Bentley Feb 21 '14 at 19:52
  • @JonBentley - Ugh. Dredging this up again. Yes, if the paper wasn't flawed, CryptMT would be broken against [indistinguishability from random noise](http://en.wikipedia.org/wiki/Ciphertext_indistinguishability#Indistinguishable_from_random_noise), despite encrypted data being semantically secure and not being broken against either chosen plaintext or (adaptive) chosen ciphertext attacks. However, I still feel its important to stress *what* attack a cipher is broken against versus merely labeling broken/not broken (the broken property may be irrelevant for your purpose). – dr jimbob Feb 21 '14 at 22:10
  • @JonBentley - There's also a difference between broken-in-practice (WEP, WPS, LM-hash, DES, KASUMI, un-keystrengthened MD5/SHA-1 hashes of passwords) and broken-in-theory-but-secure-in-practice. The latter includes things that are still widely used like AES (e.g., best attack on full-strength AES-128 not in a related-key is 2^126.1). I was mistaken categorizing CryptMT as "not broken" 3 years ago--it is broken against indistinguishability from random while maintaining semantic security. Granted, RC4 had biased output that first led to a distinguishing attack and then the bias broke the key. – dr jimbob Feb 21 '14 at 22:14
0

Any simple XOR/PRNG cipher where the key is used to initialize the PRNG is vulnerable to a known-plaintext attack with complexity 1:

  1. Encrypt a plaintext at least as long as the unknown ciphertext.
  2. XOR the known plaintext with its ciphertext. This gives you the keystream.
  3. XOR the keystream with the unknown ciphertext: this gives you the original plaintext.

If you can encrypt a plaintext consisting of all "0"s, you can skip step 2: the output of the encryption process is the keystream.

It doesn't matter what PRNG you're using: everything from RANDU to Fortuna is vulnerable to this sort of keystream-recovery attack.

Note that there are ciphers that use "XOR the plaintext with the keystream" as the core of their operation. However, they take precautions to prevent this attack: one-time pads never re-use the password, stream ciphers either require single-use keys or single-use values called nonces, and output feedback and counter block-cipher modes of operation use both the key and a second single-use value called an initialization vector to ensure the RNG is never seeded the same way twice.

Mark
  • 34,390
  • 9
  • 85
  • 134
0

I think the best way to approach this question as a cryptographic layperson is to step away from the modern cryptography and consider instead classical, paper-and-pencil ciphers. In this case, probably the best example to consider is the running key cipher:

In classical cryptography, the running key cipher is a type of polyalphabetic substitution cipher in which a text, typically from a book, is used to provide a very long keystream.

Basically, such a cipher takes a plaintext and an equally long natural-language text (e.g., a passage from a novel), and encrypts each letter of the plaintext by shifting it by an amount that depends on its corresponding keystream letter.

Such ciphers are easily breakable:

[I]f (as usual) the running key is a block of text in a natural language, security actually becomes fairly poor, since that text will have non-random characteristics which can be used to aid cryptanalysis. As a result, the entropy per character of both plaintext and running key is low, and the combining operation is easily inverted.

To attack the cipher, a cryptanalyst runs guessed probable plaintexts along the ciphertext, subtracting them out from each possible position. When the result is a chunk of something intelligible, there is a high probability that the guessed plain text is correct for that position (as either actual plaintext, or part of the running key). The 'chunk of something intelligible' can then often be extended at either end, thus providing even more probable plaintext - which can in turn be extended, and so on. Eventually it is likely that the source of the running key will be identified, and the jig is up.

The reason such attacks work is because:

  1. The attacker normally has some partial information about the plaintext. In classical cryptography this would be the likely plaintext languages and topics (which imply letter and word frequencies and dependencies between consecutive letters and words). In modern cryptography, this would be things like the communications protocol that's being encrypted (e.g., HTTP over SSL), the endpoints of the connection (e.g., Google), etc.
  2. A natural language keystream has patterns such that, given a few letters from the keystream, you to guess the next few letters with very high probability. Now the key thing to observe is that non-cryptographic PRNGs have this property as well, and therefore they admit of similar attacks.

This blog series describes how to break simple linear congruential RNGs and the Mersenne Twister. For the java.util.Random RNG, if you have two consecutive ints output from the PRNG, that's enough to reconstruct its state and predict the rest of the pseudorandom stream (both forwards and backwards). Therefore, if you used that PRNG to generate your keystream, an attacker who knows or guesses any eight consecutive bytes of the ciphertext can then apply a variant of the running cipher attack and the blog series' PRNG-cracking techniques to decipher the rest of the message.

Part 3 of the blog series describes how you can similarly predict the output a Mersenne Twister if you can get 624 consecutive output words. So it's harder than the linear congruential generator, but still vulnerable.

Luis Casillas
  • 10,181
  • 2
  • 27
  • 42