36

I was recently listening to the security now podcast, and they mentioned in passing that the linear congrunential generator (LCG) is trivial to crack. I use the LCG in a first year stats computing class and thought that cracking it would make a nice "extra" problem.

Are there any nice ways of cracking the LCG that doesn't involve brute force?


I'm not sure if this question is OT, but I wasn't sure where else to post the question. Also, my tags aren't very helpful since I don't have enough rep to create new tags.

D.W.
  • 98,420
  • 30
  • 267
  • 572
csgillespie
  • 957
  • 1
  • 9
  • 15
  • 1
    @jeff I'm sure there are experts on this one here also, and this site focuses on the right mindset for the question. Note also that wikipedia notes "If a linear congruential generator is seeded with a character and then iterated once, the result is a simple classical cipher called an affine cipher; this cipher is easily broken by standard frequency analysis." – nealmcb Jun 02 '11 at 15:14

3 Answers3

36

Yes. There are extremely efficient ways to break a linear congruential generator.

A linear congruential generator is defined by sn+1 = a sn + b mod m, where m is the modulus. In its simplest form, the generator just outputs sn as the nth pseudorandom number. If m is known to the attacker and a, b are not known, then Thomas described how to break it.

If none of a, b, m are known, one can still break a linear congruential generator, by first recovering m. It is an interesting exercise to derive how to do so efficiently; it can be done. I'll show how below; don't read on if you prefer to try to figure it out for yourself.

To recover m, define tn = sn+1 - sn and un = |tn+2 tn - t2n+1|; then with high probability you will have m = gcd(u1, u2, ..., u10). 10 here is arbitrary; if you make it k, then the probability that this fails is exponentially small in k. I can share a pointer to why this works, if anyone is interested.

The important lesson is that the linear congruential generator is irredeemably insecure and completely unsuitable for cryptographic use.


Added: @AviD will hate me even more :), but here's the math for why this works, for those who requested it.

The key idea: tn+1 = sn+1 - sn = (a sn - b) - (a sn-1 - b) = a sn - a sn-1 = a tn mod m, and tn+2 = a2 tn mod m, and tn+3 = a3 tn mod m. Therefore tn+2 tn - tn+12 = 0 mod m, i.e., |tn+2 tn - tn+12| is a random multiple of m. Nifty number theory fact: the gcd of two random multiples of m will be m with probability 6/π2 = 0.61; and if you take the gcd of k of them, this probability gets very close to 1 (exponentially fast in k). Is that cool, or what?

D.W.
  • 98,420
  • 30
  • 267
  • 572
  • 6
    Also, a linear congruential generator is often bad at producing statistically good randomness. If _m_ is even, _a_ and _b_ are odd, then the least significant bits of the successive values of _s_ follow the very non-random pattern 010101010101... – Thomas Pornin Jun 03 '11 at 15:35
  • 3
    aaagghh, too much math! +1, anyway... ;) – AviD Jun 06 '11 at 22:44
  • 2
    @csgillespie, OK, I added the math. @AviD, avert your eyes! :) – D.W. Jun 07 '11 at 19:19
17

A linear congruential generator is linear, which should say it all.

Namely, you have a state s, which is updated with: s ← as + b mod w, for two constants a and b, and a convenient modulus w (typically 232: s, a and b are 32-bit words); the output consists in the successive values of s. If you have three successive outputs (s0, s1 and s2), then you get two linear equations in two unknowns (a and b), which are easily solved with elementary arithmetics.

This can be extended to variants where you only get parts of the values s, possibly not consecutive. If a and b are two 32-bit constants, you only need about 96 bits worth of output to recompute the constants.

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
3

Another method of solving for m comes from this paper.

Essentially, this method exploits the fact that the linear congruential generator dramatically fails the planes test. The determinant of a 3x3 matrix using 4 outputs is a multiple of m.

The gcd of two multiples (n_1 and n_2 of m is m if x_1 = n_1/m and x_2 = n_2/m are co-prime.

The probability that k integers are co-prime is given by 1/ζ(k) so the probability that x_1 and x_2 are co-prime is 1/ζ(2) or 6/π^2, about 61%.

Like @Thomas said, once m is found the remainder of the problem is easy.

Jon Takagi
  • 131
  • 3