0

I have a large string that I need to shuffle in a secure manner. It will be around 4000 characters. The resulting permutation of such string, must be one in the set of all possible permutations (4000!). The resulting permutations must of course also be evenly distributed between all possible states. I need to reasonably prove that my implementation meets those requirements, but formal methods are not needed.

I was planning on using a simple Knuth-Fisher-Yates for this, but I worry about my RNG. I am bound to implement this in JavaScript, as my application must run in Thunderbird. From (what I can read)[https://developer.mozilla.org/en-US/docs/Web/API/RandomSource/getRandomValues], Mozilla can provide a cryptographically secure random value, and I plan on combining it with (this)[https://github.com/davidbau/seedrandom] to get random numbers in the desired range.

Now, I believe I could have issues with the internal state space of the PRNG. If it has to few, it will "wrap around" and produce a bias for certain numbers. If it has too many, it might apply modulo to scale down, thus also producing a bias for numbers in the lower range.

How long a seed would I need to initialize a PRNG used for KFY-Shuffle a string of 4000 characters? Am I overthinking this, is there something I did not think of?

Andreas
  • 101
  • 2
  • 1
    You aren't [rolling your own crypto](https://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own), are you? – Philipp Oct 05 '16 at 19:32

1 Answers1

2

Now, I believe I could have issues with the internal state space of the PRNG. If it has to few, it will "wrap around" and produce a bias for certain numbers. If it has too many, it might apply modulo to scale down, thus also producing a bias for numbers in the lower range.

No, the state space of a good PRNG will not make it appear biased in the ways that you fear here. A good, non-crypto RNG will produce output that appears unbiased and independent to some set of statistical tests. A cryptographically secure one, in addition, will produce output that appears unbiased and independent to any efficient observer; i.e., you can't write an efficient program to tell its output apart from a truly random sequence.

How long a seed would I need to initialize a PRNG used for KFY-Shuffle a string of 4000 characters? Am I overthinking this, is there something I did not think of?

The number of distinct permutations of a 4,000 element sequence is 4,000! (factorial), which is something like 2^43000, so technically you need a 43,000 bit string (5,263 bytes) to be able to uniquely address any one such permutation.

Practical answer: just use a crypto RNG. While it's unlikely at best that it would be able to output all possible permutations even when fed external entropy, the thing that matters is that no real-life attacker will be able to tell the difference.

Luis Casillas
  • 10,181
  • 2
  • 27
  • 42
  • Thank you very much for your answer, very helpful. The seedrandom.js library I linked to, does not seem to come with a cryptographically secure promise. It uses "RandomSource.getRandomValues", but that alone is no proof. Is there any known good CSPRNG available in JavaScript, one where I can also read through the proof? – Andreas Oct 05 '16 at 18:56