8

It is common to generate random identifiers to expose through an API instead of using a simple auto incrementing primary key. The reasons are many:

  • Prevents easy enumeration.
  • Does not give away order objects were created.
  • Does not give away total number of objects or growth rates.

When generating these ID's, are there any substantial benefits to use a cryptographically secure pseudorandom number generator (as in crypto.randomBytes() in Node.js) over using something more simple (as in Math.random())?

SilverlightFox
  • 33,408
  • 6
  • 67
  • 178
Anders
  • 64,406
  • 24
  • 178
  • 215

6 Answers6

4

Math.random() is usually seeded from the current time of day. Therefore there is a chance that collisions will happen for objects that are generated around the same time each day.

From a security perspective, this means that these "random" values will be predictable by an attacker.

Even if seeded with something else, a non cryptographically secure pseudo random number generator can reveal its state should enough values be captured. This is why it is better from a security perspective to only use values generated by a cryptographically secure pseudo random number generator (e.g. crypto.randomBytes() that you refer to).

CSPRNGs have certain properties that make them suitable for use in security:

  • Every CSPRNG should satisfy the next-bit test. That is, given the first k bits of a random sequence, there is no polynomial-time algorithm that can predict the (k+1)th bit with probability of success better than 50%. Andrew Yao proved in 1982 that a generator passing the next-bit test will pass all other polynomial-time statistical tests for randomness.
  • Every CSPRNG should withstand "state compromise extensions". In the event that part or all of its state has been revealed (or guessed correctly), it should be impossible to reconstruct the stream of random numbers prior to the revelation. Additionally, if there is an entropy input while running, it should be infeasible to use knowledge of the input's state to predict future conditions of the CSPRNG state.

Using such a generator will satisfy your security requirements:

  • Prevents easy enumeration.
  • Does not give away order objects were created.
  • Does not give away total number of objects or growth rates.

However, you shouldn't let the fact that these are now unpredictable from protecting the first case with proper access controls (authentication and authorisation). URLs should not be considered secret and there are many ways they can leak (shoulder surfing with hidden camera, HTTP referrer header leakage, proxy and browser logs). Also, once a URL is known and you're relying on an unpredictable secret to protect it, the user accessing it will always have access and this cannot be revoked. Remember - they can easily bookmark it.

Using unpredictable values would definitely help your other requirements should they be deemed necessary for your application. I would recommend using a 128 bit random value which is more than enough entropy to prevent predictability and collisions.

SilverlightFox
  • 33,408
  • 6
  • 67
  • 178
3

Yes. A crytographically secure RNG is necessary to accomplish all the things you listed.

I don't know how Math.random works, but think of a TERRIBLE RNG we'll call 2bitRNG that's seeded with just 2 bits, but produces 128 bit numbers. I seed 2bitRNG with my 2 bits of entropy, and then produce 10,000 random numbers with it.

Since I've only seeded it with 2 bits, there's only 4 possible seeds. So an attacker only has to search through a space of 40,000 possible random numbers to find the correct seed, and position in the sequence you're currently at. That's a small number, and likely could be calculated within a fraction of a second with a modest computer. Once the attacker knows this, he knows the the next sequence, and how many RNGs you've produced.

I don't know how bad Math.random is or what other attacks it might be vulnerable to, but the general recommendation is to NOT use it for security purposes because it isn't secure. A proper secure RNG uses a large amount of entropy (~128 bits or higher) to prevent the kind of attack I just described and doesn't produce predictable results.

Steve Sether
  • 21,480
  • 8
  • 50
  • 76
2

tl;dr: That depends, if you can, use a CSPRNG.

While often UUIDs are used which are not (bound to be) from a CSPRNG, this might not be a good idea in cases the (not CS)-PRNG is not re-seeded regularly (as in long-living server processes)

If the algorithm you use to create such identifiers is known, and it's input can be estimated with a few samples (as it's the case with C's rand() for example), you loose all those properties you enumerated in your question.

While using a predictable PRNG to generate values from a single seed is thus bad, if the bad PRNG is reseeded with good entropy before every use (as it's the regular case in most web scenarios), that should be fine.

So, if you can, use a CSPRNG. If you can't, try to reseed it with good entropy before each generation step.

This all depends on how your generation process is seeded and used.

If you have a long-running process, have it use a CSPRNG. If you are using short-lived processes, use a good entropy source and basically any PRNG.

Flimzy
  • 655
  • 1
  • 6
  • 14
Tobi Nary
  • 14,302
  • 8
  • 43
  • 58
2

Step back and take a look at your problem from a different viewpoint. What is the worst thing that could happen if you used sequential numbers? You've mentioned enumeration: what is the threat from enumeration? Can someone cause a problem if they see you issue #00015 and #00016, and predict that ID #00017 will likely be generated next? Perhaps someone could hijack an email or twitter account that will likely be associated with an upcoming ID.

And what is the problem if someone can determine that ID #00023 is older than ID #42234? Do older IDs generally have more resources available than newer IDs? Perhaps I know that ID #00123 was issued on May 1st, and ID #00256 was issued on June 1st; if I learn that some popular celebrity announced they signed up in May, I have only 133 numbers to guess their ID. Does guessing someone's ID# cause anyone a problem, or is it going to be public information anyway?

At this point you should understand if sequential ID #s truly have to be kept secret, or if they can be public.

Then take a look at the threat vectors, (if these are realistic threats,) and come up with the risks. Most of these appear to be risks to your clients if their secret IDs become public, and less of a threat to your systems. But if you aren't protecting your clients, your reputation will certainly be destroyed, and your venture will fail. So if these are valid threats, you have to take them seriously.

Next, add in the understanding that a Pseudo-Random Number Generator (PRNG) was created for exactly one specific purpose: generating statistically well-distributed numbers for modeling problems in statistics. They were never created to generate "unguessable" numbers. Most PRNGs are only a math problem away from being a sequential number generator. You can try jumping through many seeding hoops to reduce the ability to generate guessable numbers, but that's exactly the problem a Cryptographically Secure PRNG (CSPRNG) is designed to solve. CSPRNGs were specifically designed to generate unguessable numbers.

Now that you better understand some of the risks, you should appreciate why you need (or don't need) to protect your clients from these threats; and with a better understanding of the differences between a PRNG and a CSPRNG, you can arrive at a solid decision.

If you have to keep these secret, then you must use a CSPRNG to protect your clients. If you don't need to keep them secret, then you should use a sequential number and reduce the complexity of your system by eliminating the random number generator altogether.

John Deters
  • 33,650
  • 3
  • 57
  • 110
  • using a random number isn't particularly complex, and is largely choosing one API over another. That doesn't reduce complexity in any meaningful way. Choosing sequential over random is more driven by use case of the number. Numbers that should be meaningful or easy to copy by human beings are good cases for sequential numbers. But I'd err on the side of CSRNGs over sequential unless sequential is called for rather than the other way around. – Steve Sether Apr 14 '16 at 14:44
  • Using math.Random() might lead someone to believe that the numbers are securely randomized when they're not. Using sequential numbers is unambiguously clear that you are OK with people analyzing the sequence. It's not just about calling an API, but the big picture. – John Deters Apr 14 '16 at 19:25
  • There's no cure for stupid. The same person who uses math.random and expects it to produce high entropy might also misunderstand when to not use sequences. The point being that for any advice you have to start with an underlying assumption that the problem and tools are understood. If they aren't, all bets are off. – Steve Sether Apr 14 '16 at 19:48
1

I would definitely recommend using a cryptographically secure random number generator wherever possible regardless of what the data is. The upsides outweigh the downsides.

Also, you never know what this code might be used for later. When I do code reviews, in most circumstances, I will always flag the use of a non-cryptographically secure random number generator.

You may also want to look at the use of per session indirect object references as in some cases these can make sense to use. Just don't forget to perform authorization/privilege checks before operating on or returning data from the client.

Casey
  • 895
  • 5
  • 18
  • Thanks for your answer! Could you develop a bit what the upsides are here? Perhaps I am being naive, but I fail to see what an attacker could accomplish if I use bad randomization. – Anders Apr 13 '16 at 13:03
  • In this particular case, maybe nothing. Although if I am able to generate enough of them and can run an entropy check and determine they are not random, I may be able to ascertain other existing identifiers. – Casey Apr 13 '16 at 13:05
  • This is one of those areas that is borderline security through obfuscation. Using indirect objects references (essentially what you are doing) can be very useful to mask the underlying data identifier. What works really well is to make these per user / per session. Requires more memory as you need to store these server side, but with proper privilege checks, can make it extremely difficult for an attacker to do anything with (other vulnerabilities not withstanding). – Casey Apr 13 '16 at 13:06
1

Non-cryptographically-secure PRNG's are vulnerable to analysis which can determine the parameters that generate the sequence. Once you know the sequence, it is possible to determine not only the next value, but also the total number of sequence numbers in use. See https://en.wikipedia.org/wiki/German_tank_problem for a fascinating example of doing this in the real world.

As well as making sequential identifiers unguessable, you also typically want to avoid generating the same identifier twice within the same context. If you use a PRNG the number of identifiers is limited by the period of the PNRG, and is vulnerable to any weaknesses in seed generation. If you use cryptographically secure random number generator, the number of identifiers is limited by the size of the identifier, and should be less vulnerable to seeding problems.

If the numbers are used in a closed system where at most a few thousand identifiers are generated, then a PRNG is probably OK.

If the identifiers need to be generated in a distributed manner and globally unique, then using a cryptographically secure random number generator helps to ensure there are no collisions.

Jonathan Giddy
  • 394
  • 1
  • 5