4

For an ecommerce website I need to create an identifier, that will represent the user's order. It can't be order's ID in database as it's assigned in predicateble manner. So it needs to be some pseudorandom string (ideally a-zA-Z0-9), that will offer increased security, because it won't be guessable.

The number can be tested using the web server itself on well known URL. But only with the speed of about 50requests/s. Also, from the pool of possible secrets only approx. 1M (which is an estimate of number or orders in the end of the website's lifetime) will be used.

What is a correct length of the identifier to provide enough security?

Note: Still the easiest way to get to the data is to brute-force admin's password, which is in many cases quite simple, but that's not a thing I can change :(

Tomáš Fejfar
  • 289
  • 2
  • 8

2 Answers2

6

I assume that you want identifiers which are:

  1. unpredictable: external entities ("attackers") must not be able to guess the next identifier, even given knowledge of all previous identifiers;
  2. unforgeable: it must not be possible, from the outside, to "get lucky" and produce a valid identifier;
  3. unique: each generated identifier should be distinct from all previously generated identifiers.

What makes a given identifier "valid" is, ultimately, that it is registered on a the server as such. In some contexts, you might want to be able to verify validity without having to talk to a database.

There are several methods:

  • Use randomness. Generate a sequence of bits from a cryptographically strong PRNG. If you get enough bits (i.e. 128 bits or more), then the identifier will fulfill the properties above with overwhelming probability. Most operating systems offer an appropriate PRNG (e.g. /dev/urandom on Unix-like systems).

  • Use a counter and a key. Let the generating server know a secret key K, and a counter C. The key is for a block cipher such as 3DES. When you want to produce the next identifier, increment C, then encode the new C value as a block of the size needed by the block cipher, and encrypt it. For instance, 3DES uses 64-bit blocks. Thus, take the counter value (an integer in the 1 to 1000000 range), encode it as a 64-bit integer (which will thus begin by a big bunch of zeros), and encrypt that, yielding a seemingly random 64-bit value.

The second method has the advantage of allowing verification by knowing the key K, without having to check in a database: decrypt the 64-bit identifier (with K), and see if this yields an integer in the expected range (1 to 1000000). The attacker, not knowing K, will have a hard time producing an "identifier" which, when decrypted, falls into the expected range. The method with a block cipher also allows for shorter identifiers, because it guarantees uniqueness through the fact that the block cipher is a permutation.

Let's put some numbers on that. You expect about one million identifiers, which is close to 220. Your identifiers have length n bits. An attacker choosing a random identifier has probability 2-n of "guessing" the next identifier, and probability 2-(n-20) of guessing one of the valid identifiers that the server will generate throughout its life. You have to choose n depending on your constraints (you want "security", but you probably also want to keep the identifiers small).

With the random selection of identifiers, you may encounter collisions if you generate identifiers which are too small. Very roughly, since you will produce about 220 identifiers, probability of encountering a collision will be about 2-(n-39), so you will want to use n ≥ 64. With n = 64, you have probability about 1 in 30 millions that you will ever encounter one collision throughout the server lifetime: that's probably low enough risk to be tolerated.

With the encryption method, you can tolerate a much shorter n: at least 20 bits (if you just want to make the next identifier unpredictable), more if you want to prevent the attacker from producing valid identifiers (not necessarily the next one, but one of the 1 million identifiers that your server will ever produce). This method, though, has some implementation issues:

  • It needs a secret key to be generated, stored, kept safe, but not lost if a hard disk fails. If the server uses multiple front-ends or back-ends, then the key will have to be shared between the relevant machines.
  • It uses a counter. There again, counter management can be difficult, or be a bottleneck, in multi-machine architectures.
  • There may not be an available standard robust block cipher with the block size that you wish to use. There are theoretical ways around that but nothing standard and well-vetted yet.
  • It uses cryptography, which is always a slippery path. Don't invent your own crypto. If possible, don't implement your own crypto either. These things are hard.

The easy way is to use randomness. Some systems offer UUID (known in the Microsoft world as GUID). The "version 4" UUID are generated with a PRNG, containing 122 bits worth of randomness, which would be sufficient for your needs; however, there is little guarantee that any given system which produces UUID values really uses a cryptographically strong PRNG. Indeed, there is a big difference between something which ensures uniqueness between collaborating users, and a system which is unpredictable.

So use a strong PRNG explicitly, and get the bits. "128 bits" is the default all-purpose size. As was explained above, chances are that you could use less bits, e.g. as low as 64, but that requires thinking and quantifying risks; it is much simpler to just produce 128 bits and be done with it.

If you need shorter identifiers for some reason (e.g. a human user will have to type them on a smart phone...), then you may envision a system with a block cipher, but this is fraught with perils. You will have to understand thoroughly what you are doing. Thus, this is possible, but most probably expensive (thinking time is what costs the most in software development).


In all of the above I talked of bits. That's because bits are what the machine is using internally. Ultimately, you want characters: this is obtained by simply encoding the bits with an ad hoc method, e.g. Base64. With Base64, each character is worth 6 bits, so 64 bits become 11 characters (actually 12, because Base64 mandates a final size which is a multiple of 4 characters, but the last one will always be an '=' sign). With 128 bits, you get 24 characters (the last two being '=='). Other encodings are possible, e.g. hexadecimal or Ascii85. That's really up to you. Base64 has the advantage of already been implemented in most programming framework.

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • Thank you for the awesome answer. The idea of using block cipher is great! It's practically what we need - I wonder why I never thought of that! It's also great that the result holds the encoded value itself. – Tomáš Fejfar Apr 24 '13 at 15:36
  • The last paragraph: are you saying that once the identifier is encrypted with using a PRNG explicitly, that resulting value can be based64 encoded to produce a smaller identifier? – M.K. Oct 19 '18 at 13:44
1

50 req/sec ~= 1.5 * 10^9 requests yearly. (Let's call that N)

Say you have only one secret number, then if you wanted the expected cracking time to be 6 months, you'll need N different possible combinations (why? because testing for all of N takes 1yr and the chances of the password being on the first halve of tested passwords is 50%). For 1M passwords, you'll need:

10^6*N = 1.5 * 10^15

If you wanted 50 years instead of 6 months, then:

1.5 * 10^15 * 100 = 1.5 * 10^17

If my maths are right, that's roughly 2^57, so you need 57 bits of entropy. If you use the 62 alphanumerical characters and truly random secret numbers, you'll need 57 / log_2(62) = 9.5 characters which we can round up to 10 alphanumerical characters.

Of course, I answered not taking into consideration any aspect of your website. For instance: are you certain that only 50 req/sec will ever be available? Can parallel users make more than that requests? Would it be a good idea to limit the number of requests by IP, slowing them down if they go over a threshold? shouldn't that content be password-protected anyway?

serans
  • 191
  • 1
  • 4
  • Thanks for the answer. The value is used to link directly to the order detail. User does not have to be registered or have a password to make an order. We'll surely slow the requests for this particular page. It does not have to be airtight, it just should be absolutely insolvable for a script kiddie... the contents are ordered items and delivery address a some info about whether it's paid or not (with option to pay). It does not grant access anywhere, it only displays sensitive information (but the sort that is fairly easily found elsewhere). – Tomáš Fejfar Apr 24 '13 at 15:41