I assume that you want identifiers which are:
- unpredictable: external entities ("attackers") must not be able to guess the next identifier, even given knowledge of all previous identifiers;
- unforgeable: it must not be possible, from the outside, to "get lucky" and produce a valid identifier;
- unique: each generated identifier should be distinct from all previously generated identifiers.
What makes a given identifier "valid" is, ultimately, that it is registered on the server as such. In some contexts, you might want to be able to verify validity without having to talk to a database.
There are several methods:
Use randomness. Generate a sequence of bits from a cryptographically strong PRNG. If you get enough bits (i.e. 128 bits or more), then the identifier will fulfill the properties above with overwhelming probability. Most operating systems offer an appropriate PRNG (e.g. /dev/urandom on Unix-like systems).
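In Python, for instance, this is a one-liner: the standard secrets module is a CSPRNG interface over the OS source (/dev/urandom or equivalent). A minimal sketch:

```python
import secrets

# 16 bytes = 128 bits from the OS CSPRNG.
raw = secrets.token_bytes(16)

# Or directly as a URL-safe Base64 string (22 characters, unpadded).
identifier = secrets.token_urlsafe(16)
print(identifier)
```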
Use a counter and a key. Let the generating server know a secret key K, and a counter C. The key is for a block cipher such as 3DES. When you want to produce the next identifier, increment C, then encode the new C value as a block of the size needed by the block cipher, and encrypt it. For instance, 3DES uses 64-bit blocks. Thus, take the counter value (an integer in the 1 to 1000000 range), encode it as a 64-bit integer (which will thus begin with a big bunch of zeros), and encrypt that, yielding a seemingly random 64-bit value.
The second method has the advantage of allowing verification by knowing the key K, without having to check in a database: decrypt the 64-bit identifier (with K), and see if this yields an integer in the expected range (1 to 1000000). The attacker, not knowing K, will have a hard time producing an "identifier" which, when decrypted, falls into the expected range. The method with a block cipher also allows for shorter identifiers, because it guarantees uniqueness through the fact that the block cipher is a permutation.
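As an illustration, here is a sketch in Python, assuming the pyca cryptography package (whose recent releases relegate TripleDES to a "decrepit" module, which rather underlines the caveats further down); the key management shown is hypothetical:

```python
import secrets
import struct
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Hypothetical key management: K is generated once, stored safely,
# and shared with every machine that must generate or verify identifiers.
KEY = secrets.token_bytes(24)  # 192-bit 3DES key

def _raw_3des():
    # ECB over a single 64-bit block is just the bare block-cipher
    # permutation, which is exactly what this construction needs.
    return Cipher(algorithms.TripleDES(KEY), modes.ECB())

def make_identifier(counter: int) -> bytes:
    # Encode the counter as a 64-bit big-endian block and encrypt it.
    enc = _raw_3des().encryptor()
    return enc.update(struct.pack(">Q", counter)) + enc.finalize()

def is_valid(identifier: bytes) -> bool:
    # Decrypt and check that the result lands in the expected range;
    # without KEY, an attacker cannot craft a value that passes this test.
    dec = _raw_3des().decryptor()
    counter = struct.unpack(">Q", dec.update(identifier) + dec.finalize())[0]
    return 1 <= counter <= 1_000_000
```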
Let's put some numbers on that. You expect about one million identifiers, which is close to 2^20. Your identifiers have length n bits. An attacker choosing a random identifier has probability 2^-n of "guessing" the next identifier, and probability 2^-(n-20) of guessing one of the valid identifiers that the server will generate throughout its life. You have to choose n depending on your constraints (you want "security", but you probably also want to keep the identifiers small).
With the random selection of identifiers, you may encounter collisions if you generate identifiers which are too small. Very roughly, since you will produce about 2^20 identifiers, the probability of encountering a collision will be about 2^-(n-39), so you will want to use n ≥ 64. With n = 64, the probability that you will ever encounter a collision throughout the server's lifetime is about 1 in 30 million: that's probably a low enough risk to be tolerated.
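If you want to re-derive these figures, the usual birthday approximation for k random n-bit values is p ≈ k²/2^(n+1); a quick check:

```python
# Birthday approximation: P(collision) ~= k**2 / 2**(n+1)
# for k identifiers of n bits each.
k = 2**20  # about one million identifiers

for n in (40, 64, 128):
    p = k**2 / 2**(n + 1)
    print(f"n = {n}: P(collision) ~= 2^-{n - 39} ~= {p:.1e}")
```

With n = 64 this prints about 3.0e-08, i.e. roughly 1 in 30 million, matching the figure above.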
With the encryption method, you can tolerate a much shorter n: at least 20 bits (if you just want to make the next identifier unpredictable), more if you want to prevent the attacker from producing valid identifiers (not necessarily the next one, but one of the 1 million identifiers that your server will ever produce). This method, though, has some implementation issues:
- It needs a secret key to be generated, stored, kept safe, but not lost if a hard disk fails. If the server uses multiple front-ends or back-ends, then the key will have to be shared between the relevant machines.
- It uses a counter. There again, counter management can be difficult, or be a bottleneck, in multi-machine architectures.
- There may not be an available standard robust block cipher with the block size that you wish to use. There are theoretical ways around that but nothing standard and well-vetted yet.
- It uses cryptography, which is always a slippery path. Don't invent your own crypto. If possible, don't implement your own crypto either. These things are hard.
The easy way is to use randomness. Some systems offer UUID (known in the Microsoft world as GUID). "Version 4" UUIDs are generated with a PRNG and contain 122 bits' worth of randomness, which would be sufficient for your needs; however, there is little guarantee that any given system which produces UUID values really uses a cryptographically strong PRNG. Indeed, there is a big difference between something which ensures uniqueness between collaborating users, and a system which is unpredictable to attackers.
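For what it's worth, a version 4 UUID is one line in Python; CPython's uuid4() reads os.urandom, but that is an implementation detail you should verify on your own platform before relying on it:

```python
import uuid

# 122 random bits plus 6 fixed version/variant bits,
# printed in the canonical 36-character form.
print(uuid.uuid4())
```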
So use a strong PRNG explicitly, and get the bits. "128 bits" is the default all-purpose size. As was explained above, chances are that you could use fewer bits, e.g. as low as 64, but that requires thinking and quantifying risks; it is much simpler to just produce 128 bits and be done with it.
If you need shorter identifiers for some reason (e.g. a human user will have to type them on a smart phone...), then you may envision a system with a block cipher, but this is fraught with perils. You will have to understand thoroughly what you are doing. Thus, this is possible, but most probably expensive (thinking time is what costs the most in software development).
In all of the above I talked of bits. That's because bits are what the machine uses internally. Ultimately, you want characters: this is obtained by simply encoding the bits with an ad hoc method, e.g. Base64. With Base64, each character is worth 6 bits, so 64 bits become 11 characters (actually 12, because Base64 mandates a final size which is a multiple of 4 characters, but the last one will always be an '=' sign). With 128 bits, you get 24 characters (the last two being '=='). Other encodings are possible, e.g. hexadecimal or Ascii85. That's really up to you. Base64 has the advantage of already being implemented in most programming frameworks.
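To make those character counts concrete, a quick sketch (8 and 16 bytes being the 64-bit and 128-bit cases discussed above):

```python
import base64
import secrets

for nbytes in (8, 16):  # 64 and 128 bits
    raw = secrets.token_bytes(nbytes)
    print(f"{nbytes * 8:3d} bits -> {base64.b64encode(raw).decode()}")
# 64 bits  -> 12 characters, the last being '='
# 128 bits -> 24 characters, the last two being '=='
```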