
I am reading about implementing UUIDs in URLs instead of incremental IDs to reduce attack vectors using obfuscation. I plan to create a UUID from an incremental ID and store the UUID which would be used to access the data.

As per the specification, UUID v3 and v5 use names and namespaces, hashing both with either the MD5 or the SHA-1 hash algorithm. I understand that UUIDs should never be used for security purposes (although this answer seems to imply otherwise) and that authentication and authorization are the only way to implement security properly.

That said, true obfuscation of incremental IDs is still very desirable and I have some questions.

Firstly, I have read that UUID hashes are one-way functions, meaning they cannot be decoded by an attacker. If I have interpreted this correctly, this appears to be a great option for true ID obfuscation.

Secondly, from my understanding, only UUID versions 3 and 5 currently use hashing algorithms, so versions 1 and 4 would not be as suitable.

Thirdly, as noted in the specification, for UUIDs of version 3 or 5 created with the same name and namespace, the UUIDs must be equal. The specification suggests predetermined namespaces but highlights that any namespace can be used. It also implies namespaces as being UUIDs themselves.

I believe I grasp the proposed idea of namespaces, but from reading this, would it also be possible to use the namespace implementations to generate truly obfuscated UUIDs?

My first thought is to use a constant custom namespace and to use the incremental ID as the name. Since each incremental ID is unique, then the namespace doesn't really matter. And since both are hashed, neither the namespace nor the ID can be brute forced.

Is this correct?

If so, I believe this would be strong enough for my use-case, but it got me wondering. As a bonus question, is it possible (if rather overkill) to use a truly random namespace each time along with the incremental ID - for example, a UUID v4 generated from a truly random number seed?

From what I understand, the namespace is simply to avoid collisions so would this add any additional security? I.e. if the plaintext constant namespace was somehow leaked to an attacker.

I imagine in this scenario an attacker could programmatically generate UUIDs using the namespace and a range of integers using both UUID v3 and v5.

Changing the constant namespace to a new constant would only obfuscate UUIDs created from that point on, and all previous UUIDs would need to be recreated and overwritten.

myol

1 Answer


TL;DR: almost always use cryptographically random version 4 UUIDs.

The point of UUIDs is to create universally unique identifiers, even in distributed systems that have multiple nodes. They were not designed to have security properties other than, possibly, low chance of collisions.

  • Versions 1 and 2 are built around a timestamp with 100 ns resolution. This is attractive e.g. for sorting identifiers chronologically. To disambiguate multiple concurrent generators, the MAC address of the generating computer is used as a 48-bit node ID. If no MAC is available, a 48-bit random number is used instead.

    • Related: the 64-bit Twitter Snowflake IDs require the user to allocate a 10-bit worker ID for disambiguation.
  • Version 3 and 5 hash user-provided unique names to unique UUIDs. To compress the variable-length input name into the fixed-size UUID, a hash function is used.

  • Version 4 UUIDs are purely random (except for 6 bits indicating the UUID version and variant).
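
As a rough illustration of the versions listed above, here is a minimal Python sketch using the standard `uuid` module. The namespace value `APP_NAMESPACE` and the helper `make_examples` are made up for this example and are not part of any specification.

```python
import uuid

# Hypothetical namespace, chosen only for illustration; any UUID can be a namespace.
APP_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def make_examples(name: str) -> dict:
    """Generate one UUID per commonly used version for the given name."""
    return {
        "v1": uuid.uuid1(),                     # timestamp + node ID
        "v3": uuid.uuid3(APP_NAMESPACE, name),  # MD5 of namespace + name
        "v5": uuid.uuid5(APP_NAMESPACE, name),  # SHA-1 of namespace + name
        "v4": uuid.uuid4(),                     # random
    }


examples = make_examples("42")
for label, value in examples.items():
    print(label, value, "version:", value.version)
```

Calling `make_examples("42")` twice yields the same v3 and v5 values but different v1 and v4 values, which is the determinism the question asks about.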

From a security perspective, it is worth noting the following properties:

  • For timestamp-based UUIDs (version 1 and 2): Given a node ID (which is visible in other UUIDs), and knowledge about the time when the UUID was created, it is possible to predict a range of plausible UUIDs that might have been assigned. Thus, the value of a time-based UUID should not be treated as a secret and they cannot be used in a capability-based security model.

  • Time-based UUIDs (version 1 and 2) disclose with 100ns accuracy when an event occurred, which might be sensitive information. For example, using a publicly visible UUID for a user account could publicly disclose when that account was created. This may or may not be intended.

  • The hash functions used for version 3 and 5 UUIDs are MD5 and SHA-1. Both are considered broken for cryptographic purposes. This doesn't matter for most purposes, but it would be unwise to include any sensitive information in the input for these UUID versions. If you use a custom UUID as a namespace, this behaves like a keyed hash with a 122-bit key (a random version 4 namespace contains 122 random bits).

  • Random version 4 UUIDs have different properties. If you use a cryptographically secure randomness source, there are 122 bits of pure randomness in a UUID. These UUIDs will not disclose any information about timestamps, node IDs, or the user-defined names. Thus, version 4 UUIDs are attractive for privacy-sensitive applications, and may be useful for capability-based security models.
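
To make the timestamp disclosure of version 1 UUIDs concrete, the following Python sketch recovers the creation time and node ID from such a UUID. The function name `uuid1_created_at` is hypothetical; the `time` and `node` attributes are provided by the standard `uuid.UUID` class.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100 ns intervals since 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_created_at(value: uuid.UUID) -> datetime:
    """Return the creation time embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs carry this timestamp")
    # value.time is in 100 ns units; datetime resolves microseconds at best.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


sample = uuid.uuid1()
print("created at:", uuid1_created_at(sample).isoformat())
print("node ID:   ", format(sample.node, "012x"))
```

Anyone who sees the UUID can run the same extraction, so nothing about the generating system needs to be compromised for this information to leak.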

Based on this, we can turn to the questions in your post.

UUID hashes are one-way functions meaning it cannot be decoded by an attacker. If I have interpreted this correctly, this appears to be a great option for true ID obfuscation.

Yes, but MD5 and SHA-1 are considered to be broken. If you want to obfuscate IDs, don't hash them. Instead, replace them with a truly random ID, such as a version 4 UUID fed by a secure entropy source.

The difficulty of brute-forcing a hash depends not on the output size of the hash (here: 122 bits), but on the size of the input (here: 122-bit key + size of your IDs). Reusing the key for all hashes is risky, but if you use a different key for each hash you could just as well use the random ID directly. Any security properties of the hash come from this key, as sequential IDs are easy to crack, especially if there is contextual information that can narrow the search.

As perspective for how easy it is to crack hashes of small numbers: as of 2021, a high-end consumer CPU can crack a 32-bit input space in under two minutes. The Bitcoin network's hashrate processes more than a 64-bit input space per second.

Instead of putting a low-entropy ID through a broken hash function, use a pure random number (UUID version 4) instead.
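
One way this could look in practice is sketched below in Python: each record keeps its incremental ID internally and gets an unrelated random version 4 UUID as its public identifier. The class `PublicIdRegistry` and its in-memory dictionaries are assumptions made for this example; a real application would more likely store the UUID in a uniquely indexed database column. CPython's `uuid.uuid4()` draws its bytes from `os.urandom`.

```python
import uuid
from typing import Optional


class PublicIdRegistry:
    """Hypothetical in-memory mapping between internal IDs and random public UUIDs."""

    def __init__(self) -> None:
        self._by_internal: dict = {}
        self._by_public: dict = {}

    def public_id_for(self, internal_id: int) -> uuid.UUID:
        """Return the stored public UUID, creating a random one on first use."""
        existing = self._by_internal.get(internal_id)
        if existing is not None:
            return existing
        public_id = uuid.uuid4()  # not derived from internal_id in any way
        self._by_internal[internal_id] = public_id
        self._by_public[public_id] = internal_id
        return public_id

    def resolve(self, public_id: uuid.UUID) -> Optional[int]:
        """Look up the internal ID for a public UUID, or None if unknown."""
        return self._by_public.get(public_id)


registry = PublicIdRegistry()
token = registry.public_id_for(1)
print(token, "->", registry.resolve(token))
```

Because the public UUID is stored rather than computed, there is no key to leak and nothing to brute-force; the cost is one extra lookup.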

From my understanding, only UUID version 3 and 5 currently use hashing algorithms, so version 1 and 4 would not be as suitable.

Indeed, version 1 is undesirable if you don't want to disclose information about when the UUID was generated.

Versions 3 and 5 use hash functions for the purpose of compressing an arbitrary-length string into a fixed-sized number. The hash function does not directly provide security benefits.

would it also be possible to use the namespace implementations to generate truly obfuscated UUIDs

Version 3 and 5 UUIDs obfuscate but do not destroy information about the input. If you want to remove as much information as possible from an identifier, use a truly random number (such as a version 4 UUID fed from a secure entropy source).

My first thought is to use a constant custom namespace and to use the incremental ID as the name. Since each incremental ID is unique, then the namespace doesn't really matter. And since both are hashed, neither the namespace nor the ID can be brute forced.

As discussed above, this only works as long as the namespace is secret. The incremental IDs are unique but very low entropy, so they are trivial to brute-force as soon as the namespace becomes known. The hash functions in question are cryptographically broken, so it may be possible to predict the hash of one ID from the hash of a neighbouring ID (I'm not an expert on this, but this looks similar to a length extension attack).
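
The brute-force risk described above can be sketched in a few lines of Python. This assumes the attacker has learned the namespace and that the name is the decimal string of the incremental ID; the function `recover_sequential_id` and the namespace value are hypothetical.

```python
import uuid
from typing import Optional


def recover_sequential_id(
    target: uuid.UUID, namespace: uuid.UUID, max_id: int
) -> Optional[int]:
    """Try every ID from 0 to max_id until one hashes to the target UUID."""
    # The version field of the target reveals which hash function was used.
    generate = uuid.uuid3 if target.version == 3 else uuid.uuid5
    for candidate in range(max_id + 1):
        if generate(namespace, str(candidate)) == target:
            return candidate
    return None


# Hypothetical leaked namespace and a UUID observed in a URL.
leaked_namespace = uuid.UUID("12345678-1234-5678-1234-567812345678")
observed = uuid.uuid5(leaked_namespace, "31337")
print(recover_sequential_id(observed, leaked_namespace, 100_000))
```

Once the mapping works for one UUID, the attacker can also run it forwards and enumerate valid UUIDs for neighbouring IDs.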

If so, I believe this would be strong enough for my use-case

Hashed identifiers might be a bit more privacy-preserving than plaintext identifiers, but this depends on context. The use of broken hash functions is not particularly attractive. Better just use purely random UUIDs.

is it possible (if rather overkill) to use a truly random namespace each time along with the incremental ID - for example, a UUID v4 generated from a truly random number seed?

Yes. Using different keys for a keyed hash is important if the keyed hash is supposed to provide security benefits. This would also make brute-force attacks effectively impossible (this namespace/key behaves like a salt). It does not address cryptanalysis-based attacks against the broken hash functions.

But if you're already generating a secure version 4 UUID, you could just use that UUID directly.

amon