4

The Problem

In our system users are identified by a composite key. We'd like to devise a scheme to deterministically convert this composite key into a UUID.

Solutions?

The obvious first suggestion would be UUIDv5, however:

  1. It's based on SHA-1, which I understand to be no longer recommended as a hashing algorithm.
  2. It seems to require one part of the input to be a UUID, but neither part of the composite key is.

The next obvious choice would be to use a SHA-256 hash and "compress" it into 128 bits. This raises 2 questions:

  1. How to do it? (take the first or second half? XOR first half with second half? some other, better way?)
  2. Is it bad practice to use all 128 bits of a UUID instead of using the reserved bits for variant and version?
Anders
  • 64,406
  • 24
  • 178
  • 215
Alex
  • 41
  • 1
  • 2
  • How many bits of unique infirmation does the compisite key hold? You should have at most `n/2` Input bits for a hadh function with `n` output bits to prevent collisions – marstato May 18 '17 at 19:05
  • half of a sha256 should suffice; you don't have enough users to worry about collisions. – dandavis May 19 '17 at 01:17
  • @marstato Technically the "bits of information" is unbounded because they are both opaque `String`s. Isn't @dandavis right though? With a good hash function isn't the risk of collision on 128 bits minuscule (the point of UUIDs)? – Alex May 19 '17 at 16:23
  • @Alex If you want a function that transparently converst a bunch of `String`s to a `UUID` you'll have to worry about collisions. Its very likely that you wont get collisions because you have too few users; but from a pure mathematical perspective collisions are an issue if you *could have* enough users to get collisions. – marstato May 19 '17 at 16:27
  • So I don't know the theory very well, I guess my question is: "Are the odds of a collision 'substantially' higher when UUIDs are generated from hashing pairs of strings vs. generating the UUIDs purely randomly?" Obviously there's a non-zero chance of UUIDv4 collisions but the conventional wisdom is it's so vanishingly unlikely one can just not worry about it. – Alex May 19 '17 at 16:31

2 Answers2

2

You could truncate a hash yourself, but the best way is to use SHA-512/t with t = 128, as specified in section 5.3.6 of the FIPS 180-4 standard. This is an algorithm almost identical to SHA-512 except it is truncated and, to distinguish it from existing truncated SHA-512 hashes, it uses a custom IV derived from an ASCII string containing the desired digest length. It is also described on Wikipedia:

SHA-512/t is identical to SHA-512 except that:

  • the initial hash values h0 through h7 are given by the SHA-512/t IV generation function,
  • the output is constructed by truncating the concatenation of h0 through h7 at t bits,
  • t equal to 384 is not allowed, instead SHA-384 should be used as specified, and
  • t values 224 and 256 are especially mentioned as approved.

The SHA-512/t IV generation function evaluates a modified SHA-512 on the ASCII string "SHA-512/t", substituted with the decimal representation of t. The modified SHA-512 is the same as SHA-512 except its initial values h0 through h7 have each been XORed with the hexadecimal constant 0xa5a5a5a5a5a5a5a5.

Note that there is no SHA-256 equivalent, but that is fine because SHA-512 is often faster on 64-bit computers due to it working with 64-bit words instead of 32-bit words.


With that said, are you really generating a UUID that requires collision resistance? If all you want is to prevent your composite key from being "reversed" from the UUID, then MD5 could work for that because it is not vulnerable to preimage attacks. It's still a fine way to generate a random 128-bit digest, but if you ever try to rely on it for collision resistance, it will fail you spectacularly.

Use SHA-512/128 if at all possible.

forest
  • 64,616
  • 20
  • 206
  • 257
  • SHA1 and SHA2-512/160 are both going to produce a 160-bit digest, which is too long (a UUID has only 128 bits, some of which are reserved). MD5 is 128 bits, and depending on the situation its weaknesses might not even be a problem, but it's a bad idea to use it; SHA-512/128 (or even manual truncation) would be fine. – CBHacking Aug 19 '22 at 01:32
  • @CBHacking Oops, for some reason I was thinking that UUID was 160-bit. Edited. Thanks! – forest Aug 19 '22 at 01:34
  • I think this answer gets the hash part right, but it misses some UUID aspects that I mention in my answer. Still very good insights on the hashing side; today I learned about SHA-512/t. – Anders Aug 19 '22 at 08:00
2

It's based on SHA-1, which I understand to be no longer recommended as a hashing algorithm.

You can mitigate this by first hashing your composite key with a stronger algorithm (such as SHA 512), and then use the hash as input to the UUID v5 algorithm.

It seems to require one part of the input to be a UUID, but neither part of the composite key is.

To create a v5 UUID you need a namespace and a name. The namespace is indeed another UUID. It's purpose is to ensure global uniqueness - as long as my namespace is globally unique, it does not matter if my names are not. A common way to pick a namespace is to just generate a random v4 UUID once, and then use it whenever you generate UUIDs for this specific purpose.

The next obvious choice would be to use a SHA-256 hash and "compress" it into 128 bits. [...] Is it bad practice to use all 128 bits of a UUID instead of using the reserved bits for variant and version?

You can do this, sure, but it would not result in an UUID. If the reserved bits are not valid or, even worse, are incorrecct, you do not have an UUID as per the specification. Sure, you have a 128 bit digest, but it is not an UUID. Do not expect it to work in systems expecting an UUID.

To do this and call it an UUID would be bad practice. And actually, there is no need to. You can just use a v5 UUID:

  • As a namespace, consistently use the same random v4 UUID.
  • As a name, use a SHA 512 hash of the composite key. (You can use all bits, no need to truncate it, since the name can have any length.)
Anders
  • 64,406
  • 24
  • 178
  • 215