I'm trying to understand the use of GUIDs, and what has always left me wondering is what's so special about them that I should use them instead of rolling my own kind of unique ID. In other words, why can't I use a Whirlpool hash like:

4bec4b25ff46e09f7d7adb5b4e6842f871d7e9670506d1a65af501cf96ddf194d0132b85e66c1baaeb5319f2030b607121aae2a038458d32b4d4b03dfd46d5ea 

instead of a GUID (or, for that matter, instead of MD5 or SHA-2)?

I could even tailor the length by taking substrings of a Whirlpool hash and calculate the probability of collision myself, instead of being restricted by the GUID specification.
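For reference, the collision probability for IDs truncated to a given number of bits can be estimated with the birthday bound. This is a minimal Python sketch, assuming the truncated hash values behave like uniformly random bits (which only holds if every hashed input is distinct); `collision_probability` is a hypothetical helper name:

```python
import math

def collision_probability(num_ids: int, bits: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among num_ids
    values drawn uniformly at random from 2**bits possibilities."""
    pairs = num_ids * (num_ids - 1) // 2
    # p ~= 1 - exp(-pairs / 2**bits); expm1 keeps precision when p is tiny.
    return -math.expm1(-pairs / 2**bits)

# Each hex character of a digest carries 4 bits, so a 32-character
# substring of a Whirlpool digest keeps 128 bits.
for bits in (64, 122, 128):
    print(bits, collision_probability(10**9, bits))
```

The same function applies to random (method 4) GUIDs with `bits=122`.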

edited by Vilican
asked by dendini
  • If you used a hash function instead of a GUID, what would you use as the starting value? – Brian Adkins Apr 12 '13 at 12:30
  • http://stackoverflow.com/questions/221165/pros-and-cons-of-using-md5-hash-of-uri-as-the-primary-key-in-a-database – Brian Adkins Apr 12 '13 at 12:36
  • A V4 GUID is essentially a random 122-bit value with a specific format. Some other GUID variants are created with hashes, such as MD5. – CodesInChaos Apr 12 '13 at 12:37
  • I didn't specify what I would hash, but I imagine I would use the current time in milliseconds and my company domain. After the answer below, I've come to the conclusion that, except for method-4 GUIDs, which are random, the others are no different in use from a Whirlpool hash of a well-defined input. – dendini Apr 12 '13 at 13:30
  • If you use the hash of the current time, everyone who generates such an ID at the same time will get the same ID. That's hardly globally unique. If you add your company domain, if two people in the company create an ID at the same time, they get the same ID! – Josef Jan 18 '17 at 10:47

3 Answers

First off, a hash function has an input: you hash something. GUIDs (actually UUIDs) don't have any input. To generate "unique identifiers" with a hash function, you don't just use a hash function; you have to define what you are actually hashing.
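To make this concrete, here is a Python sketch of the scheme proposed in the comments (current time in milliseconds plus the company domain). SHA-256 stands in for Whirlpool, since Whirlpool is not guaranteed to be available in Python's `hashlib`; `naive_id` and `example.com` are hypothetical:

```python
import hashlib

def naive_id(timestamp_ms: int, domain: str) -> str:
    """Hypothetical homemade ID: hash of the time in ms plus a company domain."""
    return hashlib.sha256(f"{timestamp_ms}|{domain}".encode("utf-8")).hexdigest()

# Two colleagues generating an ID in the same millisecond.
now_ms = 1365769800000  # fixed example timestamp
alice = naive_id(now_ms, "example.com")
bob = naive_id(now_ms, "example.com")
print(alice == bob)  # True: identical input, identical "unique" ID
```

The hash adds no uniqueness of its own: the ID is exactly as unique as its input, which is the point Josef raises in the comments.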

There are several standard methods for generating UUIDs; all of them aim at achieving "uniqueness" of the generated identifiers. Method 3 uses the MD5 hash function: you generate the UUID by hashing some data which is already inherently unique worldwide (e.g. a URL), but longer than the 16 bytes of a UUID. This method closely resembles what you suggest, except that it clearly defines what is hashed (or at least, it states in plain words that when hashing, you hash something, and your UUID won't be more unique than what you hash). Method 5 is like method 3, but with SHA-1 instead of MD5 (the 160-bit output is truncated to 128 bits).
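For illustration, here is a minimal Python sketch of methods 3 and 5 as RFC 4122 defines them: the name is prefixed with a namespace UUID before hashing, the digest is cut to 16 bytes, and 6 of the 128 bits are overwritten with version and variant markers. `name_based_uuid` is a hypothetical helper name; it is checked against the standard library's `uuid.uuid3` and `uuid.uuid5`:

```python
import hashlib
import uuid

def name_based_uuid(namespace: uuid.UUID, name: str, version: int) -> uuid.UUID:
    """Sketch of RFC 4122 name-based UUIDs (version 3 = MD5, version 5 = SHA-1)."""
    if version not in (3, 5):
        raise ValueError("only versions 3 and 5 are name-based")
    hash_fn = hashlib.md5 if version == 3 else hashlib.sha1
    # Hash the namespace's 16 bytes followed by the UTF-8 name; keep 16 bytes.
    digest = bytearray(hash_fn(namespace.bytes + name.encode("utf-8")).digest()[:16])
    # Overwrite the 4 version bits (high nibble of byte 6).
    digest[6] = (digest[6] & 0x0F) | (version << 4)
    # Overwrite the 2 variant bits (top of byte 8) with binary 10.
    digest[8] = (digest[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(digest))

url = "http://stackoverflow.com/questions/221165"
v3 = name_based_uuid(uuid.NAMESPACE_URL, url, 3)
v5 = name_based_uuid(uuid.NAMESPACE_URL, url, 5)
print(v3, v5)
```

Hashing the same URL always gives the same UUID, so it is only as unique as the URL itself.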

Other methods use physical or configuration elements of the local machine (e.g. MAC address and current time for method 1). Most methods are "cooperative": they ensure uniqueness, but new UUID values can be predicted. For many security-related protocols, when you need a unique ID, you actually need an ID that won't collide with previous IDs (or will do so only with negligible probability) and cannot be predicted by attackers; for that, you need "method 4": the 128-bit UUID contains 122 random bits, generated from a cryptographically strong PRNG. This method will provide "strongly unique" identifiers, and is better than any homemade construction.
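A minimal Python sketch contrasting the two approaches: a method-4 UUID built from 16 CSPRNG bytes (`secrets.token_bytes` is the assumed generator here) with 6 bits overwritten, leaving 122 random bits, and a method-1 UUID from the standard library, whose creation time anyone holding it can read back. `random_uuid4` and `v1_creation_time` are hypothetical helper names; the offset is the number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch:

```python
import secrets
import time
import uuid

# 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def random_uuid4() -> uuid.UUID:
    """Method-4 sketch: 128 CSPRNG bits, 6 of them overwritten (122 stay random)."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # version 4 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122 variant (binary 10) in byte 8
    return uuid.UUID(bytes=bytes(raw))

def v1_creation_time(u: uuid.UUID) -> float:
    """Recover the Unix timestamp embedded in a method-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) / 10_000_000

r = random_uuid4()
print(r, r.version)

t = uuid.uuid1()
# A method-1 UUID carries its creation time and a 48-bit node ID (normally the MAC).
print(t, time.ctime(v1_creation_time(t)), hex(t.node))
```

Knowing one method-1 UUID tells an attacker roughly when and where the next one will be generated; the method-4 value gives away nothing beyond its 6 fixed bits.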

Tom Leek
  • I don't think it's guaranteed that the PRNG used to create V4 GUIDs will be cryptographically secure. – CodesInChaos Apr 12 '13 at 16:37
  • Maybe you'd care to explain why you don't think so? Tom above clearly states that method 4 is strongly unique, not fail-safe, which is what we require from a UUID. – Lynx-Lab May 15 '18 at 09:12

A GUID is a random unique identifier that you generate and then assign to something: "Oh, you're so cute, I think I'll call you Charlotte" (except that "Charlotte" is picked at random).

An MD5 checksum is derived from something already inherent to the object, so anyone can compute it and use it to identify the object: "Hey look, according to this scale, Charlotte weighs 25 pounds".
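The difference can be shown in a few lines of Python (the content bytes are an arbitrary example):

```python
import hashlib
import uuid

content = b"Charlotte, 25 pounds"

# A random (method 4) GUID is picked fresh each time: same object, different IDs.
id_a = uuid.uuid4()
id_b = uuid.uuid4()

# An MD5 checksum is derived from the content: same bytes, same value, for anyone.
sum_a = hashlib.md5(content).hexdigest()
sum_b = hashlib.md5(content).hexdigest()

print(id_a, id_b)      # two different values
print(sum_a == sum_b)  # True
```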

"UUID" and "GUID" are used interchangeably in many circles, but some UUID versions (3 and 5) are derived from a seed value: a namespace plus a name.

SHA-1 generates a longer digest than MD5 (160 bits versus 128), so when it is used to build a UUID (method 5), the output is truncated to 128 bits. Kind of like pi, but only to the tenth decimal place.

  • Simple remark: a GUID can be any 128-bit value, whereas an MD5 digest is deterministically generated from the content being hashed. – peterh Nov 12 '15 at 21:18

I'm going to assume the question is about "GUID vs. any other way of generating an arbitrary-length string of hexadecimal digits", rather than "GUID vs. hashing something", since that makes a lot more sense. (If my assumption is incorrect, I'll remove this answer.)

Really, it's about standards. Depending on how you intend to use this identifier, it might be easier and faster to store a GUID than a string. E.g., most database systems have dedicated GUID/UUID types that store the value as a 128-bit (16-byte) number, whereas a string is stored character by character. Also, most systems that deal with GUIDs have mechanisms in place to generate new ones, which they will not have for arbitrary random strings.
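A quick Python illustration of the size difference. No real database is involved; the assumption is simply that a native UUID column holds the 16 raw bytes while a text column holds the string form. SHA-512 stands in for the question's Whirlpool value, since both produce 512 bits:

```python
import hashlib
import uuid

guid = uuid.uuid4()

# Native 128-bit form: what a dedicated UUID column would typically hold.
as_bytes = guid.bytes  # 16 bytes
# Canonical text form: 32 hex digits plus 4 hyphens.
as_text = str(guid)    # 36 characters

# Stand-in for the Whirlpool ID: 512 bits kept as a hex string.
custom_id = hashlib.sha512(b"some input").hexdigest()

print(len(as_bytes), len(as_text), len(custom_id))  # 16 36 128

# Round trip: the text form parses back into the same 128-bit value.
assert uuid.UUID(as_text) == guid
```

The standard tooling (parsing, validation, generation) comes for free with the GUID; the custom hex string needs all of it written by hand.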

Mike Caron
  • Wow, I did not realize this question was almost 5 years old. Oh well, hope it's still relevant? – Mike Caron Jan 18 '17 at 06:27
  • It's perfectly acceptable to post answers to old questions, if you are able to contribute something that hasn't yet been discussed in existing answers. – user Jan 18 '17 at 16:15