19

Is there any collision rate measure for popular hashing algorithms (md5, crc32, sha-*)?

If that depends only from output size, it's quite trivial to measure, but I suppose that depends also of distribution and algorithm's internals (and it demands some kind of formal proof, i think).

Rory Alsop
  • 61,367
  • 12
  • 115
  • 320
ts01
  • 373
  • 2
  • 6

1 Answers1

23

If I try to create collisions for MD5, I can make one every 14 seconds (on average) on my PC, using a single core (Core2, 2.4 GHz). This exploits the weaknesses in the internal structure of MD5. If I only try random data and wait for collisions to appear, well, I will wait for quite some time: the first collision is expected after about 264 hashed messages (give me a thousand PC, and I should achieve a collision in about 20 years of full-time computation).

For currently unbroken cryptographic hash functions, there is no known internal weakness (that's what "unbroken" means), so trying random messages is the best known method to create collisions. Chances to get a collision this way are vanishingly small until you hash at least 2n/2 messages, for a hash function with a n-bit output. This means that with any proper hash function with an output of 256 bits or more, the collision rate is, in practical conditions, zero (you will not get any and that's the end of the story).

Wikipedia has some pointers on the subject. See also chapter 9 of the Handbook of Applied Cryptography (page 369).

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
  • 7
    To add: CRC32 (mentioned in the question) is not a cryptographic hash, and should not be treated as one. Collisions can be generated using pencil and paper fairly easily. – BlueRaja - Danny Pflughoeft Apr 18 '11 at 16:14
  • @Thomas, can you please clarify how exactly you achieved such a rate 1 in 14 seconds if later you are claiming that with random data you need to wait for 20 years? – Salvador Dali Nov 29 '12 at 01:42
  • 2
    To create collisions for a hash function, you must use _cunning_ or _luck_. Luck always work, even for a perfect hash function, but it takes time (20 years with 1000 PC). Cunning exploits weaknesses in the hash function structure; this works or not, depending on the hash function. For MD5, this works beautifully (14 seconds on one PC), which is why MD5 is said to be "broken". – Thomas Pornin Nov 29 '12 at 01:53
  • @Thomas, you gave this answer quite a while ago. Is the sha256 still very solid, or in another word, unbroken? – minghua Nov 20 '18 at 06:33
  • @minghua: Yes . – Thomas Pornin Nov 20 '18 at 15:50
  • Appreciate the reply. Wondering: Is there some mathematical proof that why sha256 is secure? Or is it expected to be broken in N number of years? Or when it is broken the collision rate will be lower than a certain number X? Or somethings like that? – minghua Nov 20 '18 at 16:16
  • @minghua: there is no mathematical proof that secure hash functions can really exist at all. Thus, there is no proof, in particular, that SHA-256 is secure. Instead, we have the next best thing: SHA-256 has been around for about 17 years and nobody found anything wrong with it yet. – Thomas Pornin Nov 20 '18 at 17:39