Many hashing algorithms seem to produce a fixed-length message digest as output.
If I compute the MD5 hash of two strings:

>>> import hashlib
>>> hashlib.md5(b"This is a really really long text string to make a hash out of").hexdigest()
'2916991b5ebba69ab38a84a0a72b4176'
>>> hashlib.md5(b"Short").hexdigest()
'30bb747c98bccdd11b3f89e644c4d0ad'
I get an output that is 32 hex characters long in each case, even though the inputs differ significantly in length. Since there are infinitely many possible inputs but only a finite number of possible outputs, is it theoretically possible to come up with two (or more) completely different inputs that produce the same digest?
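To convince myself of the pigeonhole argument, I tried shrinking the output space: if I truncate the MD5 digest to a few hex characters (the tiny_hash helper below is my own, just for this experiment), a brute-force search finds two different inputs with the same truncated digest almost instantly:

import hashlib

def tiny_hash(s, hex_chars=6):
    # Truncated MD5: only 16**hex_chars possible outputs.
    return hashlib.md5(s.encode()).hexdigest()[:hex_chars]

seen = {}
i = 0
while True:
    candidate = "input-%d" % i
    digest = tiny_hash(candidate)
    if digest in seen:
        print("collision:", repr(seen[digest]), "and", repr(candidate), "->", digest)
        break
    seen[digest] = candidate
    i += 1

With 6 hex characters there are only 16**6 (about 16.7 million) possible outputs, so a collision turns up after roughly the square root of that, a few thousand attempts. Presumably the same argument applies to the full 128-bit digest, just with far larger numbers.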
If so, how likely is it that I would find another input that generates the same output?
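My rough understanding is that the answer for collisions in general is governed by the birthday bound: for an n-bit digest, you expect to hash on the order of 2**(n/2) inputs before any two of them collide. A back-of-the-envelope check in Python (this is the standard birthday approximation 1 - exp(-k**2 / 2N), nothing specific to MD5):

import math

N = 2 ** 128  # number of possible MD5 digests
for k in (2 ** 32, 2 ** 60, 2 ** 64, 2 ** 66):
    p = 1 - math.exp(-k * k / (2 * N))
    print("after 2**%d hashes: collision probability ~ %.3g" % (round(math.log2(k)), p))

If this is right, even 2**60 hashes leave only about a 0.2% chance of seeing any collision, and finding a second input for one specific digest (a second preimage, which is what my question literally asks) should be far harder still, on the order of 2**128 attempts.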