Many hashing algorithms seem to produce a fixed-length message digest as output.
If I compute the MD5 hash of two strings:

>>> import hashlib
>>> hashlib.md5(b"This is a really really long text string to make a hash out of").hexdigest()
'2916991b5ebba69ab38a84a0a72b4176'
>>> hashlib.md5(b"Short").hexdigest()
'30bb747c98bccdd11b3f89e644c4d0ad'
I get an output that is 32 hex characters long in each case, even though the inputs differ significantly in length. Since there are infinitely many possible inputs but only a finite number of possible outputs, is it theoretically possible to come up with two (or more) completely different inputs that produce the same digest?
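To convince myself of the pigeonhole argument, I tried shrinking the output space: if I truncate the MD5 digest to a few hex characters (the tiny_hash helper below is my own, just for this experiment), a brute-force search finds two different inputs with the same truncated digest almost instantly:

import hashlib

def tiny_hash(s, hex_chars=6):
    # Truncated MD5: only 16**hex_chars possible outputs.
    return hashlib.md5(s.encode()).hexdigest()[:hex_chars]

seen = {}
i = 0
while True:
    candidate = "input-%d" % i
    digest = tiny_hash(candidate)
    if digest in seen:
        print("collision:", repr(seen[digest]), "and", repr(candidate), "->", digest)
        break
    seen[digest] = candidate
    i += 1

With 6 hex characters there are only 16**6 (about 16.7 million) possible outputs, so a collision turns up after roughly the square root of that, a few thousand attempts. Presumably the same argument applies to the full 128-bit digest, just with far larger numbers.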
If so, how likely is it that I would find another input that generates the same output?
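My rough understanding is that the answer for collisions in general is governed by the birthday bound: for an n-bit digest, you expect to hash on the order of 2**(n/2) inputs before any two of them collide. A back-of-the-envelope check in Python (this is the standard birthday approximation 1 - exp(-k**2 / 2N), nothing specific to MD5):

import math

N = 2 ** 128  # number of possible MD5 digests
for k in (2 ** 32, 2 ** 60, 2 ** 64, 2 ** 66):
    p = 1 - math.exp(-k * k / (2 * N))
    print("after 2**%d hashes: collision probability ~ %.3g" % (round(math.log2(k)), p))

If this is right, even 2**60 hashes leave only about a 0.2% chance of seeing any collision, and finding a second input for one specific digest (a second preimage, which is what my question literally asks) should be far harder still, on the order of 2**128 attempts.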