3

Normally, the password size is fixed by the hash function in which the password is given right ?

Because otherwise, if the password size is bigger than the hash result size, there should be potential collision am I right ?

Ex : Consider the Hash function SHA-1, the size of result is 160 bits (or 20 bytes). If the password size is bigger than 20 bytes then there are at least two possible password for the same hash.

Kruncho
  • 403
  • 5
  • 10

3 Answers3

7

What you're talking about is known as the pigeonhole principle - if you have n possible passwords and a hash function with m possible outputs, where n > m, there will always be some input values which produce the same output hashes.

The question then becomes: does this matter? With a hash output space of 2160 possible values, an accidental collision has a probability of about 6.842×10-49. Not a whole lot. This gets more interesting when you look into birthday attacks, because the probabilities get much higher, but it's still generally considered infeasible to find 160-bit collisions using naive methods such as brute-force.

In reality, a smart attacker is more likely to focus upon vulnerabilities within the hash function which reduce the necessary computation in order to find a collision. But, in the arena of password storage, this should mostly be moot - passwords are easy to crack if you only hash them with SHA1, because GPUs can brute-force at a rate of billions of hashes per second due to their massively parallel nature.

If you want to store passwords safely, don't use a hash. Use a key-derivation algorithm designed for password storage, such as PBKDF2 or bcrypt. You should also take a look at a related answer I wrote about salted hashes, which goes into the history of how we progressed from hashes, to salted hashes, onto modern KDFs, and the rationales behind each defense and attack step.

Polynomial
  • 132,208
  • 43
  • 298
  • 379
  • Thanks a lot for the answer. I didn't know about KDF. Maybe it's a silly question but howcome OS maintainer don't included such algo instead of staying with MD5 or SHA-1 ? – Kruncho Jun 09 '15 at 12:26
  • @Kruncho I'm not sure what you're asking there. If you're referring to why they're used in Linux or Windows by default, I don't know the answer. It's probably something to do with a lack of NIST approval, or just that it's always been done in a particular way. Microsoft provides PBKDF2 in the .NET Framework though (via the Rfc2898DeriveBytes class) and bcrypt is available as a command-line tool on Linux. I'm also pretty sure that some Linuxes do support bcrypt hashes for logon (in /etc/shadow), either natively or with additional PAM modules. – Polynomial Jun 09 '15 at 13:24
  • Sorry I meant for the default way. And your answer is good to me. Thanks again – Kruncho Jun 09 '15 at 13:31
  • @Polynomial $2^{-160} = 6.842×10^{-49}$ but for 160 bit we should use $2^{-80}$ Isn't it? – Porcupine Aug 24 '19 at 22:08
  • @Nikhil Strictly speaking, yes, but the point is that a naive collision has a probability of 6x10^-49, hence the mention of Birthday attacks immediately afterwards. – Polynomial Aug 30 '19 at 22:16
1

the password size is fixed by the hash function in which the password is given

-> No.

The hash functions hash passwords of any length to a smaller length string. They hardly collide.

What can promise you a collision is that you hash 2^160 + 1 number of different passwords, then at least one hash must collide. Only hashing two passwords of length longer than 20 bytes won't collide.

Johnny Wong
  • 201
  • 1
  • 4
0

Yes, statistically you are right. But consider the number of passwords with length of 160 bits, its 2^160=1461501637330902918203684832716283019655932542976

And collision can occure even between two 3 characters long password :)

Romeo Ninov
  • 638
  • 5
  • 11