6

I am trying to get a grasp of password hashing. Back in the day it seemed so simple: just MD5(password + salt) and you were done. Then MD5 was shown to have collisions, so people started moving to SHA1, and so on.

Then we started talking about having to add slowness, so we ran many iterations of our hash algorithm in order to make hash checking slow enough.

What I am trying to understand is:

  • Why can't SHA512 be used in a password scheme if we iterate it enough times to make it slow? For example, SHA512 the password 100k times.
  • Why is PBKDF2 or bcrypt recommended instead of doing the above? Or why is it not?

    This answer states that it is "not for hashing a password for safe storage for authentication purposes". However this answer (with many upvotes) recommends the opposite (?); that you should use pbkdf2/bcrypt/scrypt for safely storing passwords.

  • If a PBKDF2 function relies on SHA1 underneath, is it inherently insecure if SHA1 can be proven broken? (RFC2898 .NET implementation)

Hopefully, if someone can answer the above questions, I will understand why a simple hash algorithm (provided it is slow) is not enough, and why we need all these seemingly complex key derivation functions just to store a password.

Chris Dale
  • 16,119
  • 10
  • 56
  • 97

3 Answers

10

Then MD5 was shown to have collisions, so people started moving to SHA1, and so on.

Note that collision resistance is not required for password hashing. Still, there is no reason to use a weaker hash than necessary.

Why can't SHA512 be used in a password scheme if we iterate it enough times to make it slow? For example, SHA512 the password 100k times.

You can do that. As long as you mix the password and salt together at the very beginning, it should be secure.

We don't recommend it because there is no reason to invent your own scheme when plenty of standardized schemes exist.
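As a sketch only, not a recommendation, such an ad-hoc iterated scheme might look like the following; the function name and iteration count are illustrative:

```python
import hashlib

def iterated_sha512(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    # Mix salt and password at the very beginning, then keep rehashing
    # the digest to make each password guess cost `iterations` hashes.
    digest = hashlib.sha512(salt + password).digest()
    for _ in range(iterations - 1):
        digest = hashlib.sha512(digest).digest()
    return digest
```

With the same password and salt the output is stable, so it can be stored and compared later; a different salt yields a completely different digest.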

Why is PBKDF2 or bcrypt recommended instead of doing the above? Or why is it not?

Because they're standardized and have been reviewed by many cryptographers. So you can be more confident that there are no weaknesses in the scheme than you can with your own ad-hoc scheme.

PBKDF2 is essentially an iterated hash function, which uses HMAC to mix the password and salt.
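Python's standard library exposes exactly this construction via `hashlib.pbkdf2_hmac`; the password and parameters below are illustrative:

```python
import hashlib
import os

salt = os.urandom(16)                 # random per-password salt
derived = hashlib.pbkdf2_hmac(
    "sha256",                         # underlying hash used inside HMAC
    b"correct horse battery staple",  # the password
    salt,
    100_000,                          # iteration count (work factor)
)
print(derived.hex())
```

To verify a login attempt you store the salt and iteration count alongside the hash, re-derive with the same inputs, and compare in constant time (e.g. with `hmac.compare_digest`).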

bcrypt is a different construction. It's slightly harder to break than PBKDF2, since it requires a bit more memory (a few kB), which increases the number of gates an attacker needs a bit.

There is also a scheme called scrypt, which has a tunable memory parameter, allowing the scheme to consume significant amounts of memory (several megabytes or more). This prevents special-purpose hardware from being much more efficient than standard hardware, since attackers still need to buy lots of RAM.
scrypt is probably the strongest of these schemes. But it's relatively new and uses uncommon primitives, so many users still choose older schemes.
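For illustration, `hashlib.scrypt` (Python 3.6+, when built against OpenSSL with scrypt support) exposes those tunable cost parameters directly; the values here are illustrative, and memory use is roughly 128 * r * n bytes:

```python
import hashlib
import os

salt = os.urandom(16)
key = hashlib.scrypt(
    b"correct horse battery staple",
    salt=salt,
    n=2**14,                  # CPU/memory cost: ~16 MB with r=8
    r=8,                      # block size
    p=1,                      # parallelization factor
    maxmem=64 * 1024 * 1024,  # allow enough memory for these parameters
)
print(key.hex())
```

Raising `n` increases both the time and the RAM an attacker must spend per guess, which is what makes custom cracking hardware less cost-effective.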

This answer states that it is "not for hashing a password for safe storage for authentication purposes". However this answer (with many upvotes) recommends the opposite

That question is about an explicit NIST recommendation for using PBKDF2 for password hashing. NIST only recommends PBKDF2 for password-based key derivation, a closely related technique. The absence of a NIST recommendation does not imply that the scheme is bad.

If a PBKDF2 function relies on SHA1 underneath, is it inherently insecure if SHA1 can be proven broken?

If there is a first pre-image attack against SHA1 that works under the relevant constraints, then yes, PBKDF2 with SHA1 is broken. A collision attack, on the other hand, is not enough. A first pre-image attack is typically much harder to find than a collision attack; for example, we don't even know one against MD5.

CodesInChaos
  • 11,854
  • 2
  • 40
  • 50
  • "Because they're standardized and have been looked at by many cryptographers" ... that's a side effect, but the real answer would be: "they make brute forcing computationally expensive (CPU + RAM)" (especially scrypt and bcrypt). – akira Jul 30 '12 at 09:05
  • @akira The OP compared iterated SHA-2 with PBKDF2&co. Iterating SHA-2 in an ad-hoc scheme is slow too but we don't recommend it, because it hasn't been reviewed enough. – CodesInChaos Jul 30 '12 at 09:32
  • "we" also don't recommend it because it is not slow enough; see the scrypt and bcrypt docs, and consider the amount of raw power in GPU-based special hardware combined with something like hashcat. Iterating a function designed for speed and easy hardware implementation a few thousand more times is just not good enough. That's why I would shift the focus of that part of your answer away from "because crypto guys looked at the scheme" and towards "it's expensive to derive that key, even in hardware". – akira Jul 30 '12 at 10:21
  • "Because they're standardized" explains nothing. There was a reason why it was developed in the first place, and why folks consider it better than iterating SHA* x times; only after that did it become a standard. – akira Jul 30 '12 at 10:24
9

Why can't SHA512 be used in a password scheme if we iterate it enough times to make it slow? For example, SHA512 the password 100k times.

There isn't any reason why this cannot work; it is essentially what PBKDF2 is.

Why is PBKDF2 or bcrypt recommended instead of doing the above? Or why is it not?

PBKDF2 essentially takes a SHA hash and iterates it many times.

bcrypt, on the other hand, uses the Blowfish algorithm and requires more memory accesses to compute, which is not very efficient on a GPU. This makes it harder for an attacker with a GPU to speed up the cracking process. The same applies, even more so, to scrypt.

If a PBKDF2 function relies on SHA1 underneath, is it inherently insecure if SHA1 can be proven broken?

From my understanding, yes.

The reason people suggest using key derivation functions for password hashing is that the time needed to crack a password hash can be increased by simply raising the iteration count. This makes it easy to keep the hashes secure as hardware gets increasingly powerful, without changing the hashing algorithm.
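For instance, with PBKDF2 the work factor is just an argument, so an operator can raise it over time without touching the algorithm; the passwords and counts below are illustrative:

```python
import hashlib
import time

password, salt = b"hunter2", b"fixed-salt-for-demo"

# Time the derivation at two different work factors: ten times the
# iterations means roughly ten times the cost for an attacker, too.
for iterations in (10_000, 100_000):
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations took {elapsed * 1000:.1f} ms")
```

When the count is raised, existing hashes are typically re-derived at the new cost the next time each user logs in, since the plaintext password is available at that moment.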

Implementing bcrypt isn't very complex, as there are many simple-to-use libraries that do the job for you. For example: phpass for PHP, Perl and Python.

I'm not exactly a crypto expert, so take my answer with a grain of salt. However, this seems to be the general consensus from experts from what I have read.

With regards to the first link, I believe the response was only saying that there is no official NIST requirement to use PBKDF2 for hashing passwords. It is still better than a simple SHA hash, though.

user
  • 7,670
  • 2
  • 30
  • 54
1
  1. Iterating does not always increase the time to break a hash, because hardware scales linearly.

  2. bcrypt, by contrast, makes hardware crackers slower.

  3. But that is the point of PBKDF2: running multiple iterations makes it slower this way. bcrypt, however, is stronger.

Andrew Smith
  • 1
  • 1
  • 6
  • 19
  • Okay, I get your #1 and #2. You need an algorithm to make it slow because the hashing algorithm cannot do it itself. Why does the linked answer say that PBKDF2 is not for hashing and storing passwords? – Chris Dale Jul 25 '12 at 09:25
  • I don't get your first point. The cost to break the hash scales linearly with the iteration count, with pretty much any iteration scheme, as long as you mix the password and salt in at the beginning. – CodesInChaos Jul 25 '12 at 09:45
  • Scalability applies to CPU, GPU, RAM, storage and network, so ideally the strongest hash would take 1 MB of space; transferring it from storage or over the network would take extra time. Also, if RAM is used as in bcrypt, it reduces the benefit of the CPU cache and forces more serialized execution, working against compiler and CPU multi-threading as well as slow memory buses. Today a hash must be designed for 100-core machines with 10 Gbps or more, so it just keeps getting bigger in various ways. – Andrew Smith Jul 26 '12 at 12:37
  • And in bcrypt, RAM is the issue, versus pure iteration, which executes entirely in cache and is very fast on a GPU. With the higher memory requirement, both GPUs and CPUs perform much slower in a brute-force search, especially when you try to run many thousands of instances at once; that scenario simply doesn't work on various architectures in the case of bcrypt. – Andrew Smith Jul 26 '12 at 12:39