MD5 collision attacks: are they relevant in password hashing?

Question

DISCLAIMER: This is not an endorsement of MD5 as a password hashing function. I know about parallelization, GPUs, and dedicated password hashing functions like bcrypt and scrypt.

With that out of the way, I'm getting at least a little irked when every time someone mentions "MD5" and "password hashing" in the same sentence, the hivemind jumps in and screams "MD5 is broken!" (usually followed by a misinformed suggestion to use another general purpose hashing function like SHA-*).

But is it really? As far as I know, all feasible attacks on MD5 are chosen-prefix attacks. Are any of these in any way applicable to hashed passwords?

And in a related question: while hash collisions are 2^n/2, does that really mean anything when used against passwords, especially over the web? Just because you've found a collision doesn't mean you'll be able to send them in password field, which is usually limited in length and probably not binary safe.

Doesn't that mean that for the purpose of hashing passwords, a function is only truly broken if there is a preimage attack on it? And as far as I know, MD5's preimage attack is still 2^123.4, which is nowhere near feasible.

_{Note: Bruteforcing a weak password will take much less than 2^123.4, and yes MD5 is fast (by design). I'm not concerned with either of these problems, which aren't really MD5's fault.}

Every single MD5 has is known, this mean, a collision attack is easy enough to calculate. All you need to know is what the expected hash is, even if you don't, there are far few hashes then 12 character password combinations. — Ramhound, Oct 25 '12 at 20:38
@Ramhound Oh really? What does `848b45091167bafb55cd2dbacf139af3` map to? (it's an unsalted, 12-character ASCII string) There are certainly far *more* MD5 hashes than 12 character passwords, even if we consider the entire printable ASCII range (3.4e38 vs. 5.4e23 ~= 2^128 vs. 2^79, a 2^49 difference). But then again, this is irrelevant. — Null, Oct 25 '12 at 20:47
Excellent question, I was just about to ask something similar but I felt that someone must've been as annoyed with this matter as I am, so I found it after a couple of minutes of searching. — Adi, Dec 18 '12 at 17:49

score 18 · Answer 1 · edited Apr 13 '17 at 12:48

Collision attacks have no impact on password hashing, although there can be some details depending on what you hash for. Roughly speaking, when we hash passwords, we may want to do two things:

Check that the password hashes to a given stored value: this is password verification (e.g. to know whether to grant login access to a server).
Compute a big symmetric key deterministically from the password: this is key derivation (e.g. to encrypt a file).

Password verification works only on resistance to preimages (hardness of finding a value m such that h(m) matches a given output). However, key derivation needs a bit more than that. Ideally, for key derivation, the hash function should behave as closely as possible to the mythical random oracle. We already know that hash functions based on the Merkle-Damgård construction, such as MD5 or SHA-256, are not random oracles (because of the "length extension attack"); however, we can use such functions in HMAC, which uses two nested hash function invocations precisely to avoid issues with the MD construction. But HMAC is "proven" secure only relatively to an internal property of the hash function (namely, the internal "compression function" should be indistinguishable from a PRF) and we know that this property is not fulfilled in the case of MD5, because otherwise collision attacks would not be feasible...

To sum up, when using MD5 for key derivation, we need to apply schemes such as HMAC-DRBG (a PRNG based on repeated HMAC invocations), which use MD5 internally. Such schemes are known to be strong as long as the hash function is strong, and while collision attacks do not directly apply, they show that this "strength" is not achieved. Thus, the guarantee is void. Nobody knows, right now, how to weaken HMAC/MD5, but we have the example of MD4: MD4 is thoroughly broken for collisions (collisions can be produced "instantly"), and there is a known attack on HMAC/MD4 which is quite expensive but nonetheless much faster than the theoretical 2¹²⁸. The attacks on HMAC/MD4 do not exploit collisions, but build on differential paths which are the same source from which collision attacks have been designed. Therefore, it is highly suspected that HMAC/MD5 is not as strong as, say, HMAC/SHA-256 (even when truncated to 128 bits).

Correspondingly, for key derivation, do not use MD5. It would not be broken right away, but it is still looking for trouble. There is no need to immediately migrate existing systems which use MD5 for password hashing, but for new designs, you should avoid it.

Mandatory reminder: Of course, when hashing passwords, regardless of the intended usage (password verification or key derivation), you shall apply salts and configurable slowness. A password is by itself a weakness because of biological limitations of the human brain which manages it. Salts and configurable slowness are ways to cope with it: salts thwart parallelism (the attacker cannot share attack costs between several password instance -- e.g. through precomputed tables such as rainbow tables) and slowness defeats Moore's law (computers get faster over time, but human brains do not). So you would not want to hash a password with "just MD5", but rather use a dedicated password hashing function such as PBKDF2 or bcrypt. PBKDF2 internally uses a hash function, e.g. MD5. It so happens that MD5 (like SHA-256) maps very well on the computing abilities of GPU, which makes it a questionable choice for this job, and bcrypt is arguably better (see this answer for details).

score 6 · Answer 2 · answered Oct 25 '12 at 08:38

Collision resistance is irrelevant for password hashing. The important property is first pre-image resistance. The only practical attack against MD5 is finding collisions.

Thus there is no practical attack against MD5 based password hashing that doesn't apply equally to other fast hash functions, such as SHA-2. No matter if you use MD5 or SHA-2, the strongest attack is guessing the password.

Henning Klevjer · Answer 3 · 2012-10-25T07:20:18.573

MD5 is broken because it's possible to get it to collide pretty easily. However, this transfers to applications such as x509 certificates in that it can be lead to collide, i.e. changing the public key in the certificate in such a way that the entire certificate's MD5 collides with the original of that certificate.

The only benefit one gets from MD5's broken collision resistance with regard to password cracking is finding two "passwords" that will behave the same way with MD5. If you restrict one of these "passwords" to a particular value, you're back to preimage resistance again.

As Wikipedia says:

On 1 March 2005, Arjen Lenstra, Xiaoyun Wang, and Benne de Weger demonstrated[12] construction of two X.509 certificates with different public keys and the same MD5 hash, a demonstrably practical collision. The construction included private keys for both public keys. A few days later, Vlastimil Klima described[13] an improved algorithm, able to construct MD5 collisions in a few hours on a single notebook computer. On 18 March 2006, Klima published an algorithm[14] that can find a collision within one minute on a single notebook computer, using a method he calls tunneling.

The third paragraph is interesting, where you argue that not all preimages may fit in the password field. You are probably correct here too; the entering of such a password may need to happen at a lower level, as close to the MD5 calculation as possible.

You might say that a hash function is truly broken only when its preimage resistance is broken, but for many applications, especially signing of documents, the broken collision resistance of MD5 is important. Say you hash some critical document with an MD5 message authentication code. An adversary may be able to reproduce written documents that look clever in form, tells an entirely different story and still produces the very same MD5. You can call this a lead collision.

When that is said, try to find this password: 2c89571cdfd318509c05d1f19fe26336

I feel like this answer skirted around the most important points of my question. I already know the implications of the current MD5 attacks on what you've listed (eg: certificates). I specifically asked what effect they have, if any, on hashed passwords. Your very last sentence highlights the heart of my question: if I give you just an MD5 hash, will you be able to recover my password if it has enough bits of entropy? That's the question I'm asking. — Null, Oct 25 '12 at 15:36
In theory, I guess it is possible at least to get a preimage (i.e. not necessarily the actual password, but one that behaves the same with MD5) if you try hard enough. So maybe, for a good enough password, lead collision is the way to go. Very good question when I finally really understood it ;) — Henning Klevjer, Oct 25 '12 at 16:36

MD5 collision attacks: are they relevant in password hashing?

3 Answers3

Linked