Why is MD5 considered a vulnerable algorithm?

Question

I know that MD5 is the most vulnerable hashing algorithm, and particularly vulnerable to Collisions. But the collision vulnerability is not very risky and somebody might use that as an advantage, but that's with sheer luck.

OK, let's say I store passwords using MD5. There are some colliding strings here: https://crypto.stackexchange.com/questions/1434/are-there-two-known-strings-which-have-the-same-md5-hash-value. But it's very unlikely that a user would use these kinds of strings as their password. That's why it depends on luck.

And now, let's say a user has used one of these strings. For an attacker to exploit this collision vulnerability, he/she has to know the original password and know that there's another colliding string for it. But if the attacker already knows the original password, why bother with a collision? Is there any way that the attacker can use this as an advantage?

For collisions, it's because of [this](http://www.win.tue.nl/hashclash/rogue-ca/). For password storage, you're not depending on collision resistance, so MD5 is technically OK (if salted and iterated). But people avoid it because, hey, who wants to use an algorithm with a known break when there are decent alternatives? — paj28, Sep 30 '16 at 16:18
@paj28 I don't know much about certificates but how does the attacker know the original MD5 hash of the legitimate website? — mzcoxfde, Sep 30 '16 at 16:43
This is explained in the article. The attacker generates both the legitimate and fraudulent certificates. The legitimate one is signed by a CA, and the fraudulent one has a matching hash. — paj28, Sep 30 '16 at 16:46

700 Software · Accepted Answer · 2016-09-30T21:33:06.073

> I know that MD5 is the most vulnerable hashing algorithm

Well, technically (we are technical around here), there are worse algorithms than MD5, such as its predecessor MD4.

> and particularly vulnerable to Collisions

Yes, folks can deliberately construct two different plaintexts that produce the same MD5 hash. This is not likely to happen randomly, but it can be done maliciously.

> But the collision vulnerability is not very risky and somebody might use that as an advantage, but that's with sheer luck.

Not sheer luck. There are techniques to find a plaintext that produces a desired MD5. That's a good subject for a different question.
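To make the difference between crafting a collision and matching an existing hash concrete, here is a toy sketch. It is not the real MD5 attack: it truncates MD5 to a hypothetical 16-bit `toy_hash` so that plain brute force finishes quickly, and every name in it is illustrative. A birthday search, where the attacker chooses both inputs, typically needs a few hundred tries, while hitting one given hash typically needs tens of thousands. For full 128-bit MD5, practical collision attacks exist (they rely on cryptanalysis rather than this generic birthday search), but no practical preimage attack against an existing hash is known.

```python
from __future__ import annotations

import hashlib
from itertools import count

BITS = 16  # toy digest size; real MD5 digests are 128 bits


def toy_hash(data: bytes) -> int:
    """MD5 truncated to its first BITS bits (a deliberately weakened toy)."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - BITS)


def find_collision() -> tuple[bytes, bytes, int]:
    """Collision: the attacker is free to choose BOTH inputs (birthday search)."""
    seen: dict[int, bytes] = {}
    for tries in count(1):
        msg = b"msg-%d" % tries
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg


def find_preimage(target: int, limit: int = 1 << 22) -> tuple[bytes, int] | None:
    """Preimage: the attacker must hit a hash that somebody else already produced."""
    for tries in range(1, limit + 1):
        guess = b"guess-%d" % tries
        if toy_hash(guess) == target:
            return guess, tries
    return None


a, b, collision_tries = find_collision()
stored = toy_hash(b"victim's password")  # hypothetical stored hash
result = find_preimage(stored)
print(collision_tries, result[1] if result else None)
```

The gap grows with the digest size: a generic collision costs roughly the square root of the work of a generic preimage, which is why collision resistance usually falls first.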

> OK, let's say I store passwords using MD5.

Ouch. The main reason you shouldn't use MD5 is that it is a General Purpose (Fast) Hash.
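A minimal sketch of why a fast, unsalted hash is a problem: given a leaked MD5 hash, an attacker simply hashes candidate passwords until one matches. The wordlist, the leaked value and the function name are hypothetical; real attackers use lists of millions of leaked passwords and GPUs that compute billions of MD5 hashes per second.

```python
from __future__ import annotations

import hashlib


def crack_unsalted_md5(leaked_hex: str, wordlist: list[str]) -> str | None:
    """Dictionary attack: hash each candidate and compare with the leaked hash."""
    for candidate in wordlist:
        if hashlib.md5(candidate.encode("utf-8")).hexdigest() == leaked_hex:
            return candidate
    return None


# Hypothetical leaked database row: the unsalted MD5 of a user's password.
leaked = hashlib.md5(b"letmein").hexdigest()
common_passwords = ["123456", "password", "qwerty", "letmein", "dragon"]
print(crack_unsalted_md5(leaked, common_passwords))  # prints: letmein
```

Because there is no salt, identical passwords produce identical hashes, so one pass over the wordlist cracks every user who picked a common password.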

You should be using a (Slow) Password Hash, such as:

- BCrypt, which is commonly recommended. Be sure to run a quick SHA-2 hash on the input data first, so that super-long passwords will not be truncated by BCrypt (it only uses the first 72 bytes of input).
- PBKDF2, but it is less GPU-resistant because it has lower Memory requirements.
- SCrypt, which is better than BCrypt if you have a high enough work factor. Otherwise it is worse against GPUs (again, because of higher or lower Memory requirements).
- Argon2, the winner of the Password Hashing Competition, may be even better than the aforementioned, but it has not yet stood the test of time, so don't use it just yet. It has separate Work Factor settings for CPU time and Memory load. (Nice!)
- Repetitive SHA-2 can be used instead of PBKDF2 (it is still not GPU-resistant), but it is trickier to implement the repetition efficiently (i.e. in a brute-force-resistant way), because SHA-2 is itself a General Purpose (Fast) Hash.
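A minimal sketch of salted slow hashing with Python's standard library, assuming `hashlib.pbkdf2_hmac` and `hashlib.scrypt` are available (the latter requires Python built against OpenSSL 1.1 or later). BCrypt and Argon2 need third-party packages, so they are not shown. The function names and parameter values (600,000 iterations; scrypt n=2**14, r=8, p=1) are illustrative assumptions to be tuned to your hardware.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # illustrative; calibrate on your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key) from PBKDF2-HMAC-SHA256 with a fresh random salt."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS)
    return hmac.compare_digest(key, stored_key)


def hash_password_scrypt(password: str, salt: bytes) -> bytes:
    """scrypt variant: n=2**14, r=8 uses about 16 MiB (128 * r * n bytes) per hash."""
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("Correct horse battery staple", salt, key))  # False
```

Store the salt (and the iteration count) alongside the key. Hashing the same password twice yields different salts and keys, which is a quick way to check the random-Salt point for whatever library you end up using.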

Most of these options generate random Salt by default, but you should verify whether this is the case!

It is best to prepend some Pepper (~72 bits of entropy) to the Password before hashing. The Pepper can be the same for all your users, but it should be stored in a file outside of the database, so that it cannot be obtained via SQL Injection.
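A sketch of the Pepper step as described above: the Pepper is prepended to the password, and the result goes through the salted slow hash. The file path, function names and iteration count are hypothetical; 9 random bytes give the ~72 bits of entropy mentioned.

```python
import hashlib
import secrets
from pathlib import Path


def create_pepper(path: str) -> None:
    """One-time setup: write 9 random bytes (72 bits) to a file outside the database."""
    Path(path).write_bytes(secrets.token_bytes(9))


def load_pepper(path: str) -> bytes:
    return Path(path).read_bytes()


def peppered_hash(password: str, salt: bytes, pepper: bytes,
                  iterations: int = 600_000) -> bytes:
    # Prepend the shared Pepper to the password, then apply the salted slow hash.
    return hashlib.pbkdf2_hmac("sha256", pepper + password.encode("utf-8"), salt, iterations)


# Usage (hypothetical path; make it readable only by the application):
# pepper = load_pepper("/etc/myapp/password_pepper.bin")
# key = peppered_hash(password, salt, pepper)
```

An attacker who dumps only the database via SQL Injection gets salts and keys but not the Pepper, so offline guessing cannot even start.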

Make sure your Work Factor requires about 100 ms (with appropriate DoS protection) on your target hardware, bearing in mind that attackers will use faster hardware for brute force.
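One way to find a Work Factor of roughly 100 ms is to time the hash on the target machine and double the cost until it is slow enough. This sketch assumes PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac`; the function name and starting count are illustrative.

```python
import hashlib
import time


def calibrate_pbkdf2_iterations(target_seconds: float = 0.1, start: int = 10_000) -> int:
    """Double the iteration count until one PBKDF2-HMAC-SHA256 hash takes >= target_seconds."""
    iterations = start
    while True:
        t0 = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark password", b"benchmark-salt-16", iterations)
        if time.perf_counter() - t0 >= target_seconds:
            return iterations
        iterations *= 2


print(calibrate_pbkdf2_iterations())  # result depends on this machine
```

Timings are noisy, so run it a few times on the production hardware and round up rather than down.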

Of course no amount of hashing will protect weak passwords, so include password strength requirements.
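A back-of-the-envelope calculation shows why entropy matters so much. Assume a hypothetical attacker testing $10^{10}$ guesses per second against a fast hash (a plausible order of magnitude for a single modern GPU). Exhausting a 72-bit space is hopeless, while a weak password with, say, 30 bits of entropy falls almost instantly:

$$
\begin{aligned}
\frac{2^{72}}{10^{10}\,\text{s}^{-1}} &\approx \frac{4.7\times10^{21}}{10^{10}}\,\text{s} \approx 4.7\times10^{11}\,\text{s} \approx 1.5\times10^{4}\ \text{years}\\[4pt]
\frac{2^{30}}{10^{10}\,\text{s}^{-1}} &\approx \frac{1.1\times10^{9}}{10^{10}}\,\text{s} \approx 0.1\,\text{s}
\end{aligned}
$$

For reference, 12 characters chosen uniformly at random from the 62 letters and digits give $12\log_2 62 \approx 71.5$ bits. A slow hash multiplies the weak-password figure by its Work Factor, but it cannot make 30 bits look like 72.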

> collision vulnerability ... is there any way that the attacker can use this as an advantage?

In the context of password hash storage this probably will not help the attacker.

So, are you saying that there is always an equivalent string to every string when hashed with MD5? — mzcoxfde, Sep 30 '16 at 16:41
You said "Not sheer luck. There are techniques to find a plaintext that produces a desired MD5. That's a good subject for a different question." So, are those techniques able to get a colliding match for any hashed string? For instance, can they find another string with the same hash value as "password"? — mzcoxfde, Sep 30 '16 at 17:10
In general, yes, one can manufacture a plaintext to achieve a target MD5 result without requiring full brute force. I do not know how computationally expensive this is, or whether certain hash results are immune (subject of a separate question). SHA-2 does not have this vulnerability. — 700 Software, Sep 30 '16 at 18:36
So SHA-2 is a perfectly fine algorithm for both password storage and preventing phishing (but a little more risky than others because of rainbow tables). — mzcoxfde, Sep 30 '16 at 18:54
No. SHA-2 is a General Purpose (Fast) Hash, which is fine if your input data has >72 bits of entropy. You should be using a (Slow) Password Hash for password storage because passwords are low-entropy. **Repetitive SHA-2** can be used, but it is tricky to implement the repetition with strength, and, like PBKDF2, it will not stand up against GPUs nearly as well as BCrypt, Argon2, or a high-work-factor scrypt. Rainbow tables are not relevant if you use Salt or Pepper, which you should do. What does a Hash have to do with preventing phishing? — 700 Software, Sep 30 '16 at 19:39
At least in PHP, if you prehash the input to circumvent bcrypt's 72-byte truncation, you may encounter issues if the resulting prehash contains special bytes such as `\0`. So you have to make sure such bytes can't occur, by using a hexadecimal representation, base64_encode, etc. See these links: https://github.com/laravel/framework/pull/12905 – http://stackoverflow.com/questions/16594613 – http://blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html — Gras Double, Nov 17 '16 at 11:07
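A sketch of the encoding fix from the comment above, written in Python rather than PHP: hash with SHA-256, then Base64-encode the digest, so the bcrypt input never contains NUL bytes and stays within bcrypt's 72-byte limit (it is 44 bytes). The function name is hypothetical; the result would then go to a bcrypt library, for example `bcrypt.hashpw(prehashed, bcrypt.gensalt())` from the third-party `bcrypt` package.

```python
import base64
import hashlib


def bcrypt_prehash(password: str) -> bytes:
    """SHA-256 the password, then Base64-encode the 32-byte digest.

    The raw digest can contain NUL bytes, which some bcrypt implementations
    treat as the end of the input; the Base64 text cannot, and at 44 bytes it
    stays under bcrypt's 72-byte limit.
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)


prehashed = bcrypt_prehash("a very long passphrase " * 10)
print(len(prehashed), b"\x00" in prehashed)  # 44 False
```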

score 1 · Answer 2 · edited Mar 17 '17 at 10:46


The main reason why hash algorithms are attacked nowadays has nothing to do with passwords. I believe that MD5 is still reasonably secure when it comes to password hashing, provided you use salts to defeat rainbow tables and iterate the algorithm many times to slow down brute-force guessing (however, you really should use a standard algorithm instead). The problem lies with digital signatures, such as those on X.509 certificates.

The "chosen-prefix attack" on MD5-based signatures has been known for about 10 years now. It was published in 2007 by a team around Arjen Lenstra and researchers from the Eindhoven University of Technology, and in 2008 a related team used it to create a rogue CA certificate which had the same MD5 hash as a certificate issued by a commercial CA. The attack required significant computing power and an extremely sophisticated setup, but the result was that MD5 was compromised. See here and here.

A variant of the same technique was also employed in the famous Flame malware.

As you correctly noticed, when it comes to password hashes the situation is completely different, because the attacker would have to find an input matching an existing hash without knowing the plaintext (a preimage attack). When it comes to digital signatures, the attacker usually knows, and even chooses, the plaintext, and of course this opens up additional attack vectors, as the work quoted above shows.
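A toy illustration of why collisions matter for signatures: the signer only processes the hash, so a signature on one message is equally valid for any other message with the same hash. Everything here is a simplified stand-in and an assumption: `toy_hash` is MD5 truncated to 20 bits so that a plain birthday search finds a chosen-prefix collision quickly, and `ca_sign` uses HMAC in place of a real RSA/ECDSA certificate signature. The real attack on full MD5 needed dedicated cryptanalysis and far more computing power.

```python
from __future__ import annotations

import hashlib
import hmac
from itertools import count

BITS = 20  # toy digest size; real MD5 digests are 128 bits
CA_KEY = b"hypothetical CA signing key"  # stand-in for the CA's private key


def toy_hash(data: bytes) -> bytes:
    """MD5 truncated to BITS bits: a deliberately weak stand-in for MD5."""
    value = int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - BITS)
    return value.to_bytes((BITS + 7) // 8, "big")


def chosen_prefix_collision(prefix_a: bytes, prefix_b: bytes,
                            table_size: int = 1 << 12) -> tuple[bytes, bytes]:
    """Birthday search for suffixes with toy_hash(a + s_a) == toy_hash(b + s_b)."""
    table = {}
    for i in range(table_size):
        msg = prefix_a + b" #%d" % i
        table[toy_hash(msg)] = msg
    for j in count():
        candidate = prefix_b + b" #%d" % j
        match = table.get(toy_hash(candidate))
        if match is not None:
            return match, candidate


def ca_sign(message: bytes) -> bytes:
    # Like a real hash-then-sign scheme, this depends on the message only via its hash.
    return hmac.new(CA_KEY, toy_hash(message), hashlib.sha256).digest()


def ca_verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(ca_sign(message), signature)


benign, rogue = chosen_prefix_collision(b"CN=attacker.example, CA:FALSE",
                                        b"CN=Rogue CA, CA:TRUE")
signature = ca_sign(benign)           # the CA signs the harmless-looking request
print(ca_verify(rogue, signature))    # True: the signature also fits the rogue one
```

The attacker never needs the CA's key: it suffices that both messages were crafted together so that their hashes match.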

answered Sep 30 '16 at 20:02 by kaidentity

I should mention that other hash algorithms are targets of similar attacks, too. SHA-1 should not be used for certificates anymore either, even though I'm not aware of any attacks on it with results as drastic as what has been shown for MD5. – kaidentity Sep 30 '16 at 20:04

I heartily disagree with your comment that MD5 is reasonably secure when salted. Because MD5 is fast and was designed for older processors, modern GPUs can attack even salted MD5 hashes with great efficiency. Even salted MD5 hashes can be attacked feasibly and quickly, and rainbow tables can be generated fast. – Herringbone Cat Sep 30 '16 at 20:07
*"MD5 is still reasonably secure when it comes to password hashing, provided you use salts ... and iterate the algorithm many times"* I hesitantly agree with you. Unfortunately, this is tricky to implement correctly. The iteration code must be highly optimized. Also, MD5 is generally a much lower-quality hash than SHA-2, so it would be a weaker implementation overall. Certainly it is best to use BCrypt, to outsource the iteration work to folks who have had more time to optimize it and to make it GPU-resistant. – 700 Software Sep 30 '16 at 21:31
