Finding out whether a website uses unsalted MD5 for password hashing

Question

I'm working on a project where websites are analyzed and rated according to password security (factors like min/max password length, alphabet size and more are then calculated into a score).

A great factor to know would be whether a site uses unsalted MD5 hashes to store passwords. To verify that, a collision could be provoked by registering with one password and then logging in with the collision. But data sent via HTML forms is usually ASCII encoded, so I cannot use one of the many known binary MD5 collisions.

Are there any known MD5 collisions between two ASCII-encoded printable strings?

or alternatively:

Is there a way besides collisions to find out whether a website stores passwords as unsalted MD5?

@LucasKauffman Of course, but salted hashes cannot be detected via collision, without knowing the salt :) I'm just trying to get all the information I can. — Danilo Bargen, Sep 10 '13 at 11:14
You are still going to have collisions on a salted MD5 hash if the same salt is used on every string. — Ramhound, Sep 10 '13 at 14:27

score 6 · Accepted Answer · edited Mar 17 '17 at 13:14

To my knowledge (but I might have missed some recent results), there is no known ASCII-compatible collision on MD5.

The main conceptual reason for that is that collision-finding works by introducing some small bit differences, and then trying to cancel out these differences with each other, before they spread too much in the algorithm. Namely, for a given input message m, you consider message m' which differs from m by a few bits. By comparing all the internal values in the computation of MD5 on m and m', you can see the differences propagate. Two differences on the same bit may cancel out, sometimes with probability 1 (if you flip the same bit twice, you get back to the same value), sometimes with a lower probability, i.e. it works only for some possible internal values. That's because operations in MD5 are not just XOR; there are also some bitwise non-linear functions, and a lot of additions. Additions imply carry propagation. A lot of MD5 cryptanalysis is about coping with carry propagation.

Thus, an MD5 collision attack is mostly about choosing a differential path, i.e. a difference between m and m', which has a non-negligible probability of being fulfilled throughout the algorithm and cancelling itself out, yielding a collision. Then you try a lot of pairs (m,m') which differ by that source difference, until you get lucky. Message modification techniques are ways to improve your chances by tweaking pairs of messages which do not work for the whole differential, but almost work. Regardless of such techniques, you still have to work with a differential path which has a not-too-low probability of fulfilment.

Since carry propagation is the main problem in designing a good differential path, the useful tool for that is to modify the upper bit of 32-bit words. Indeed, if 32-bit words u and v differ only by their most significant bit, then u+v will necessarily be equal to 0x80000000, for all values of (u,v): for that bit, there is no carry (the carry, if any, "falls off" the word because we are using a 32-bit addition which truncates the result to 32 bits). That's a one-bit input difference which yields a one-bit output difference with probability 1: a very useful tool. It turns out that most (all of ?) known differential paths which have a high enough probability to yield a practical attack rely, at some point, on such one-bit differences on upper bits.

However, playing with the upper bit of an input word means that there is at least one byte in the colliding pair such that one of the messages will include a byte value in the 128-255 range. I.e. a non-ASCII byte.

A lot of what I explained above is based on vague hand-waving, but it should show the gist of the idea: the "fast" known ways to obtain a MD5 collision imply playing with a few bits of a message, and, right now, this includes the high bit of some bytes, making one of the colliding messages non-ASCII.

The slow way of finding collisions is to try random messages until a collision is found, with no constraint -- so these messages could then be full-ASCII. If you generate a lot of 11-character passwords, with characters being uppercase and lowercase letters and digits, then you should reach a collision when you have hashed about 2⁶⁴ of them, i.e. about one third of all possible 11-character passwords. Unfortunately, while a 2⁶⁴ effort is technologically feasible, it is still very expensive, and nobody has run it. Yet. At least publicly.

From the outside, there is very little you can do to determine the way the password is stored. At best, if password verification appears to be very fast (server responds within a few milliseconds), then you can prove that the password hashing is insufficiently iterated (unless it uses a "pepper", in which case it might be fast and secure at the same time -- see this answer for terminology). Otherwise, to prove the use of a specific hash function, you need collisions, because that's all the information you can obtain: whether a given password works for a given user, or not.

If you can look at the database contents (and the whole point of hashing passwords is based on the idea of an attacker who can do exactly that), then you have a lot of extra information:

If two users with the same password obtain the same hash value, then there is no salt (that's bad).
If two users with the same password do not obtain the same hash value, but the same user, changing his password back to a previous password, gets his hash back to the old value as well, then there is no real salt; the user name is used as (part of) the salt (that's bad too, albeit less so).
You can try candidate hash functions directly.

score 1 · Answer 2 · edited May 23 '17 at 12:40

Are there any known MD5 collisions between two ASCII-encoded printable strings?

No.

I doubt any ASCII-compliant MD5 hash collision inputs have been found yet. But small non-ASCII inputs have been found: https://stackoverflow.com/questions/1999824/whats-the-shortest-pair-of-strings-that-causes-an-md5-collision.

Is there a way besides collisions to find out whether a website stores passwords as unsalted MD5?

Easily; if you have access to the backend database. If they are not storing a salt column in the table or any of the password records match a vanilla MD5 rainbow table - then they haven't used unique salts per password. Finding out if they are using a global custom salt or their unique salts have been merged into a single tuple attribute might take more work.

Are you using MD5 as verbal shorthand for any cryptographic hash?

Because most cryptographic hashes have not been breached, and which hash a website is using would not be apparent from the front-end.

If you have access to the database it's obvious of course :) I don't. (I did not specify that in my question though, sorry...) Regarding to MD5 / cryptographic hashes in general, I'm talking about MD5 because it's the most widespread one while at the same time being very weak with many known collisions. So it should be detectable using collisions. I'm aware that SHA1 and other widely used hash functions are also insecure and that they should never be used for password hashing. — Danilo Bargen, Sep 10 '13 at 11:16

Finding out whether a website uses unsalted MD5 for password hashing

2 Answers2