14

I've often read that MD5 (among other hashing algorithms) is vulnerable to collisions attacks. I understand the collision part: there exist two (or more) inputs such that MD5 will generate the same output from these distinct and different inputs. There are 20 examples of such inputs given here.

Fair enough, but how does this specifically lead to a vulnerabilities? What is the vector of attack on a system with this kind of a flaw?

For example, say I have a system that simply hashes users' passwords using MD5 for "encryption". Now I want to look at 2 scenarios. (For simplicity, also assume no new users/passwords are added to the system).

Scenario 1: None of the passwords in the system are the kind of inputs that could generate collissions. In other words, all passwords processed by MD5 generate unique hashes. Is there still a collision vulnerability there?

Scenario 2: One of the passwords in the system is this beauty (Collision 14, Message 1 from the list):

Ó)\dvµ‡‚G ÌÕÀ˘∏8;å∑œPäÌ/+6éE≠3}˝óÙ|o‡¬{Ç9¸–Û≤‚Mgª%DWÌé$∞\˛UÊiJ;fÉ;™÷ÎQëµCI<˝y≠˜LùΩ]Owå˜1Ú'¬vo`âÃ(©tú√\îùf#4ú¿�(

So there exists another unsightly string that generates an MD5 collision with this password. However, unless the attacker already know this information, how is the vulnerability exposed?

Mister
  • 3
  • 2
Arman H
  • 257
  • 1
  • 2
  • 6
  • the vector is really apparent when the attacker can control both versions (the one the victim wants that has been vetted and signed by a third party and the one the attacker wants the victim to have) – ratchet freak Jul 11 '13 at 23:09
  • Is that the only vector for such an attack? In reality, how likely is the attacker to have/gain control of both versions? –  Jul 11 '13 at 23:12
  • Would you like this migrated to IT Security? You might get better answers there than here. –  Jul 11 '13 at 23:32
  • 1
    In electronic payment, for example, the attacker may get someone to sign a cheque for 10$, then replace that cheque with another one that hashes to the same value so that the signature is still valid. (Cryptographic signatures typically sign only the hash of the document, not the document itself, because signing an arbitrarily large document instead of a very small fixed-size hash would be much too expensive.) – Jörg W Mittag Jul 11 '13 at 23:33
  • @WorldEngineer, let's migrate if it belongs there more. –  Jul 11 '13 at 23:38
  • @JörgWMittag, wouldn't the attacker have to find a suitable string that generates the same MD5 hash as the orignal signature? Given that there are only a few dozen such collision-generating inputs, aren't the *realistic* chances of attacker's success nearly 0? – Arman H Jul 11 '13 at 23:48
  • I find it hard to believe that only a few dozen inputs can cause collision. I feel like the number is much much greater than that. – David Houde Jul 12 '13 at 01:27
  • @DavidHoude, even if there were an infinite number of inputs causing collisions, there is only a finite number of them that are known at any given time. Even assuming there are thousands and millions of colliding messages, how does that affect scenario #1, where none of the hashed strings are colliding inputs? – Arman H Jul 12 '13 at 01:57
  • See the related question "[MD5 collision attacks: are they relevant in password hashing?](http://security.stackexchange.com/q/23116/6939)". In short, collisions are not a problem when using MD5 for password hashing (though you shouldn't do it [for other reasons](http://security.stackexchange.com/q/211/6939)), but it *might* be if you use it to derive keys from the passwords to perform encryption. – mgibsonbr Jul 12 '13 at 02:06
  • @ArmanH: those few dozen examples are _not_ the "finite number that are known at any given time". There are algorithms that, given two different inputs, calculate a simple modification to one or both of them so both have the same MD5. That allows _anybody_ to create two different documents that are "certified" by a single signature. – Javier Jul 12 '13 at 15:32
  • Just consider the fact that the MD5 hash of `A` will always be the same. Which means you can calculate N+1 characters combinations that will generate the same hash easily enough. To clarify you have to determine the maxium amount of bits MD5 can calculate in order to do this which isn't infinite and extremely small space. – Ramhound Jul 16 '13 at 18:56

3 Answers3

7

One common example I've seen involves digitally signing documents using MD5.

Say you have two documents. One of the documents is very innocent, and the other is something malicious. Since you can create both of these documents, you can make it so that they have the same MD5 signatures (using postscript to make the PDFs, for example).

Now you you get somebody to sign the innocent document (which is typically done by signing a hash of the document), but because the hash of both documents is the same, the person just accidentally also signed the malicious document (which might say something like "I give permission for this person to take a million dollars out of my bank account). Now you can present the malicious document to someone else and they will believe the signature on it because of the collision in the document.

Oleksi
  • 4,809
  • 2
  • 19
  • 26
  • I'm looking at the kinds of messages that generate collisions, and they all look like string garbage. How does one go from something like `Ó)\dvµ‡‚G ÌÕÀ˘∏8;å∑œPäÌ/+6éE≠3}˝óÙ|o‡¬{Ç9¸–Û≤‚Mgª%DWÌé$∞...` to generating a pair of collision-vulnerable documents? – Arman H Jul 12 '13 at 00:38
  • When you create a postscript document to make a PDF, not all characters are displayed on the screen. There are often many mechanism for including non-printable characters (like comments) – Oleksi Jul 12 '13 at 00:40
  • Won't the actual visible/printable characters, PDF document's formatting, etc, all affect the final hash output? Regardless of how it is achieved, the core of my question is: Does abusing hashing collisions ultimately come down to tricking a user, or can it be used to directly interfere with a system (and if yes, how)? – Arman H Jul 12 '13 at 00:42
  • 1
    Yes, but you can "fix" part of the document to be meaningful text, and then build a hash conflict by messing with characters which don't get displayed without affecting the characters which do get displayed. This is generally up to the method of generating collisions, but the methods for generating MD5 collisions are flexible enough to let you do this. – Oleksi Jul 12 '13 at 00:45
  • 1
    see http://th.informatik.uni-mannheim.de/people/lucks/HashCollisions/letter_of_rec.ps and http://th.informatik.uni-mannheim.de/people/lucks/HashCollisions/order.ps : both have the same md5, both are very different... – woliveirajr Jul 12 '13 at 01:54
  • In this example, they thought they were only issuing an SSL cert. In reality, they were signing a cert to create an intermediate CA. http://hackaday.com/2008/12/30/25c3-hackers-completely-break-ssl-using-200-ps3s/ – mgjk Jul 12 '13 at 11:18
6

Password hashing works on preimage resistance: the password hashing process should be such that it is not computationally feasible to find an input (a password) matching a given output (the hash), save by trying a lot of potential inputs and "get lucky". The "get lucky" attack is still a concern, because passwords are chosen by human users, who are not as imaginative and random as could be wished for; which is why we need more than a simple hash function (we need salts, we need slowness; see this). However, collisions have no impact on the security of password hashing. The ability to generate collisions at will does not give extra cracking power to the attacker.

As others have pointed out, collisions can be used for attacks in other setups involving signatures. These are scenarios where the attacker can:

  1. generate colliding pairs of messages, with sufficient control on the format of these messages so that one message looks "innocent" and the other suits the attacker's goals;
  2. have the innocent-looking message signed by a third party whose signature function begins by hashing the message "as is".

Under both these conditions, the third party will produce the signature, which will be valid (as far as signature verifiers are concerned) for both messages. This is like having the signer issue a signature on a message without showing it to the signer. A demonstration has been done in 2008 with the "messages" being X.509 certificates, and the signer a Certification Authority. A real-life application of the same concept was done in the Flame malware.

A point to be made is that though raw signatures are vulnerable to collisions, as explained above, they can be protecting by having the signer include some randomness at the beginning of what it signs. In the context of fake X.509 certificates, the attacker must be able to predict, all the bits of the certificates up to the public key; this includes the certificate serial number which is chosen by the CA. If the CA uses big random serial numbers, then the attack does not apply, even if MD5 is used as hash function. On the other hand, CA who use sequential or time-based serial numbers are predictable.


Another worry with collisions is about security proofs. Some cryptographic protocols can be proven secure under some specific assumptions about the cryptographic primitives used in the protocol; for instance, some protocols using hash functions can be proven to be secure as long as the hash function is assumed to be collision-resistant, or some other property. An example is HMAC. HMAC reuses an underlying hash function, and was designed so as to accommodate hash functions of the Merkle–Damgård persuasion, notably MD5 and the whole SHA family. Such a hash function is built around an internal "compression function". HMAC has been proven secure under the assumption that the compression function behaves like a pseudorandom function.

However, if the compression function of MD5 is a PRF, then it is not feasible to compute collisions for MD5 with cost less than 264 on average. We know how to produce MD5 collisions for much less than that. This implies that the MD5 compression function is not a PRF. So this voids the security proof on HMAC/MD5.

This does not mean that we know how to break HMAC/MD5 with collisions ! It is "just" that whatever mathematical guarantee of security that we had has just evaporated like morning dew under the merciless midday Sun. In that respect, MD5 collisions are not a tool to be leveraged by the attacker, but are still worth some attention.

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949
5

Take a look at this link - http://th.informatik.uni-mannheim.de/people/lucks/HashCollisions/ - to see an explanation and the example.

It works exactly as Oleksi said in his answer: you have two documents that will generate the same MD5 hash. The hash is signed, so that both documents can be "valid".

woliveirajr
  • 4,462
  • 2
  • 17
  • 26
  • The example on that site involves doctoring BOTH documents, Prior to having one of them signed. is there a way to execute the attack when the original, authentic document has already been signed - ie by only doctoring ONE file, to match the existing one? – Caesar Jul 12 '13 at 02:51
  • @caesarsgrunt : take a look at https://en.wikipedia.org/wiki/Cryptographic_hash_function#Properties . What you ask is the second pre-image attack. This answer was about the collision resistance... – woliveirajr Jul 12 '13 at 03:08
  • 1
    Ah, right - so in fact poor collision resistance is not a problem in that situation. Thanks for the answer/link. – Caesar Jul 12 '13 at 03:15
  • @caesarsgrunt yep: if you`re sure you`ll provide the file and the hash, it`s less likely that someone will come with a second file that`ll have the same hash. But, given that other hash functions exist, and aren`t broken yet, it`s advised to move to another one, when possible... :) – woliveirajr Jul 12 '13 at 03:21