1

I'd like to learn a bit more about MD5 collisions. So let's assume I have a message m:

m = somesecretmessage

And I hash that message:

z = md5(m)

The only know information is z. I do not know m. How would I be able to generate a file/string that would result in the same md5 hash as z? Also would it be easier if the length of m was known?

After some research it seems finding a collision is very feasible (aka in a couple of minutes/seconds). How would I create such a collision - a file/word that gives me the same hash? What tools can I use to achieve that?

Kyu96
  • 165
  • 1
  • 7
  • Please cite where you found the claim that finding a collision is very feasible and takes only minutes – Josef Sep 23 '19 at 11:56

1 Answers1

5

This is not a collision attack, but a preimage attack

With a collision attack, the attacker has control over both inputs to the hash function, say x and y, and they want to find x and y such that x ≠ y but h(x) = h(y).

With a first preimage attack, the attacker knows h(x) but not x, and they want to find y such that h(y) = h(x). Importantly, the attacker cannot change x.

If the message were known, it would be a second preimage attack, where the attacker knows x (and therefore also knows h(x)), and wants to find y where y ≠ x but h(y) = h(x). Again, the attacker cannot change x.

MD5 is still secure against both first and second preimage attacks, so what you want to do is not possible, and knowing the length of the message (or even the entire message) will not help.

AndrolGenhald
  • 15,436
  • 5
  • 45
  • 50
  • I could have sworn I once saw a tool that would build a preimage attack for files in particular formats... – Conor Mancone Sep 23 '19 at 17:48
  • Although at the moment all I can find is this... https://www.mscs.dal.ca/~selinger/md5collision/ which doesn't really seem like it meets the OPs desires... – Conor Mancone Sep 23 '19 at 17:49
  • @ConorMancone Many collision attacks allow partial arbitrary control over the input, the SHAttered attack used [PDFs](https://shattered.io/static/pdf_format.png), but you still have to be able to change both inputs. – AndrolGenhald Sep 23 '19 at 18:00
  • @AndrolGenhald Thanks for the insight. Which hashfunctions are vulnerable against such preimage attack? I could imagine multiplicative hashes such as the Knuth hash? Are there more? – Kyu96 Sep 24 '19 at 09:49
  • @Kyu96 [This](https://crypto.stackexchange.com/q/67059/54167) may be relevant. Even MD2 is only just at the edge of having a feasible preimage attack. You can consider anything with an output of 64 bits or less as having a feasible preimage attack, as you can simply brute force it, though it may still be a bit expensive. – AndrolGenhald Sep 24 '19 at 10:14