Yes, I know MD5 is weak and should not be used. For a proof of concept I need two strings with the same MD5 value, but all I can find is binary data. Take this nice example: it works fine as binary but fails as a string:

MD5("d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70") => edde4181249fea68547c2fd0edd2e22f

MD5("d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70") => e234dbc6aa0932d9dd5facd53ba0372a

But binary data does not work for my application. Does anyone have two strings that produce a collision?

Alex
PiTheNumber
  • And which encoding do you want? ASCII, ANSI (your example probably is valid ANSI), UTF-8, UCS-2, ...? There is no concept of MD5 of a string. – CodesInChaos Aug 28 '12 at 09:40
  • That's odd, because the md5 functions I usually use expect a [string as parameter](http://www.php.net/manual/en/function.md5.php). If there is no concept of MD5 of a string, I wonder how they do this magic. – PiTheNumber Aug 28 '12 at 11:08
  • They use some kind of encoding to transform the string into bytes before hashing. But there are many different encodings. – CodesInChaos Aug 28 '12 at 11:09
  • Looks like the [MD5 RFC](http://www.faqs.org/rfcs/rfc1321.html) includes an MDString() function, too. – PiTheNumber Aug 28 '12 at 11:13
  • Functions that hash a string implicitly assume some encoding, such as ASCII, but that encoding might not be the same in other implementations, especially if you work with characters that are not part of ASCII. | PHP simply uses the string type for two purposes: ANSI strings and sequences of bytes. MD5 uses it with the latter meaning. – CodesInChaos Aug 28 '12 at 11:18
  • @PiTheNumber - Please read [The absolute minimum every developer needs to know about unicode](http://www.joelonsoftware.com/articles/Unicode.html). Hashes work on binary data. A string like "Hello" can be represented in ASCII as the following 5 bytes, written in hex as `0x48 0x65 0x6c 0x6c 0x6f` or in decimal as `72 101 108 108 111`. A byte is 8 bits, i.e. a number between 0 and 255 (`0xff`); e.g., `0x48` (decimal 72) is `01001000`. Other encodings exist: in UTF-16 most common characters take two bytes, so 'H' is `00 48` (or `48 00`, depending on endianness), which leaves room for fancy symbols like ʖƕڒҖĩ. – dr jimbob Aug 28 '12 at 14:17
  • @drjimbob: Thanks, I do know that. I think UTF8 is default for most applications. But if you have a string that works as ASCII it's fine for me, too ;) – PiTheNumber Aug 28 '12 at 15:18
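
To make the encoding point from the comments concrete, here is a minimal sketch in Python (chosen for illustration only; the thread itself is about PHP). It encodes the same text under a few encodings and hashes each byte sequence. The helper `hex_bytes` and the variable names are hypothetical, not taken from the thread.

```python
import hashlib


def hex_bytes(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


text = "Hello"

# The same characters become different byte sequences under different encodings.
encodings = ["ascii", "utf-8", "utf-16-be", "utf-16-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # MD5 only ever sees the bytes, never the abstract characters.
    digest = hashlib.md5(data).hexdigest()
    print(f"{name:10} {hex_bytes(data):30} md5={digest}")
```

For pure ASCII text, the ASCII and UTF-8 rows share the same bytes and therefore the same digest, while the two UTF-16 rows differ from them and from each other.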

1 Answer

Your problem is that you treated the hex string as a sequence of ANSI (or similar) characters. You need to decode it first, so that each pair of hex characters is treated as one byte.

In PHP this becomes:

echo MD5(hex2bin("d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"));
echo "<br/>\n";
echo MD5(hex2bin("d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"));

Or for older PHP versions (PHP < 5.4.0):

echo md5(pack("H*", "d131d..."));
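
The same check can be sketched outside PHP. The following Python snippet is an illustration under assumptions, not part of the original answer: it takes the two hex dumps from the question, hashes them once as text and once after decoding each pair of hex characters into one byte, and compares the results. The function names `md5_of_text` and `md5_of_decoded` are hypothetical helpers.

```python
import hashlib

# The two hex dumps from the question; they differ in only a few hex digits.
HEX_A = "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
HEX_B = "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"


def md5_of_text(hex_text: str) -> str:
    """Hash the hex characters themselves, as the question did."""
    return hashlib.md5(hex_text.encode("ascii")).hexdigest()


def md5_of_decoded(hex_text: str) -> str:
    """Decode each pair of hex characters into one byte, then hash."""
    return hashlib.md5(bytes.fromhex(hex_text)).hexdigest()


# Hashing the 256-character text: the inputs differ, so the digests differ.
print("as text:   ", md5_of_text(HEX_A) == md5_of_text(HEX_B))

# Hashing the 128 decoded bytes: this is where the collision shows up.
print("as decoded:", md5_of_decoded(HEX_A) == md5_of_decoded(HEX_B))
```

Here `bytes.fromhex` plays the role that `hex2bin` plays in the PHP code: the digests only match once the hex text has been turned back into raw bytes.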
PiTheNumber
CodesInChaos