26

I was reading this article about MD5 hash collisions in which it clearly states that these two strings (differences marked with ^):

d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70

d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
                                      ^                                                                                                                               ^                                                                               ^

have the same MD5 hash. However, when I tested this with this MD5 generator, they did not produce the same hash.

The first string hashes to edde4181249fea68547c2fd0edd2e22f, while the second hashes to e234dbc6aa0932d9dd5facd53ba0372a, which is not the same.

Why is it being said that these two strings produce the same MD5 hash value?

CodesInChaos

3 Answers

107

... in which it clearly states that these two strings ...

No. It clearly states "... two different sequences of 128 bytes ...".

There is a huge difference between these statements. In the first, the strings are taken as they are. In the second, one will hopefully realize that these are 256-character strings of hexadecimal digits, and that they must be converted to binary to get the 128 bytes.

Once this conversion is done and the MD5 is computed from the actual 128 bytes, one will see that both byte sequences result in the same MD5, namely 79054025255fb1a26e4bc422aef54eb4 (matching the article).

This can, for example, be reproduced by using this site and choosing hexadecimal bytes as the input format.
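For anyone who would rather check this locally than trust a website, here is a minimal Python sketch of the difference between hashing the text and hashing the decoded bytes. The names `HEX_A`, `HEX_B`, `md5_of_hex_text`, `md5_of_decoded_bytes` and `differing_offsets` are made up for this illustration; only `hashlib.md5` and `bytes.fromhex` come from the standard library. The hex strings are the two from the question, split into 64-character pieces for readability.

```python
import hashlib

# The two 256-character hex strings from the question (128 bytes each once decoded).
HEX_A = (
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
HEX_B = (
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)


def md5_of_hex_text(hex_text: str) -> str:
    """Hash the 256 ASCII characters as they are (what the question did)."""
    return hashlib.md5(hex_text.encode("ascii")).hexdigest()


def md5_of_decoded_bytes(hex_text: str) -> str:
    """Decode the hex digits to 128 raw bytes first, then hash those bytes."""
    return hashlib.md5(bytes.fromhex(hex_text)).hexdigest()


def differing_offsets(hex_a: str, hex_b: str) -> list[int]:
    """Return the byte offsets at which the two decoded sequences differ."""
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hashing the text: two different inputs, two different digests.
print("as text :", md5_of_hex_text(HEX_A), md5_of_hex_text(HEX_B))

# Hashing the decoded bytes: the digests should match each other.
print("as bytes:", md5_of_decoded_bytes(HEX_A), md5_of_decoded_bytes(HEX_B))

# The inputs really are different byte sequences.
print("differing byte offsets:", differing_offsets(HEX_A, HEX_B))
```

The "as bytes" line should show the same digest twice even though the two inputs differ at several byte offsets, which is exactly what a collision means.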

dave_thompson_085
Steffen Ullrich
39

As pointed out in the other answers, the hexadecimal strings must first be decoded to raw bytes, and those raw bytes are then fed into the MD5 hash function. If you do this, both produce a hash of 79054025255fb1a26e4bc422aef54eb4.

This can be done easily on the command line:

echo -n 'd131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70' | xxd -r -p | md5sum

produces

79054025255fb1a26e4bc422aef54eb4

echo -n 'd131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70' | xxd -r -p | md5sum

produces

79054025255fb1a26e4bc422aef54eb4
mti2935
20

Those are byte values, not strings.

Use https://cryptii.com/pipes/md5-hash and change the input to bytes.

Both byte arrays produce the same hash, 79054025255fb1a26e4bc422aef54eb4.

roscoe
    Given the evidence, it is possible that this was either a duplicate or a novel answer that happened to use the same resources. Since roscoe provided support for why both answerers would have used the same resource, we can assume the latter. – schroeder May 28 '21 at 20:21