How can the content of a file refer to its own MD5?

Question

So I just saw this picture on Imgur: http://imgur.com/gxRCrCM

The intriguing thing about it is that the picture refers to an old Daft Punk song named "Face 2 Face". The image's MD5 is 6b0cc07a5c4d3d8fface2face79d8205 which, amazingly enough, contains the phrase face2face in it.

How does one go about generating this type of hash? I always thought that one gets a totally different hash when even one byte of the message is modified.

What kind of computing power is required to perform this trick? Of course, I am assuming this is not a mere coincidence.

Also I'd love to know if there are other examples of such hashes, and what are some tools available for Linux or Windows?

(1) First published source I could find for this image is [@AngeAlbertini on Twitter](https://twitter.com/angealbertini/status/648173774215507968). (2) For those who don't want to hash themselves: Image uploaded to [VirusTotal show MD5Sum in "Additional Information" tab](https://www.virustotal.com/en/file/fc1a243028b6a3a6b94e9104a3d0167269da69ababe47c352ed36eb025baef03/analysis/) — StackzOfZtuff, Sep 27 '15 at 18:00
"I always thought that one gets a totally different hash when even one byte of the message is modified" - basycally, thats the idea: you change one bit of message and get totally new hash until it meets your requirements. — el.pescado - нет войне, Sep 27 '15 at 19:55
Just noticed: VirusTotal also says: "Warning: Corrupted PNG image". So I guess whoever generated that may have taken some short cuts. — StackzOfZtuff, Sep 28 '15 at 05:06
I found an earlier published source on Twitter: [@Hexatomium, 9:25 AM - 27 Sep 2015](https://twitter.com/Hexatomium/status/648171613041655808) — StackzOfZtuff, Sep 28 '15 at 09:30

score 29 · Accepted Answer · edited Mar 17 '17 at 13:14

29

'face2face' is only 9 characters, i.e. 36 bits since we are using hexadecimal encoding. It suffices to generate many pictures with some internal variations (subtle variations that do not impact the graphical output) and hash them all until the target string is obtained. Since we are looking for a 36-bit pattern and accept that pattern wherever it appears in the 32-character output (24 possible positions), then the average number of pictures to produce and hash will be about 2³⁶/24, i.e. about 2.8 billion. Since a basic desktop PC can compute several (many) million MD5 hashes per second, this should be done in less than an hour with some decently optimized code.

This has nothing to do with known weaknesses of MD5 with regards to collisions. The same could be done with SHA-1 or SHA-256.

This has already been discussed in this question.

edited Mar 17 '17 at 13:14

Community

1

answered Sep 27 '15 at 18:46

Thomas Pornin

320,799
57
780
949

3

[gitbrute](https://github.com/bradfitz/gitbrute) is a tool that applies the same principle to git commits - using SHA1. – mgarciaisaia Sep 28 '15 at 02:34
RE: "Since a basic desktop PC can compute several (many) millions of MD5 hashes per second, this should be done in less than an hour with some decently optimized code." Wouldn't that also depend on the size of what is being hashed? My PC certainly can't hash millions of MP3s per second. :) – John Blatz Sep 28 '15 at 08:54
7

@JohnBlatz: Yes, in principle, though this too can be optimised - e.g. if you can arrange that the variations in the file are at the end, then you can simply cache the result of hashing the first (and largest) part of the file, and only have to re-do the very last part for each variation. – psmears Sep 28 '15 at 10:23
1

Your computation is seriously flawed. Imagine we produce hashes of length 3, with three possible states for each position ("tribit") a, b, c (so possibles hashes are aaa, aab, aac...). Imagine you want to produce a hash containing the substring `c`. Your reasoning becomes "the substring `c` is one character long, i.e. one "tribit", so it suffices to generate three hashes; and since there are three possible positions for `c` to be, it actually is sufficient to produce one". But obviously the hash `aaa` doesn't contain the substring `c`... Try more small examples and you'll see your error. – N.I. Sep 28 '15 at 12:27
3

@user10014: you missed the important word, which is "average"; and the other important word, which is "about". It is an approximation which is valid for huge numbers, as is the case here. The probability of a 128-bit random value to contain a specific 36-bit pattern in one of 24 positions is close to 24\*2^(-36), but not _exactly_ that value; it in fact depends on the pattern. If you use a smaller pattern, then the approximation becomes looser. – Thomas Pornin Sep 28 '15 at 13:17

score -2 · Answer 2 · answered Jun 23 '16 at 19:59

-2

Everything is very simple. The name was coined for the song, already after the computed hash. They probably could not think of a name for a song and someone accidentally measured hash and noticed there is an unusual sequence, due to this, the name was invented! =)

answered Jun 23 '16 at 19:59

Shumoff

1

1

Do you have a reference for this, or are you guessing? – schroeder Jun 23 '16 at 20:57
Shumoff - have you looked at the image? – Rory Alsop Jun 23 '16 at 21:28
1

And what I mean by that is - the image has the song name in it. So no, the name was not coined after the hash was created. The image was altered until a hash with the correct content appeared. – Rory Alsop Jun 23 '16 at 22:23

How can the content of a file refer to its own MD5?

2 Answers2