Sometimes MD5 is used to verify that a downloaded file is genuine.

So I want to know whether it is possible for an attacker to modify a file, introduce some malicious code, AND make the modified file produce the original MD5.

Example

Original program

MD5:

eac2a0844b652ecea010ec38960d18ba

Malicious code

Original program
Malicious Code

MD5:

5c07d676b765510db628978dc593aa0d

Malicious code + random bits to modify the MD5

Original program
Malicious Code
00000000000000000000000000000000

MD5:

0ade6514efd2d247105ba6249e31ae47

Malicious code + random bits to modify the MD5

Original program
Malicious Code
00000000000000000000000000000001

MD5:

1a499c7ad2755cd66eeea78f5b56f6d0

... several combinations later ...


Malicious code + correct bits to modify the MD5

Original program
Malicious Code
d1bf573000019911b85cbeb24503e745

MD5:

eac2a0844b652ecea010ec38960d18ba //Just an example, real MD5: 882789190dcfee14d563913d345054e0

With enough time, could a malicious user find a string that generates the original MD5?

IAmJulianAcosta
  • This is called a 2nd preimage attack and should be impossible for MD5. – SEJPM Feb 29 '16 at 23:23
  • [An answer](http://crypto.stackexchange.com/questions/3441/is-a-second-preimage-attack-on-md5-feasible) to follow up to what @SEJPM said. – cremefraiche Feb 29 '16 at 23:26
  • @cremefraiche I understand what you're saying now. Even though collisions may theoretically be possible, it's not necessarily possible to force a collision for a specific hash value (i.e. preimage attack). – user1751825 Mar 02 '16 at 00:40
  • @user1751825 Almost got the wording down! Change "for a specific *hash* value" to "for a specific *plaintext* value" and that is exactly a preimage attack. – cremefraiche Mar 02 '16 at 05:08
  • Related: [Is MD5 considered insecure?](https://security.stackexchange.com/q/19906/32746) – WhiteWinterWolf Jul 25 '17 at 23:21

1 Answer

Sort of. But it would take a very long time.

Being able to create a file which has a specific known hash is known as a pre-image attack. An example would be me asking for a file which hashes to beefbeefbeefbeefbeefbeefbeefbeef. The only generic way to do this is to calculate hashes for candidate inputs to MD5, one after another, until you find an input which gives this hash. There is no guarantee that this input exists (although it's likely to) and it's entirely possible that finding it would take universe-scale time, even with the speed of MD5 hashing on modern processors. There is a known attack which weakens the MD5 pre-image resistance from 2^128 to 2^123.4. That's going from roughly 10^22 millennia, with one hash per microsecond, to roughly 10^20 millennia, so still not particularly practical!
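The time estimates above can be checked with a few lines of arithmetic. This is only a rough sketch: the rate of one hash per microsecond comes from the paragraph above, while the 365.25-day year and the function name are my own assumptions.

```python
# Rough arithmetic only. The hash rate is the one quoted above;
# the 365.25-day year is an assumption made for this estimate.
HASHES_PER_SECOND = 1_000_000            # one hash per microsecond
SECONDS_PER_MILLENNIUM = 1000 * 365.25 * 24 * 3600


def millennia_to_try(log2_attempts):
    """Millennia needed to compute 2**log2_attempts hashes at the assumed rate."""
    seconds = 2.0 ** log2_attempts / HASHES_PER_SECOND
    return seconds / SECONDS_PER_MILLENNIUM


brute_force = millennia_to_try(128)      # generic pre-image search
best_attack = millennia_to_try(123.4)    # published theoretical attack

print(f"2^128   attempts: about {brute_force:.1e} millennia")
print(f"2^123.4 attempts: about {best_attack:.1e} millennia")
```

Both figures come out on the order of 10^22 and 10^20 millennia respectively, so the attack only shaves a factor of about 24 off an already hopeless search.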

Creating another file with the same hash as an existing one is known as a second pre-image attack. This is a special case of a hash collision: normally, you just want to find any two strings which have the same output hash. There are a few of these known, and you can even generate your own! However, in your case, you are leaving the first input untouched, and looking for a second input which contains the first input and also includes some other specific content. That massively reduces the potential set of inputs to the hash function, meaning that you're looking for something that might not exist at all within a subset of the inputs. And the only way to find it is to try hashing every single valid value, until you find one.
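To get a feel for the cost, here is a toy version of the search described in the question. It only tries to match the first few hex digits of the original MD5, not all 32, which is the one thing that makes it finish. The function names and the 8-byte counter used as the "random bits" are my own choices for illustration, not part of any real attack tool.

```python
import hashlib
from itertools import count


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def find_suffix(original: bytes, tampered: bytes, hex_digits: int):
    """Brute-force a suffix so that md5(tampered + suffix) agrees with
    md5(original) on the first `hex_digits` hex digits only."""
    target = md5_hex(original)[:hex_digits]
    for attempts in count(1):
        # Hypothetical padding scheme: an 8-byte big-endian counter.
        suffix = attempts.to_bytes(8, "big")
        if md5_hex(tampered + suffix)[:hex_digits] == target:
            return suffix, attempts


original = b"Original program"
tampered = original + b"Malicious Code"

# Matching 4 hex digits (16 bits) takes about 16**4 = 65,536 tries on average.
suffix, attempts = find_suffix(original, tampered, 4)
print(attempts, md5_hex(original), md5_hex(tampered + suffix))
```

Each additional hex digit multiplies the expected work by 16. Matching all 32 digits means about 16^32 = 2^128 attempts, which is the number from the pre-image discussion above.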

In theory, you might be able to find such an input, but it may not exist at all, and finding it might take longer than the universe has existed. It's probably not worth worrying about!

Incidentally, this is why MD5 is fine for use as a checksum for downloads or similar - being able to generate a distinct file with the same checksum as a fixed original would be a computational coup, and be a bit wasted if merely used for tampering with downloads. It's usually a lot easier just to modify the MD5 strings listed for the downloads.
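For completeness, checking a download against a listed MD5 looks roughly like the sketch below. The helper names are hypothetical, and the in-memory stream stands in for a real file opened with `open(path, "rb")`.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large downloads need not fit in memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published(stream, published_md5):
    """Compare the computed digest with the hex string listed next to the download."""
    return md5_of_stream(stream) == published_md5.strip().lower()


# In-memory stand-in for a downloaded file.
demo_ok = matches_published(io.BytesIO(b"hello"),
                            "5d41402abc4b2a76b9719d911017c592")
print(demo_ok)
```

Note that this check is only as trustworthy as the place the published string came from, which is the point made above about modifying the listed MD5 strings.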

Matthew