How safe is md5sum in regards to verifying executable files?

Question

Is it possible to trick someone into running a malicious executable file instead of the real one provided by a website, assuming they will md5sum the file?

Possible? Yes. Reasonable to fabricate an executable that has the same MD5 checksum as a specific, good version? Not really. (I'll leave it to people who want to answer this to go into it in depth.) Reasonable that, if you can replace an executable, you can also replace the posted checksums? Yes. — Ghedipunk, Oct 23 '19 at 20:31
md5sum checks whether the download is corrupted, not whether it is malicious or secure: https://superuser.com/a/849857/97877 — LLub, Oct 23 '19 at 20:34
@Refineo Oh I see, in that case does it confirm the server the file came from or just if the download was corrupted or not? — Tino Uchiha, Oct 23 '19 at 20:39
md5sum only confirms that what you downloaded was not corrupted in transit. It doesn't confirm whether the server is trustworthy or malicious, and it says nothing about the trustworthiness of the downloaded file. The downloaded file can still come from a malicious server (a man-in-the-middle, or MITM, attack). — LLub, Oct 23 '19 at 20:44
Are you implying an attack where the attacker replaces the 'real' executable file with a malicious executable file, where the malicious file has the same MD5 hash as the 'real' file? If so, this is known as a 'preimage' attack. MD5 has been known to be vulnerable to preimage attacks for many years, but it would take some effort on the part of the attacker for the executable to have the desired functionality *and* to have the required MD5 hash. See https://security.stackexchange.com/questions/186657/is-it-secure-to-use-md5-to-verify-the-integrity-of-small-files-less-than-15kb for more info. — mti2935, Oct 23 '19 at 20:58
@mti2935 "MD5 has been known to be vulnerable to preimage attacks for many years" It has? Are you sure you're not thinking of collisions? — Joseph Sible-Reinstate Monica, Oct 24 '19 at 03:11
Joseph Sible, Thanks for the correction. Yes, you're correct, MD5 is vulnerable to collisions not preimage attacks. Preimage attacks are a higher bar for attackers. The answer below by Thomas applies. Notwithstanding, a more modern hashing algorithm such as SHA256 would be preferable. — mti2935, Oct 24 '19 at 14:08
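
As the comments note, a checksum only shows that the file you have matches the file the checksum was computed from. A minimal sketch of that check in Python, using only the standard library. The function names `file_digest` and `matches_published` are made up for this example, and the published value is assumed to come from a channel you trust:

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Return True if the file's digest equals the published checksum."""
    actual = file_digest(path, algorithm)
    # Normalize whitespace and case before comparing the hex strings.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

Passing `algorithm="md5"` yields the same hex digest that md5sum prints, though as mentioned above SHA-256 is the better default. Either way, a match says nothing about who produced the file or the checksum.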

Thomas · Accepted Answer · 2019-10-24T07:43:10.183

The comments cover an important aspect about MD5 collisions, but I'll give a little more food for thought (and the math to back it up) as I think there is a little misunderstanding in this area, especially around pre-image attacks.

Is it possible to trick someone into running a malicious executable file instead of the real one provided by a website, assuming they will md5sum the file?

Yes-ish. Here are two methods of vastly different feasibilities.

1. Feasible: Compromise the website's security. Once that is done, upload the new executable and modify the hash posted on the website. The feasibility depends on the security of the website.
2. Intractable: Send the target a malicious executable that hashes to the same posted hash. Take a malicious executable, pad it with data, and hash the result, repeating until its MD5 hash is the same as that of the legitimate executable. Note that MD5 has been found not to be collision resistant, but it is still fairly 1st- and 2nd-preimage resistant (forest had a great explanation on this here).

Assumptions for 2:

The fastest 2nd-preimage attack on MD5 whittles the number of combinations down to 2^123.4 (according to this paper). This means that if I compute roughly 9.7 × 10^36 hashes, I will have a 50% chance of finding the desired output hash (see math 1).
We are hashing the smallest possible malicious program at 13 bytes (unlikely and very conservative), and I'm assuming my padding will be 16 bytes (this gives me enough room to cover all possible combinations), for a total of 29 bytes hashed per program.
An 8 Nvidia 1080 GPU setup can hash 2 × 10^11 "29 byte files" per second (a little aggressive).

This means that with an 8 Nvidia 1080 GPU setup, it will take about 1.5 × 10^18, or 1,500,000,000,000,000,000, years to have a 50% chance of developing a malicious program that hashes to the desired target hash (under the current assumptions; see math 2).

Hopefully this gives you a sense of the numbers that it would take to "spoof" an MD5 checksum. And yes, the numbers are even more ridiculous with SHA256.
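
The padding search described above can be tried at toy scale. The sketch below is an illustration under stated assumptions, not an attack: it only tries to match the first `prefix_bytes` bytes of an MD5 digest, and the names `find_padding`, `good`, and `evil` are invented for the example. Matching 2 bytes takes about 2^16 attempts on average; each extra byte multiplies the work by 256, and a real match needs all 16 bytes.

```python
import hashlib


def find_padding(payload, target_digest, prefix_bytes=2, max_tries=5_000_000):
    """Append a 16-byte counter to payload until the MD5 prefix matches.

    Returns (padding, tries) on success, or None if max_tries is exhausted.
    """
    target_prefix = target_digest[:prefix_bytes]
    for tries in range(1, max_tries + 1):
        padding = tries.to_bytes(16, "big")
        digest = hashlib.md5(payload + padding).digest()
        if digest[:prefix_bytes] == target_prefix:
            return padding, tries
    return None


good = b"legitimate program bytes"  # stand-in for the real executable
evil = b"malicious program"         # stand-in for the attacker's payload

target = hashlib.md5(good).digest()
result = find_padding(evil, target)
if result is not None:
    padding, tries = result
    print(f"matched 2 of 16 digest bytes after {tries} tries")
```

Raising `prefix_bytes` to 3 or 4 already makes the loop noticeably slower, which gives a practical feel for why matching the full digest by brute force is out of reach.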

Reference:

Math 1 (solve for x in 1 - ((2^123.4 - 1)/2^123.4)^x = 0.5. The x is the number of programs you need to hash to have a 50% chance of hashing to a desired target hash):
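
Written out, with p = 2^-123.4 as the chance that a single hash hits the target. This is a reconstruction from the equation stated above, and it assumes the approximation ln(1 - p) ≈ -p, which holds for very small p:

```latex
\[
1 - (1 - p)^x = 0.5
\;\Longrightarrow\;
x = \frac{\ln 0.5}{\ln(1 - p)}
\approx \frac{\ln 2}{p}
= \ln 2 \cdot 2^{123.4}
\approx 9.7 \times 10^{36}
\]
```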

Math 2 (by units I mean 29 byte files or the number of hashes that must be performed):
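
The same arithmetic as a short Python sketch, plugging in the assumptions stated above (a 2^123.4 search space and 2 × 10^11 hashes per second). The variable names and the 365.25-day year are choices made for this example:

```python
import math

search_space = 2 ** 123.4   # work factor of the cited MD5 preimage attack
hashes_per_second = 2e11    # assumed rate of the 8-GPU setup

# Math 1: solve 1 - (1 - p)^x = 0.5 for x, with p = 1 / search_space.
# log1p keeps precision where 1 - p would round to exactly 1.0.
p = 1 / search_space
hashes_needed = math.log(0.5) / math.log1p(-p)

# Math 2: convert hashes to seconds, then seconds to years.
seconds = hashes_needed / hashes_per_second
seconds_per_year = 365.25 * 24 * 3600
years = seconds / seconds_per_year

print(f"hashes for a 50% chance: {hashes_needed:.2e}")
print(f"years at the assumed rate: {years:.2e}")
```

With these inputs the result is roughly 9.7 × 10^36 hashes and on the order of 10^18 years.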

I would strongly urge you _not_ to tout MD5's (second-)preimage resistance for this application. Whether you are relying on preimage resistance or on collision resistance (or, more to the point, whether you are _vulnerable to_ collision attacks or only to preimage attacks) can be extremely tricky for non-cryptographers to assess. For example, perhaps the executable has an npm package baked into it, one that is under the adversary's control, in which case you _may be_ vulnerable to collisions. See https://crypto.stackexchange.com/a/70049 for some more details. — Squeamish Ossifrage, Nov 10 '19 at 03:40

ig-dev · Answer 2 · 2019-10-24T08:18:36.277

If the hashsum that the user compares against is coming from a trusted, non-malicious file, then no, it is not possible.

People who say that a manufactured hashsum collision is theoretically possible forget that we don't need just "any" hashsum collision (as with passwords): the colliding file first needs to be a viable executable, and second needs to do exactly what we want. At that point we are talking about improbabilities where it is safe to say that no, it is not possible.

Nevertheless, if the attacker is able to compromise the file being downloaded, it is likely that they can also compromise the hash-sum that the user receives. In that case, the hash-sum still fulfills its purpose of confirming that what was downloaded was not corrupted in transit. Only now it is the attacker who is certifying this, not the (presumably) trusted website.

There are also other attacks against MD5, but all of those presume that the attacker has some form of access to the file in the first place, before the first hashsum is generated (e.g. by preparing a benign file for a later attack).

So if the hashsum is coming from an untampered non-malicious file then it's safe, but in all other cases the hashsum merely confirms that you received what either the website, or an attacker, intended you to receive.
