
> **General**
>
> Always check the MD5 hashes of the .NET Framework assemblies to prevent the possibility of rootkits in the framework. Altered assemblies are possible and simple to produce. Checking the MD5 hashes will prevent using altered assemblies on a server or client machine.

Source: https://www.owasp.org/index.php/.NET_Security_Cheat_Sheet

Isn't MD5 completely broken for this purpose?

H M

4 Answers


The MD5 hash algorithm has been demonstrated to be vulnerable to collision attacks. This means that an attacker can generate two files which produce the same hash value. That has no bearing on file integrity checks.

To create a file that matches a previously known hash, the algorithm has to be weak against second preimage attacks. While MD5 has some theoretical weaknesses in this aspect, the current attacks are still not computationally feasible.

Of course, new attacks might surface in the future. MD5 has been demonstrated to have several glaring weaknesses, so prefer a hash function like SHA-256 instead.
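For the file-hashing use case the answer describes, Python's standard `hashlib` handles both algorithms; a minimal sketch (the function name and chunk size are my own choices, not from the thread):

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=8192):
    """Hash a file in fixed-size chunks so large files never need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Switching from MD5 to SHA-256 is then a one-word change (`file_digest(path, "md5")` vs. `file_digest(path, "sha256")`), which is part of why the migration is cheap.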

  • If we're looking for a way to prove integrity in a secure way, a better way is a PGP signature. – Polynomial Apr 17 '13 at 14:33
  • Technically a _second preimage_ attack, since the attacker also has the genuine file to start with. This is not exactly the same thing as a _preimage attack_ (but close enough). – Tom Leek Apr 17 '13 at 14:39
  • @TomLeek Roger, made the edits. –  Apr 17 '13 at 14:42
  • @Polynomial That proves integrity and authenticity; it's preferable if both are a goal. – Nick P Apr 18 '13 at 00:19
  • "If we're looking for a way to prove integrity in secure way, a better way is a PGP signature" -- how is that better? You've just shifted the problem to verifying the integrity of the public key! – TheGreatContini Jun 30 '15 at 00:46
  • @TheGreatContini Nothing wrong with "shifting" the problem to verifying the key. That's a much simpler problem to solve than verifying the data without some kind of cryptography. Pre-shared PGP keys, trusted third-party key servers, keys signed by trusted third-parties, WOT (web of trust), etc, etc. Besides, what's the alternative? – Isaac Freeman Aug 07 '18 at 15:05

To complete @Terry's answer: MD5 is thoroughly broken for collisions, but only very slightly weakened for preimages and second preimages. The best known attack has cost 2^123.4 (see the article), which is stupendously infeasible with existing technology, but, from an academic point of view, somewhat better than the expected 2^127 resistance that a perfect hash function with a 128-bit output should offer.

SHA-256 is the current "default hash function" which you should use for anything which requires a hash function, unless some specific context characteristics warrant another function. However, replacing MD5 for integrity checks is not a critical emergency; no need to get all worried about it.

While MD5 is still fine for the purpose of integrity checks, you must realize that this only shifts the problem: you still have to make sure that you are using the correct hash value. For instance, make sure that you get the hash value from an HTTPS Web site (from a reputable server). Hash values are small enough to allow for some extra mechanisms: you can write them down on paper or dictate them by phone, for instance, which you could not do with a 3 GB archive.
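Checking a file against a hash obtained out of band can be sketched like this (the function name is mine; the normalization handles the upper-case or padded digests that download pages often publish):

```python
import hashlib

def matches_published_hash(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest against a hash value obtained out of band
    (e.g. from an HTTPS page, a phone call, or a piece of paper)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    # Normalize case and whitespace before comparing hex strings.
    return h.hexdigest() == expected_hex.strip().lower()
```

The crucial part is not the code but where `expected_hex` came from; the function is only as trustworthy as that channel.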

Tom Leek

The other two answers are right about MD5 being safe for file integrity. The point I diverge on is that you shouldn't necessarily use SHA-256 by default. Crypto choice is about tradeoffs. After integrity, performance is my biggest concern with hash functions for checking files. I've seen MD5 hash four times faster than SHA-256. A list of resulting hashes also takes up half the space with MD5, which might help in memory-limited systems.

So, MD5 is secure for this area of application and is anywhere from a little to several times faster. So, I'd use it.
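The speed claim is easy to check on your own hardware; a rough, unscientific micro-benchmark (the 4 MiB buffer and iteration count are arbitrary choices of mine):

```python
import hashlib
import os
import timeit

data = os.urandom(4 * 1024 * 1024)  # 4 MiB of random input

for name in ("md5", "sha256"):
    seconds = timeit.timeit(lambda: hashlib.new(name, data).digest(), number=20)
    print(f"{name}: {hashlib.new(name).digest_size * 8}-bit digest, "
          f"{seconds:.3f}s for 20 hashes of 4 MiB")
```

The actual ratio varies a lot with the CPU (modern chips with SHA extensions narrow or reverse the gap), which is exactly the "profile on your platform" point made below.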

Note: I have substituted HAVAL for MD5 in the past because it's fast, too. The SHA-3 competition is also done, so we have more functions to profile for performance and maybe a replacement for MD5 in the near future for high-performance hashing. Also, the VIA PadLock engine accelerates SHA-256, so I use it on such a platform. Lots of things to consider, but I always say focus on endpoint, network, and app security, because crypto is usually the strongest link.

Nick P
  • Can't we truncate SHA-256's output to 128 bits if needed? – H M Apr 19 '13 at 05:43
  • We can. However, this could affect the security properties of the resultant hash. Uniqueness would be my main concern. Cryptographers in the past have warned me against customizing crypto algorithms and being clever with them. So, the non-clever solution was using a compact, superfast, well-understood algorithm that will survive *this* use case, if not others. – Nick P Apr 19 '13 at 07:17
  • Don't you think truncating SHA-256 output is nevertheless more secure than using an older and broken algorithm? – H M Apr 24 '13 at 20:03
  • Short answer: no. Long answer: we can't believe any construction is secure unless it's been analysed and proven over time. A specific, truncated SHA-256 output? Hopefully secure, but no proof. MD5 for second-preimage attacks? Decent in theory and proven in 20+ years of field use. MD5's speed and digest size trump SHA-256's. I mean, an opponent good enough to second-preimage MD5 in the coming years could penetrate your system in dozens of easier ways. That's my whole point. Certain crypto issues get more focus than they deserve. – Nick P Apr 28 '13 at 05:19
  • Maybe MD5(SHA256(data)) is a good compromise between security and storage space, but of course it has no performance benefit (somewhat slower than SHA-256 alone). – H M Apr 28 '13 at 05:35
  • The biggest bottleneck to running checksums on large files is going to be filesystem IO. The hashing algorithm is going to be blocking on IO the vast majority of the time, so the speed of the check shouldn't make a significant difference. I just ran md5sum on a large ISO image that I know hasn't been read from disk since I last rebooted. The first time it took 4.4 seconds and the second time it took 2.3 seconds, thanks to filesystem caching, meaning md5sum spent approximately 50% of the time waiting on disk IO. – Isaac Freeman Aug 07 '18 at 15:34
  • There isn't any significant issue with [truncating SHA-256](https://security.stackexchange.com/a/34797) (besides the shorter size of course). – AndrolGenhald Mar 13 '19 at 13:11

If you want to verify the integrity of the file (contents not tampered with), and are storing the hash somewhere, then just generate the hashes in pairs, such as an MD5 and a SHA-256, and compare both when verifying. Finding a collision for both hashes simultaneously for the same file is, for all practical purposes, impossible for ...

    your application's lifetime on earth
 or any application's lifetime on earth
 or any lifetime on earth
 or the lifetime of earth

...you get the drift.

We are talking about a file here and someone trying to make a tampered file have the same hash as the original one. With any current or future tech, they might be able to manufacture a file that has the same hash (even SHA-384) as the original one, but having that same file also match the original's MD5 is a long shot. So multiple checks > single check.
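Computing both digests only needs one pass over the file, feeding each hash object the same chunks; a sketch of the idea this answer proposes (function name and chunk size are mine):

```python
import hashlib

def dual_digest(path, chunk_size=8192):
    """Read the file once, updating an MD5 and a SHA-256 hash in lockstep."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Note that, as the comments below argue, it is not established that the pair is meaningfully stronger against collisions than SHA-256 alone.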

schroeder
  • Or you could just use sha256 by itself. While it would seem to make sense, it's [not actually clear](https://crypto.stackexchange.com/a/1172) that [multiple hashes](https://crypto.stackexchange.com/a/272) would help much against a collision attack anyway. – AndrolGenhald Mar 13 '19 at 13:15
  • Please do not edit your answer to address specific people in the comments. Use the comment feature. – schroeder Mar 13 '19 at 15:02
  • @MohdAbdulMujib You seem to be under the impression that if hash A requires `2^a` time to create a collision, and hash B requires `2^b` time to create a collision, then concatenating A and B would result in a hash requiring `2^(a+b)` time to create a collision. The links I gave provide evidence that this is not the case, and it's actually not a whole lot better than `max(2^a, 2^b)`. – AndrolGenhald Mar 13 '19 at 15:21
  • Of course, I'm only talking about collisions. You mention collisions in your answer, but what you're actually talking about is a preimage attack, which is entirely separate. – AndrolGenhald Mar 13 '19 at 15:22
  • The problem with this answer is that it does not answer the question. You assume that MD5 is broken, and propose a mitigation. You do not actually answer the question. The original comment, then, makes sense in the light of the question. If MD5 is broken, as you appear to concede, then why not use a non-broken hash instead? – schroeder Mar 13 '19 at 15:45
  • The thing is, neither is "broken" for this specific purpose the OP asked about. But if someone is hell-bent on creating a file with an identical hash, it is pretty much possible, as the file length or content is no bar; hence my answer. On a side note, here's an answer with a similar thought process: https://stackoverflow.com/a/622949/807104 – Mohd Abdul Mujib Mar 13 '19 at 16:00
  • I think the most important thing here (as schroeder said) is that you aren't answering the question that the OP asked. Instead you're offering a not-particularly-great solution to a problem that doesn't exist. The OP was asking for validation that MD5 was unsuitable for file integrity checking, not advice on alternative hashing methods that could be used. This validation has already been provided by a number of answers. As such I don't think this answer brings any new useful information in the context of the original question. – Polynomial Mar 14 '19 at 00:46