58

I was just reading about SSL/TLS, and according to this site (which is rated A by Qualys SSL Labs), MD5 is totally broken, and SHA-1 has been cryptographically weak since 2005. And yet, I have noticed that a lot of programmers, and even Microsoft, only give us SHA-1/MD5 to check the integrity of files...

As far as I know, if I change one bit of a file, its MD5/SHA-1 hash will change, so why/how are they broken? In which situations can I still trust checksums made with SHA-1/MD5? What about SSL certificates that still use SHA-1, like google.com's?
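For example, here is a minimal sketch (using Python's standard hashlib, with made-up data standing in for a file's bytes) of the behaviour I mean, where flipping a single bit produces completely different MD5/SHA-1 digests:

```python
import hashlib

original = b"example file contents"                      # made-up stand-in for a file's bytes
tampered = bytes([original[0] ^ 0x01]) + original[1:]    # flip one bit in the first byte

for name in ("md5", "sha1"):
    print(name, hashlib.new(name, original).hexdigest())
    print(name, hashlib.new(name, tampered).hexdigest())
    # The two digests share no obvious relation, even though only one bit changed.
```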

I am interested in applications of MD5 and SHA-1 for checksums and for certificate validation. I am not asking about password hashing, which has been treated in this question.

Freedo
  • 2,253
  • 5
  • 18
  • 28
  • 4
    possible duplicate of [Why do people still use/recommend MD5 if it is cracked since 1996?](http://security.stackexchange.com/questions/15790/why-do-people-still-use-recommend-md5-if-it-is-cracked-since-1996) – thexacre May 03 '15 at 03:42
  • 4
    I don't think it's a duplicate, because I read it and it only explains why MD5 should not be used to store passwords, but my main concern here is about checksums and certificates (even reputable companies are still using SHA-1 on their certificates). I have only included MD5 in my question because it is still the most used hash for checking the integrity of files – Freedo May 03 '15 at 03:55
  • 1
    Most of the time, the choice of hash algorithm is arbitrary... there's not much reason *to* use MD5 or SHA-1 for new systems, but there's not much reason to justify spending effort to migrate existing things away from them either (unless they rely on collision resistance). – user253751 May 03 '15 at 08:22
  • And there's no reason for any programmer or company to give users MD5/SHA-1 for checksums, even if it is "secure enough", when it's so easy and fast to generate a SHA-512 of a file... most end users do not think like developers about storing passwords and the like; they just use a hash to check their files, and telling them that MD5/SHA-1 is insecure while at the same time providing it for them surely can confuse people – Freedo May 03 '15 at 08:46
  • @Freedom I am rescoping the question to certificates only (or certificates and checksums). This'll help avoid duplicates and make search engines point to the most relevant question between this one and the one pointed out above in the future. Please edit back if you think I misinterpreted your original question. – Steve Dodier-Lazaro May 03 '15 at 12:31
  • @SteveDL no, it's good. In fact, you make me feel like I still have a long way to go before I write like a native speaker of English, lol – Freedo May 04 '15 at 05:26

4 Answers

38

SHA-1 and MD5 are broken in the sense that they are vulnerable to collision attacks. That is, it has become (or, for SHA-1, will soon become) realistic to find two strings that have the same hash.

As explained here, collision attacks do not directly affect passwords or file integrity because those fall under the preimage and second preimage case, respectively.

However, MD5 and SHA-1 are still cheap to compute, so passwords hashed with these algorithms are easier to brute-force than passwords hashed with the stronger, deliberately slow algorithms that currently exist. Although they are not specifically broken for this use, stronger algorithms are advisable.
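As a rough illustration (a sketch, not a benchmark; the password, salt, and iteration count below are arbitrary examples), compare one MD5 hash against one deliberately slow PBKDF2 derivation in Python:

```python
import hashlib
import time

password = b"correct horse battery staple"   # arbitrary example
salt = b"example-salt"                        # arbitrary example

t0 = time.perf_counter()
hashlib.md5(salt + password).digest()         # one fast MD5 guess
md5_cost = time.perf_counter() - t0

t0 = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)  # 100k iterations, arbitrary choice
pbkdf2_cost = time.perf_counter() - t0

# An attacker can test guesses roughly (pbkdf2_cost / md5_cost) times faster
# against a plain MD5 hash than against the PBKDF2 hash.
print(f"md5: {md5_cost:.6f}s  pbkdf2: {pbkdf2_cost:.6f}s")
```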

In the case of certificates, the signature vouches that a hash of a particular certificate is valid for a particular website. But if you can craft a second certificate with that same hash, you can reuse the signature and impersonate other websites. In the case of MD5, this has already happened, and browsers will be phasing out SHA-1 soon as a preventative measure (source).
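If you want to see which hash a given site's certificate is signed with, here is a sketch (it assumes a reasonably recent version of the third-party `cryptography` package; google.com is just the host from the question):

```python
import ssl
from cryptography import x509  # third-party package: pip install cryptography

# Fetch the server's leaf certificate in PEM form (no error handling in this sketch).
pem = ssl.get_server_certificate(("google.com", 443))
cert = x509.load_pem_x509_certificate(pem.encode())

# Prints e.g. "sha256" for a SHA-2 signature, or "sha1" for the legacy case discussed above.
print(cert.signature_hash_algorithm.name)
```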

File integrity checking is often intended to ensure that a file was downloaded correctly. But, if it is being used to verify that the file was not maliciously tampered with, you should consider an algorithm that is more resilient to collisions (see also: chosen-prefix attacks).
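A sketch of that advice, verifying a download against a published SHA-256 value with Python's hashlib (the file name and the expected digest below are placeholders you would take from the project's download page):

```python
import hashlib

# Placeholder: the value published by the vendor (e.g. in a SHA256SUMS file).
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

print("ok" if sha256_of("download.iso") == EXPECTED_SHA256 else "MISMATCH")  # hypothetical file name
```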

Austin Hartzheim
  • 1,581
  • 11
  • 15
  • 1
    Note that the attack on certificates is collision-based: the attacker creates a pair of certificates for different websites, gets one signed, then transfers the signature to the other certificate. – Mark May 03 '15 at 05:01
  • @Mark Thanks. I have updated the answer to mention certificates rather than keys. – Austin Hartzheim May 03 '15 at 06:26
  • 3
    I would count file integrity as "second preimage" when an attacker is unrelated to the original author, and "collision" when the original author is collaborating. – Paŭlo Ebermann May 03 '15 at 06:56
  • @Austin File integrity checking only needs *weak* collision resistance (i.e. second-preimage resistance, the terms are synonyms) unless the author is collaborating or the attacker has influence over the legitimate file. MD5 has this property; chosen-prefix collisions don't break second-preimage resistance. It's not a bad idea to use SHA-2, but MD5 is perfectly sufficient. – cpast May 04 '15 at 05:42
  • @Freedom ...which is why you should use signatures to verify file integrity (while that still requires a trusted public key, it's much easier to trust something that stays the same for years across all files produced by the developer). – cpast May 04 '15 at 05:45
  • @Freedom, it'd be *extremely* difficult to create any kind of malicious change to a file that has the same checksum. With collisions such as those that MD5 has, it's possible to make some kind of change, but a malicious change would be far, far harder. It's not impossible, but I've never heard of it being done and it seems like more trouble than it's worth. – Kat May 07 '15 at 17:53
  • For checking for simple file corruption, a good CRC is often even better (not to mention faster) because it can guarantee detection of a certain number of erroneous bits. – forest Apr 20 '18 at 02:33
15

For MD5, no one who is both reputable and competent is using it in a context where collision-resistance is important. For SHA-1, the break was not practical when it was published, and only now is it becoming important to think about phasing it out where collision-resistance is needed. And it is in fact being phased out; for instance, long-term TLS certificates with SHA-1 no longer work in Chrome, to prod people into changing to SHA-2. However, it's not practically broken yet, so it's acceptable for now.

The reason why it wasn't dropped for everything immediately is because security involves tradeoffs. You don't drop a major standard and make everything incompatible with a giant install base on the grounds of something that might lead to practical attacks in a decade's time. Compatibility matters.

Also, for many uses, MD5 and SHA-1 aren't cracked at all. They both have weaknesses against collision-resistance, meaning an attacker can create two messages that hash to the same thing. Neither is broken against preimage resistance (given a hash, find something that makes that hash), or against second-preimage resistance (given a message, find a different message with the same hash), or (in their compression functions) as pseudo-random functions. That means that constructions like HMAC-MD5 can still be secure, because HMAC doesn't rely on the property of MD5 that's broken. Less than ideal, sure, but see "compatibility matters if it's still secure" above.
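For instance, a minimal HMAC-MD5 sketch with Python's standard hmac module (the key and message are made-up placeholders):

```python
import hashlib
import hmac

key = b"shared secret key"            # placeholder
message = b"message to authenticate"  # placeholder

tag = hmac.new(key, message, hashlib.md5).hexdigest()

# Verification should use a constant-time comparison to avoid timing side channels.
received_tag = tag  # stand-in for the tag that arrived with the message
print(hmac.compare_digest(tag, received_tag))
```

The security argument here rests on the HMAC construction, not on MD5's collision resistance, which is exactly the point made above.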

File integrity checking via hashes is almost always pointless anyway; unless the hashes are sent over a more secure channel than the file, you can tamper with the hashes as easily as with the file. However, if the hashes are sent more securely than the file, MD5 and SHA-1 are still capable of protecting file integrity. Because the attacker doesn't have any influence over the legitimate files (and there needs to be zero influence to really be safe), creating a new file with the same hash requires breaking second-preimage resistance, which no one has done for MD5 or SHA-1.

Note the difference between integrity checking and certificates. Certificates are issued by a CA from a user-created CSR; the attacker can have huge influence over the actual certificate contents, so a collision attack allows an attacker to create a legit and a fake certificate that collide, get the legit one issued, and use the signature on the fake one. In contrast, in file integrity the attacker normally has zero control over the legitimate file, so needs to get a collision with a given file, which is much harder (and which as far as we know can't be done with MD5).

cpast
  • 7,223
  • 1
  • 29
  • 35
  • 1
    *Because the attacker doesn't have any influence over the legitimate files* Sometimes an attacker does, though. The file could be an automatically-generated aggregate of blog postings relating to a given topic, and the attacker could publish a relevant-looking blog article as a vehicle for an attack. But otherwise your points are dead-on. – Atsby May 03 '15 at 08:30
  • Yes, it's nonsensical to distribute files and checksums over non-secure connections. The only time that would work is if everyone were able to corroborate each other's checksums. – munchkin May 03 '15 at 08:49
  • 4
    @munchkin I think you're being too harsh. I often used to use MD5 checksums to check for accidental corruption, and in a very few cases, I did find accidental corruption. I also use them for talking about the identity/version of a file when someone has released multiple files that don't match but have the same version number. (Yes, it's a bad idea, but sometimes people do it) – Patrick M May 03 '15 at 15:07
  • I never used an MD5/SHA-1 checksum to check for accidental corruption, and it really concerns me that developers and companies still give me these hashes. Now I will demand stronger hashes: if even I can generate a SHA-512 of a file quickly and easily, why can't they? SHA-1 is twenty years old now, and if spy agencies focused on producing Windows & Linux ISO images with a backdoor and the same hash and flooding the internet with them, you can bet it would be a high-priority and feasible task in 2015. I don't think the internet community can afford to trade security for usability after the NSA scandal – Freedo May 04 '15 at 05:06
  • @Freedom MD5 is **not vulnerable** for most file-integrity needs; unless an attacker has some control over the legitimate file, no attacker is going to be able to make a different file that collides with the legitimate one. On the other hand, unless you get the hash from a different source than you get the file, any hash at all is utterly worthless; a hash transmitted over an insecure channel **cannot** provide any integrity check. – cpast May 04 '15 at 05:34
  • @cpast you are forgetting that so many sites serve downloads via HTTP; even reputable companies still don't give their users downloads via HTTPS (even bitdefender.com doesn't). So yes, if a powerful corporation wants to, it can intercept a lot of downloads over HTTP and nobody will know, even if people checked using an MD5 received over a more secure channel... I can even imagine malware that intercepts download attempts of known AV files and renders them useless so they aren't detected, and much more... If I can download the file, so can the NSA, and the NSA can MitM a lot of people – Freedo May 04 '15 at 05:44
  • 1
    @Freedom No. The only time collision attacks matter is if the attacker has some control over the **legitimate** file (i.e. they can change it). That's the one that the developer intended people to download; a MitM doesn't give them this. If the hash is over HTTPS and the file over HTTP, that is one of the times a hash *can* help; it extends the integrity provided by HTTPS to the file and does so even if the hash is MD5. An attacker *cannot*, given a fixed file, create another file with the same MD5 hash or same SHA-1 hash, because MD5 and SHA-1 are not broken against those attacks. – cpast May 04 '15 at 05:49
  • 1
    @cpast but what is preventing the NSA or another MitM from intercepting your legitimate download, then changing a few bytes and still maintaining the same hash, like they did in chosen-prefix attacks (see Austin's answer)? They state: "Note that it is not necessary for an attacker to build both executables from source code. It is perfectly well possible to take as the first file any executable from any source, and as the second file produce a second executable as malware...that the resulting files have the same MD5 hash value". Could you clarify why this is possible and yet MD5 is not broken? I don't get it at all. – Freedo May 04 '15 at 06:24
  • 1
    @Freedom They cannot do this unless they control *both* files. The reason chosen-prefix is a problem for executables is that given two executables, you can produce two *different* executables with the same functionality by adding stuff to the end (tacking stuff on to an executable doesn't change functionality), and the *new* executables have a collision. But the hash on the HTTPS page wasn't computed from the modified executable; it was computed from the real one. An attacker cannot get a different executable to match *that* hash. – cpast May 04 '15 at 06:38
  • 1
    The chosen-prefix collision is a problem if they get their modified-but-benign executable signed by a trusted source with MD5 as the digest method; the signature is valid for their modified-but-malicious executable as well, so MD5 code signing is bad. That's not the situation with a hash used as checksum; the real hash was taken over the actual real file, not their modified version, and they can't attack *that* hash without a (nonexistent) second-preimage attack. (the reason hashes are bad for verification is that they might be able to tamper with it, but that applies to all hash functions). – cpast May 04 '15 at 06:39
9

MD5 and SHA-1 are fast and may be supported in hardware, in contrast to newer, more secure hashes (though Bitcoin probably changed this with its use of SHA-2 by giving rise to mining chips that compute partial SHA-2 collisions).

MD5 collisions are feasible, and advances have been made on preimage attacks; for SHA-1 there is a publicly known collision against the official full-round algorithm, following earlier attacks that had significantly reduced its effective complexity. These attacks may not yet be practical for the casual attacker, but they are well within the realm of possibility, which is why SHA-1 can be called broken.

Nonetheless, "weak" or broken hashes can still be good for uses that do not need a cryptographically secure algorithm; just keep in mind that purposes which were not originally considered critical can later turn out to expose an attack surface.

Good examples would be finding duplicate files or use in version control systems like git: in most cases, you want good performance with high reliability, but do not need tight security. Giving someone write access to an official git repository already requires you to trust other people not to mess around, and duplication checks should additionally compare the contents after finding that two files have the same size and hash.
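A sketch of that last point (hypothetical helper functions; the hash is only a pre-filter, and the final verdict comes from an actual content comparison):

```python
import filecmp
import hashlib
import os

def md5_of(path):
    """Cheap pre-filter digest, streamed so large files need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

def is_duplicate(path_a, path_b):
    # Same size and same MD5 only mean "probably identical"; a byte-by-byte
    # comparison (filecmp with shallow=False) makes it "definitely identical".
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    if md5_of(path_a) != md5_of(path_b):
        return False
    return filecmp.cmp(path_a, path_b, shallow=False)
```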

Not backing an insecure hash with an actual check (e.g. a byte-by-byte comparison) can be a risk: imagine a service like Dropbox deduplicating data with MD5 and no further verification, where an attacker sneaks in data with colliding hashes to cause data loss.

git addresses this issue by "trusting the elder", as Linus Himself said:

> > if you already have a file A in git with hash X is there any condition where a remote file with hash X (but different contents) would overwrite the local version?
>
> Nope. If it has the same SHA1, it means that when we receive the object from the other end, we will not overwrite the object we already have.
>
> So what happens is that if we ever see a collision, the "earlier" object in any particular repository will always end up overriding. But note that "earlier" is obviously per-repository, in the sense that the git object network generates a DAG that is not fully ordered, so while different repositories will agree about what is "earlier" in the case of direct ancestry, if the object came through separate and not directly related branches, two different repos may obviously have gotten the two objects in different order.
>
> However, the "earlier will override" is very much what you want from a security standpoint: remember that the git model is that you should primarily trust only your own repository. So if you do a "git pull", the new incoming objects are by definition less trustworthy than the objects you already have, and as such it would be wrong to allow a new object to replace an old one.

[Original source: https://marc.info/?l=git&m=115678778717621&w=2]

And as they say, a disk failure is waaaaayy more likely than encountering an accidental hash collision (several orders of magnitude: SHA-1 collision < 10^-40; disk non-recoverable bit error ~ 10^-15).

Arc
  • 652
  • 5
  • 11
  • +1 for useful logic for developers to implement. I don't think many developers will ever think about doing something like that in their software, even if they do upload things... most of them are focused just on not using weaker hashes to store passwords – Freedo May 04 '15 at 05:37
0

While collisions may exist, checking the integrity of a file when both an MD5 and a SHA-1 hash are published is unlikely to let a collision slip through: crafting a single file that collides under both algorithms at once is far harder than attacking either one alone. So if both of these simple checks validate the same file, it is good enough. I hardly see people verify even one of them in most cases anyway.
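A minimal sketch of that double check in Python (the expected digests and the file name are placeholders you would copy from the download page):

```python
import hashlib

EXPECTED_MD5 = "placeholder-md5-hex-digest"    # copy from the download page
EXPECTED_SHA1 = "placeholder-sha1-hex-digest"  # copy from the download page

md5, sha1 = hashlib.md5(), hashlib.sha1()
with open("downloaded-file.iso", "rb") as f:   # hypothetical file name
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash both digests in one pass
        md5.update(chunk)
        sha1.update(chunk)

ok = (md5.hexdigest() == EXPECTED_MD5) and (sha1.hexdigest() == EXPECTED_SHA1)
print("both checks pass" if ok else "MISMATCH")
```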

Hugo
  • 1