What's wrong with SHA-1 having collisions?

Question

Say you go onto a website and are downloading a program. Next to the file there is a SHA-1 checksum of the file. You download the program, verify the checksum and find that it is the same as the one on the website - perfect! However, you soon find that the checksum matched not because the program is the same, but because a man in the middle has delivered you a file that collides with the program, and it's not what you thought it was at all. My question is: what's the problem with that? It would be extremely improbable for an attacker to be able to generate a collision that would be a runnable program and could somehow infect your system. The biggest inconvenience I can see is that you would have to download it again.

So what harm could actually be caused by a collision?
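For reference, the verification step described in the question can be sketched as follows. This is only a minimal illustration using Python's standard `hashlib`; the function names `sha1_hex` and `matches_published` are made up for this example and do not come from any particular download tool.

```python
import hashlib
import hmac


def sha1_hex(path, chunk_size=1 << 20):
    """Return the SHA-1 digest of the file at `path` as lowercase hex."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so a large download need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's digest with the checksum shown next to the download."""
    return hmac.compare_digest(sha1_hex(path), published_hex.strip().lower())
```

A match only tells you that the file hashes to the published value. It says nothing about who produced the file or the checksum, which is what the rest of the thread is about.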

Hashes are used for more than fingerprint verification. Some applications are more fragile than others. — CodesInChaos, May 17 '17 at 07:28
The SHA-1 collision is not a preimage attack. The researchers did *not* demonstrate that it's possible to craft a potentially malicious file for a predefined hash. — Arminius, May 17 '17 at 08:17
@Arminius you are absolutely correct and this is the main point people need to understand. In a nutshell, crafting two documents with the same SHA1 signature requires control over (the ability to change) both documents. — Marko Vodopija, May 17 '17 at 08:58

score 1 · Answer 1 · answered May 17 '17 at 07:35

It has been a while since Google researchers successfully produced a SHA-1 collision. As a proof of concept, the research presents two PDF files [PDF1, PDF2] that have the same SHA-1 hash but display totally different content.

You can learn more in this academic paper.

It is also worth mentioning that this cryptographic hash function is 22 years old, but as far as I'm concerned we are still far from seeing a real-world attack, especially in the scenario you described, unless you are targeted by a government or a wealthy criminal enterprise:

A practical collision attack against SHA-1 would cost $700,000 in 2015 and $143,000 in 2018. He surmised at that cost attacks, especially if they were carried out by a wealthy criminal enterprise or government entity, could be feasible. (Bruce Schneier)

Now, to be more practical, a SHA-1 collision may affect the Microsoft Kernel-Mode Code Signing Policy, for instance, which relies on signature verification to load only signed kernel-mode drivers. You can find more here, but for now I'm not sure you need to worry about a MITM attack delivering an altered file.

Well, after all anyone can now create visually distinct PDF documents with the same SHA-1 hash. So there are practical ways to take advantage of the collision found by the researchers. — Arminius, May 17 '17 at 07:39
Here are some numbers that give a sense of how large scale the computation was in order to produce the two PDFs: nine quintillion (9,223,372,036,854,775,808) SHA1 computations in total; 6,500 years of CPU computation to complete the first phase of the attack; 110 years of GPU computation to complete the second phase. The team leveraged Google’s technical expertise and cloud infrastructure to compute the collision, which is one of the largest computations ever completed. So my answer is: No, not anyone can achieve this. — Soufiane Tahiri, May 17 '17 at 07:41
You're missing the point. You don't need to find your own collision to achieve that. The researchers did that for you. — Arminius, May 17 '17 at 07:44

Sjoerd · Answer 2 · 2017-05-17T09:14:29.043

1

Even though it proved to be possible to create two files with the same hash (collision), it is not feasible to create a file that matches some predetermined hash (preimage attack). Therefore, the scenario you describe is not particularly vulnerable to a hash collision.

However, keep in mind that if the attacker can inject another executable, he may also be able to inject another hash value.

SHA1 collisions are particularly a problem when signing documents, such as certificate requests. With the ability to create a hash collision an attacker can create two certificate requests, one for legit.com and one for evilattacker.com, with the same hash. The certificate authority will then sign legit.com and the attacker can use the signature to create a valid certificate for evilattacker.com, and vice versa.
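To illustrate why a signature over a hash carries over to any colliding document, here is a toy sketch. It makes several loud simplifications: the digest is SHA-1 truncated to 24 bits so that a collision is found in milliseconds, an HMAC with a made-up key stands in for the CA's public-key signature, and the request format (`CN=...|pad=...`) is invented for the example. None of this is how a real CA or a real SHA-1 attack works; it only shows the transfer step.

```python
import hashlib
import hmac

TOY_BYTES = 3  # assumption: keep only 24 bits of SHA-1 so collisions are cheap


def toy_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()[:TOY_BYTES]


def colliding_requests(name_a: bytes, name_b: bytes, table_size: int = 1 << 13):
    """Find two requests, one per name, that share the same toy digest."""
    seen = {}
    for i in range(table_size):
        req = name_a + b"|pad=" + str(i).encode()
        seen[toy_digest(req)] = req
    j = 0
    while True:
        req = name_b + b"|pad=" + str(j).encode()
        match = seen.get(toy_digest(req))
        if match is not None:
            return match, req
        j += 1


CA_KEY = b"hypothetical CA key"  # stand-in: HMAC plays the role of the CA signature


def ca_sign(request: bytes) -> bytes:
    # Like a real signature scheme, this signs the digest, not the request itself.
    return hmac.new(CA_KEY, toy_digest(request), hashlib.sha1).digest()


def ca_verify(request: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(ca_sign(request), signature)


legit, evil = colliding_requests(b"CN=legit.com", b"CN=evilattacker.com")
signature = ca_sign(legit)         # the CA only ever sees the legit.com request
print(ca_verify(evil, signature))  # True: the signature also validates the other request
```

Note that the attacker chose both requests here, which is exactly the "control over both documents" condition mentioned in the comments on the question.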

edited May 17 '17 at 09:14

answered May 17 '17 at 08:30

Sjoerd


Consider adding info about preimage attacks to your answer. People believe, now that a SHA1 collision has been found, that it is possible to produce a document that has the same SHA1 hash as any other arbitrary document. This is simply not true. I believe this generates lots of confusion. – Marko Vodopija May 17 '17 at 09:05
However, the collision problem also means that progress on pre-image analysis becomes more likely – eckes May 17 '17 at 22:41

CBHacking · Answer 3 · 2017-05-17T22:20:54.550

You ask about a scenario where an attacker is able to create a hash collision with an arbitrary file the attacker does not control (in this case, the valid download). This is called a preimage attack (strictly, a second-preimage attack, since the original file is known), and it is generally harder than a simple collision attack. Your scenario nonetheless requires a preimage attack rather than a simple collision. Under that scenario, the following assertion does not hold:

It would be extremely improbable for an attacker to be able to generate a collision that would be a runnable program, and could somehow infect your system.

Assuming you have determined how to produce any file Y that has a hash collision with valid file X, it is likely just as easy to instead produce a working binary Z with the same hash value. Binaries have enormous degrees of freedom: you can modify embedded resources (string tables and icons and so on), metadata (program name, author, compilation date, etc.), and of course just stick the "garbage" used to produce the hash collision at the end of the file, after the real (malicious) program.

When people produce a hash collision, they aren't working from a pristine state in which there is no hash, carefully crafting data that bit by bit produces the desired hash (at least, not if they're using any even vaguely-modern hash algorithm). All possible inputs, including the empty string, have a valid hash digest. The goal of the collision-seeker is to find an input that produces the desired output, but there's no reason the input can't be partially a blob of fixed data (such as a malware program). Yes, that blob will have its own initial hash value (digest)... but so does the empty string!

So yes, if arbitrary blob A by itself produces the desired collision, blob A by itself (as a file) is extremely unlikely to be a meaningful, much less malicious, program. However, the amount of work it takes to find blob A is the same as the amount it takes to find a blob B such that, when B is appended to a fixed malicious program M, the digest of M+B produces the collision.
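The point about M+B can be tried out on a toy scale. The sketch below is only an illustration under stated assumptions: the digest is SHA-1 truncated to 16 bits (a full 160-bit search is far out of reach), the search is plain brute force, and `valid_file` and `malicious` are placeholder byte strings made up for the example.

```python
import hashlib

TOY_BYTES = 2  # assumption: 16-bit toy digest; real SHA-1 digests are 20 bytes


def toy_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()[:TOY_BYTES]


def find_suffix(prefix: bytes, target: bytes):
    """Brute-force a suffix so that toy_digest(prefix + suffix) == target."""
    tries = 0
    while True:
        suffix = tries.to_bytes(8, "big")
        tries += 1
        if toy_digest(prefix + suffix) == target:
            return suffix, tries


valid_file = b"contents of the valid download X"   # hypothetical placeholder
malicious = b"<fixed malicious program M>"          # hypothetical placeholder
target = toy_digest(valid_file)

blob_a, tries_a = find_suffix(b"", target)         # bare blob A
blob_b, tries_b = find_suffix(malicious, target)   # blob B such that M + B matches
# Each search expects about 2**16 tries; the fixed prefix adds no extra work.
print(tries_a, tries_b)
```

The individual try counts vary from run to run of different inputs, but neither search has any structural advantage over the other: the hash of the prefix is just a different starting state.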

Now, with that said, the collisions found thus far on SHA1 are not preimages. That is, the security researchers found two arbitrary blobs A and B that have the same SHA1 digest, but did not demonstrate the ability to, for a specified file X (or specified digest D, where presumably D = SHA1(X) for some X), produce a file Y that has the same SHA1 digest D as X. They produced a collision, but not a preimage attack.

Collision resistance is a characteristic of a secure hash function, so finding any collision has cast doubt on the overall security of SHA1. However, we're some ways yet from being able to produce preimage attacks (against either arbitrary digests or arbitrary files) against SHA1.
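To get a feel for the gap between the two attacks: for an ideal n-bit hash, a generic collision search needs on the order of 2^(n/2) evaluations (the birthday bound), while a generic preimage search needs on the order of 2^n. For SHA-1 that is roughly 2^80 versus 2^160, and the published collision used a shortcut costing about 2^63 computations, the "nine quintillion" figure quoted in the comments above. The sketch below shows the same asymmetry on a toy 24-bit truncation of SHA-1; the truncation and the counter-based messages are assumptions made purely so the demo runs quickly.

```python
import hashlib

TOY_BITS = 24  # assumption: toy digest size, chosen so the search is fast


def toy_digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()[: TOY_BITS // 8]


def birthday_collision():
    """Hash distinct messages until any two of them share a toy digest."""
    seen = {}
    i = 0
    while True:
        msg = i.to_bytes(8, "big")
        d = toy_digest(msg)
        if d in seen:
            return seen[d], msg, i + 1
        seen[d] = msg
        i += 1


a, b, tries = birthday_collision()
# A collision typically turns up after a few thousand tries (about 2**12),
# whereas hitting one *given* digest would take about 2**24 tries on average.
print(tries, 2 ** TOY_BITS)
```

Neither colliding message was fixed in advance here, which is why this says nothing about matching the digest of a specific file.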

I think you are incorrect about the amount of work it takes to create an executable that matches a specific hash. It is possible to create a collision with SHA1, but I think a preimage attack is still infeasible. — Sjoerd, May 17 '17 at 08:33
Just because someone has managed to produce two meaningful files with the same hash, it doesn't mean that it is now trivial to produce a new meaningful file with the same hash as some given file. — Simon B, May 17 '17 at 15:02
@Sjoerd Yes, but the question is specifically about preimage attacks, even though the asker doesn't seem to realize that. The question isn't about a collision between two arbitrary blobs, but about the situation *where an attacker has replaced a valid binary with a malicious one **that has the same digest***. That's a preimage attack, and that's what my answer was about. I'll edit the answer to make this clearer, though. — CBHacking, May 17 '17 at 21:59
@SimonB The question specifically postulates a scenario where the attacker produces a file with the same digest as "some given file", and then for some reason assumes that the attacker-produced file won't be meaningful. This assumption is invalid under the postulated scenario. — CBHacking, May 17 '17 at 22:22
