What are the implications of a SHA-1 collision being found?

Question

Google have announced the discovery of a SHA-1 collision between two PDF files with distinct content.

While SHA-1 hashes are no longer permitted for SSL/TLS certificate fingerprints, and other measures would prevent certificate fingerprints from being manipulated in this way, what other uses of SHA-1 would be affected?

The attack authors mention Git hashes as one possibility, but are there other common uses which have no mitigation other than upgrading to another hash family or a later SHA variant?

For answers relating to how this collision was found, see this other question on Crypto.se

Researchers used an MD5 collision to generate a [fake CA](https://www.win.tue.nl/hashclash/rogue-ca/). — paj28, Feb 23 '17 at 13:48
@paj28 I think that was the trigger for the "random data in serial number field" mitigation, which should help in the same way with making SHA-1 collisions of certificates harder. — Matthew, Feb 23 '17 at 14:12
Indeed it was. Do we know whether all CAs actually randomize the serial number? MD5 was long deprecated by the time of the attack I linked, so there's clear precedent of CAs not following best practice. — paj28, Feb 23 '17 at 15:31
@paj28 The CA/B forum requires it, but that doesn't mean that it always happens correctly, hence the desire to deprecate SHA-1. If we *were* confident that it was always implemented as required, there would be no need. — Xander, Feb 23 '17 at 15:33
For git, see also: http://security.stackexchange.com/q/67920/29865 — Ajedi32, Feb 23 '17 at 20:41
@Xander - If you have time, perhaps you could expand your comment into an answer to [this question](http://security.stackexchange.com/questions/152145/does-randomness-prevent-collision-attacks) — paj28, Feb 23 '17 at 23:24
@paj28 Done. I'm sure you already know most of it, but let me know if you have any feedback. — Xander, Feb 24 '17 at 14:48

score 47 · Accepted Answer · answered Feb 23 '17 at 17:37

Given the specific collision method used, the impact is currently quite limited. The attack produces two new files which share a SHA-1 hash (a collision); it does not allow an attacker to produce a file matching the SHA-1 hash of an existing file which they did not create (a second preimage). It wouldn't be possible, for example, to use this method to generate a malicious executable file which matched the SHA-1 hash published on the legitimate distribution website.

It would be possible, in theory, for an attacker to generate two executable files which have the same SHA-1 hash, but do different things when run. Similarly, it would be possible to generate multiple ISO images which have the same SHA-1 hash. However, in each case, the files' other hash values would not match, and it's common for download sites to provide multiple types of hash (for example, Ubuntu provide MD5, SHA-1 and SHA-256 hashes for all downloads). This can be seen with the shattered-1.pdf and shattered-2.pdf files, which share a SHA-1 hash but have different MD5 hashes:

    # sha1sum shattered-1.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf

    # md5sum shattered-1.pdf
    ee4aa52b139d925f8d8884402b0a750c  shattered-1.pdf

    # sha1sum shattered-2.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

    # md5sum shattered-2.pdf
    5bd9d8cabc46041579a311230539b8d1  shattered-2.pdf
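
The same comparison can be scripted. The following is a minimal sketch in Python using only the standard `hashlib` module; the function names (`digest_all`, `matching_algorithms`, `is_same_file`) and the choice of algorithms are illustrative assumptions, not something taken from the Google announcement.

```python
import hashlib

# Algorithms to compare under; this particular selection is an assumption.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def digest_all(data: bytes) -> dict[str, str]:
    """Return the hex digest of `data` under each algorithm in ALGORITHMS."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


def matching_algorithms(a: bytes, b: bytes) -> list[str]:
    """List the algorithms under which `a` and `b` produce the same digest."""
    digests_a, digests_b = digest_all(a), digest_all(b)
    return [name for name in ALGORITHMS if digests_a[name] == digests_b[name]]


def is_same_file(a: bytes, b: bytes) -> bool:
    """Accept two inputs as identical only if every digest agrees."""
    return matching_algorithms(a, b) == list(ALGORITHMS)
```

Called on the contents of shattered-1.pdf and shattered-2.pdf (read with, say, `pathlib.Path(...).read_bytes()`), `matching_algorithms` would be expected to report only `sha1`, in line with the output above, so `is_same_file` would reject the pair.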

It may be possible to create a pair of polyglot files which collide under both SHA-1 and MD5 at once, but this has not been demonstrated, and such a pair would still be told apart by, for example, a SHA-512 hash.

Similarly, for any system where a SHA-1 hash is used as a file identifier, it may be possible to get one half of a colliding pair of files into the system, then to swap it out for the other. An example of this would be a backup system which used SHA-1 on a file level for determining whether files had been copied correctly. However, it would be difficult to make a hash of the entire backup contents remain the same in this case: the collision only holds when the colliding data follows the exact prefix it was computed for, and the malicious file is unlikely to sit at the very start of the backup file, which is more likely to begin with something identifying the whole file as a backup.
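
To illustrate why a hash used as a file identifier is sensitive to collisions, here is a rough sketch of a deduplicating store. Everything in it is hypothetical: `DedupStore`, `toy_id` and `find_toy_collision` are made-up names, and because no real SHA-1 collision is generated here, the demonstration substitutes a deliberately weak identifier (the first byte of the SHA-1 digest) so that a collision can be found in a few hundred tries.

```python
import hashlib
from typing import Callable, Dict, Tuple


def sha1_id(data: bytes) -> str:
    """Identifier a real system might use: the full SHA-1 digest."""
    return hashlib.sha1(data).hexdigest()


def toy_id(data: bytes) -> str:
    """Deliberately weak stand-in (8 bits of SHA-1), for demonstration only."""
    return hashlib.sha1(data).hexdigest()[:2]


class DedupStore:
    """Hypothetical store that trusts the identifier to mean 'same content'."""

    def __init__(self, make_id: Callable[[bytes], str] = sha1_id) -> None:
        self._make_id = make_id
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self._make_id(data)
        # If the identifier is already known, the new data is assumed to be
        # a duplicate and is not stored again.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def find_toy_collision() -> Tuple[bytes, bytes]:
    """Brute-force two different inputs with the same toy identifier."""
    seen: Dict[str, bytes] = {}
    counter = 0
    while True:
        data = f"file-{counter}".encode()
        key = toy_id(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
        counter += 1
```

With `store = DedupStore(toy_id)` and `first, second = find_toy_collision()`, storing `first` and then `second` returns the same key both times, and `store.get(key)` yields `first`: the store has no way to notice that `second` was different. A system keyed on full SHA-1 would behave the same way when fed a genuine colliding pair, which is the swap scenario described above.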

Overall, therefore, the Google announcement mostly just confirms what had been suspected for a while: SHA-1 is vulnerable to collisions, just as MD5 was, but finding them requires a lot of effort, and most of the really high-profile targets (such as generating CA certificates) have mitigation in place from the very similar MD5 collisions found previously. Experts have been advising moving away from SHA-1 for a while now, and this advice still stands.

Just as with MD5, however, this doesn't particularly impact the use of HMAC-SHA1, since the specific combination method used in the construction of HMAC values makes this type of collision irrelevant.
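
For reference, the HMAC construction can be sketched as follows. This is a minimal illustration assuming the standard definition (a keyed inner hash wrapped in a keyed outer hash, as in RFC 2104); the function name `hmac_sha1_manual` is made up for this example, and real code should simply call Python's `hmac` module as shown in the second function.

```python
import hashlib
import hmac

SHA1_BLOCK_SIZE = 64  # bytes per SHA-1 input block


def hmac_sha1_manual(key: bytes, message: bytes) -> bytes:
    """Spell out HMAC-SHA1 step by step (illustration only)."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > SHA1_BLOCK_SIZE:
        key = hashlib.sha1(key).digest()
    key = key.ljust(SHA1_BLOCK_SIZE, b"\x00")

    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)

    # The secret key is mixed in before the message, so an attacker who does
    # not know it cannot precompute a message pair that collides here.
    inner = hashlib.sha1(inner_pad + message).digest()
    return hashlib.sha1(outer_pad + inner).digest()


def hmac_sha1(key: bytes, message: bytes) -> bytes:
    """What production code would normally use instead."""
    return hmac.new(key, message, hashlib.sha1).digest()
```

The two functions should agree for any key and message; when checking a received tag, `hmac.compare_digest` is the usual choice for a constant-time comparison.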

Using both hashes is like using the concatenation, see [Is using the concatenation of multiple hash algorithms more secure?](http://security.stackexchange.com/questions/83881/is-using-the-concatenation-of-multiple-hash-algorithms-more-secure) for a result on this. — Paŭlo Ebermann, Feb 23 '17 at 19:40
No, that's about combining hashes. I'm pointing out that other hash methods can still distinguish between colliding SHA-1 files. The answer you've referenced doesn't touch on this at all. — Matthew, Feb 23 '17 at 20:01
If a combined hash made of MD5 and SHA-1 (by concatenation) has a collision on a message, then both SHA-1 and MD5 have a collision for this message – or did I understand this wrong? — Paŭlo Ebermann, Feb 23 '17 at 22:39
@PaŭloEbermann, I interpreted your first comment as being about running one hash algorithm on the output of another (which is a *bad idea*) but looking more closely that's not what you meant, and you're correct. — Wildcard, Feb 24 '17 at 04:50
@PaŭloEbermann Why is it a bad idea? For example, it is common to hash strings with SHA-256 and then with BCrypt (to use BCrypt's strength but mitigate its string size limitation with the help of SHA). — niilzon, Feb 24 '17 at 07:49
The TLS/SSL handshake combines both hashes like this in its PRF. Interesting to see a tried and tested protocol that works without "all its eggs in one basket". — SilverlightFox, Feb 24 '17 at 08:16
@PaŭloEbermann Ah, that makes more sense, although there currently isn't such a collision known. I don't know whether it would be possible to generate such a message - but it would be bad to assume it was impossible! — Matthew, Feb 24 '17 at 09:16
Another thing: apparently the collision breaks Subversion; e.g., http://blogs.collab.net/subversion/subversion-sha1-collision-problem-statement-prevention-remediation-options seems like a good explanation — derobert, Mar 01 '17 at 18:11
HMAC-SHA1 looks interesting. Maybe I'll ask a question about it. — hola, May 28 '21 at 15:50
