What is the value of MD5 checksums if the MD5 hash itself could potentially also have been manipulated?

Downloads on websites sometimes have an MD5 checksum, allowing people to confirm the integrity of the file. I have heard this is to allow not only corrupted files to be instantly identified before they cause a problem but also for any malicious changes to be easily detected.

I follow the logic as far as file corruption is concerned but if someone deliberately wants to upload a malicious file, then they could generate a corresponding MD5 checksum and post that on the download site along with the altered file. This would deceive anyone downloading the file into thinking it was unaltered.

How can MD5 checksums provide any protection against deliberately altered files if there is no way of knowing if the checksum itself has been compromised?

Austin ''Danger'' Powers

Posted 2014-12-08T07:24:24.453

Reputation: 5 992

Why would a file need to be the same size after it's been altered? I'm saying a file could be changed, a new hash generated for the malicious version... then the hash posted on the website could be replaced with the new one by the malicious entity. – Austin ''Danger'' Powers – 2014-12-08T07:40:12.243

1Most download sites give the file size and often the creation date. I suppose those could also be altered on the web site. However, wouldn't the web site owner detect all of the hacking to the site? – fixer1234 – 2014-12-08T07:44:22.303

3If we are relying on the website host noticing subtle timestamp discrepancies instead of the MD5 hash acting as a seal of authenticity... then the protection provided by the checksum has pretty much evaporated. – Austin ''Danger'' Powers – 2014-12-08T07:45:43.367

1I'm referring to things like logs of site access rather than noticing subtle content change, although the web page could have its own hash known to the site owner. – fixer1234 – 2014-12-08T07:48:34.117

1The point is that people accessing the site have no way of knowing how proactive the website host is in checking those logs. The MD5 checksum is supposed to provide a way for people to check the integrity of their own downloads, without relying on the actions of any other parties. – Austin ''Danger'' Powers – 2014-12-08T07:49:53.363

1MD5 doesn't generate against file contents so there will never be a way of checking file integrity - apart from corruption, but this often results in a different file size so the hash will be different. If a malicious file has a valid hash then there'll be no way of telling at this stage. – Kinnectus – 2014-12-08T07:51:51.850

SHA supposedly replaces MD5 because it is harder to produce the same hash from a modified file. However, it makes no difference for the scenario you raise. – fixer1234 – 2014-12-08T07:57:27.527

4@BigChris I'm not sure what you mean, but it sounds wrong. Cryptographic hash algorithms like MD5 are completely about the message data. Two random messages of the same length will almost certainly have different hashes. – Matt Nordhoff – 2014-12-08T09:58:15.343

3@MattNordhoff exactly. If an MD5 checksum isn't generated based on file data, then what is it based on? – Austin ''Danger'' Powers – 2014-12-08T10:05:01.290

MD5 hash data would be constructed on the data, yes, but it wouldn't take too much effort to create a malicious file with the same hash. As said, there would be no way of checking if the file was malicious or not. Read: http://www.mscs.dal.ca/~selinger/md5collision/

– Kinnectus – 2014-12-08T14:52:25.493

@MattNordhoff Where "almost certainly" = 2^(n/2) where n is the number of bits in the output hash value. Birthday attacks.

– a CVn – 2014-12-09T08:18:29.873

2Sometimes hashes are published on first-party server whereas actual downloads are hosted on third-party mirrors and/or CDNs. – el.pescado – 2014-12-09T10:23:55.247

It's said that encryption is all about leverage -- instead of hiding the entire file, you can just hide a tiny key. Cryptographic hashing is the same way -- instead of verifying the entire file, you can just verify a tiny key. – that other guy – 2014-12-10T03:03:35.280

Answers

I have heard this is to allow [...] for any malicious changes to be detected also.

Well you heard wrong, then. MD5 (or SHA or whatever) checksums are provided (next to downloads links, specifically) only for verifying a correct download. The only thing they aim to guarantee is that you have the same file as the server. Nothing more, nothing less. If the server is compromised, you’re SOL. It’s really as simple as that.
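To make that limit concrete, here is a minimal sketch in Python. It assumes SHA-256 instead of MD5 and uses made-up byte strings in place of real downloads; `published_checksum` and `matches` are hypothetical helper names, not part of any tool mentioned here. The check passes whenever the file and the checksum come from the same party, whether or not that party is honest.

```python
import hashlib
import hmac


def published_checksum(data: bytes) -> str:
    """Hex digest a site would post next to its download link."""
    return hashlib.sha256(data).hexdigest()


def matches(data: bytes, posted_hex: str) -> bool:
    """True if the downloaded bytes hash to the posted checksum."""
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), posted_hex)


original = b"installer v1.0"
posted = published_checksum(original)

# Accidental corruption in transit is caught.
corrupted = b"installer v1.0\x00"
transfer_ok = matches(corrupted, posted)

# An attacker who controls the server replaces the file AND the checksum.
malicious = b"installer v1.0 + backdoor"
forged_posted = published_checksum(malicious)
attacker_passes = matches(malicious, forged_posted)

# The altered file only fails against a checksum the attacker could not touch.
caught_by_untouched_checksum = not matches(malicious, posted)
```

Here `transfer_ok` is false and `attacker_passes` is true: the comparison only tells you that the file and the checksum agree with each other.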

Daniel B

Posted 2014-12-08T07:24:24.453

Reputation: 40 502

31+1. They are primarily used to protect against accidental corruption (network transfer errors, bad sectors on disk, and so on). To protect against malicious corruption the checksum needs to come from a trusted unconnected location. The same with PGP/GPG/similar signed messages: they only completely assure the content if you trust where you obtained the public key from. – David Spillett – 2014-12-08T10:50:19.047

MD5 is in fact blatantly broken against malicious changes assuming the original file was prepared for it. – ratchet freak – 2014-12-08T11:57:50.643

1You might also want to add to your answer that digital signatures address this limitation (assuming that you trust the certificate/certifying authority) – atk – 2014-12-08T14:29:13.720

2It's even worse than this -- if someone can tamper with your traffic to/from the server, then even if the server isn't compromised they can modify both file and checksum that you receive. – cpast – 2014-12-08T18:36:52.910

1@cpast Yeah, so what? Like I said, MD5 (by itself) isn’t about security at all. – Daniel B – 2014-12-08T21:59:35.497

1@DanielB So it doesn't guarantee that you have the same file as the server. – cpast – 2014-12-08T22:22:39.097

3To expand: If it did guarantee that you had the same file as the server had, it would be a legitimate security measure, because it would mean you don't have to trust the network. That's exactly what the MACs in TLS do -- prove that what you got is what the server sent, but TLS can't do anything about a compromised server either. If a good hash is transmitted over a trusted connection, it can provide security (which is derived from the trusted connection); if it's sent over the same connection as the file, then it's useless because it's no more tamper-resistant than the file itself was. – cpast – 2014-12-08T22:45:33.943

2This is wrong. Sometimes the checksums are provided securely, but the download is not. Since MD5 is broken, the security MD5 checksums provide is weaker than that of more secure checksums, but before MD5 was broken, a securely provided MD5 (e.g. one that was signed or sent over HTTPS) that matched the MD5 of the download was strong evidence that the download received was the one the server was making available. I'll add an answer with more detail below now. – Matthew Elvey – 2014-12-09T16:08:25.860

@MatthewElvey MD5 guarantees you get the correct file, it does not guarantee that you get a safe file. If the file itself is malicious from the very beginning, you're screwed, because MD5 does not tell you anything about the file itself – Raestloz – 2014-12-10T03:05:57.383

@Raestloz Your observation goes beyond the actual question. If ever a well known and trusted organization starts distributing malicious software (and we all know it never happened, don't we? :) then it would (should) soon become untrusted! – matpop – 2014-12-10T17:32:54.743

@matpop actually, I was addressing the question. Read carefully: the question posits a "what if" situation where a maliciously modified program is posted along with its (already malicious) MD5. MD5 is a security measure of "this is correct", not "this is safe", therefore in this case MD5 is worthless ("what is the value?" the title says), unless during the transmission somebody altered it to include even more malicious software, but you're already screwed anyway. Thus I responded to Matthew, who says that this answer is wrong (it's not) – Raestloz – 2014-12-11T01:19:56.977

@Raestloz In the first place you wrote: "If the file itself is malicious from the very beginning"... Sorry, but to me those words do not mean the same thing as a deliberately ALTERED file. The OP knows that hashes are not part of any antivirus system and can't be used for malware detection. That said, MatthewElvey is right, this answer is somewhat radical and incomplete, as it misses an important exception: if you can verify a digital signature of the hash sum, then the hash can also be used to demonstrate that the downloaded file is UNALTERED (afaik though, HTTPS itself is insufficient). – matpop – 2014-12-11T09:07:11.807

@matpop sorry, but my words cannot be any more exact. If a file has been deliberately altered (say, an innocent exe that has been altered to execute malicious code) AND uploaded as a new entry, the malicious file is a separate entity from the original, innocent file and is therefore malicious from the very beginning. This is different from an innocent file that gets intercepted and altered in transit between server and downloading client. – Raestloz – 2014-12-11T09:34:21.083

@matpop OP is asking "what good is MD5 if it isn't generated from the original, unaltered file to begin with?", the answer is "worthless". This answer highlights that MD5 can only tell you that you and the server have the exact same file, and nothing more. It doesn't inherently carry any security benefit. The security "benefit" of knowing the file has been tampered with is a byproduct. In the meantime, perhaps we should move this to chat? – Raestloz – 2014-12-11T09:38:34.837

@Raestloz Thanks for your reply :) We clearly disagree on terms to be used but almost think the same. Though the "security benefit" of hashing can be considered a "byproduct", IMO it's important to mention that hashing is actually a fundamental part of the process of authentication (you may want to have a little look at my brief answer to get what I mean). Still our comments add something so let's not move to chat for now. – matpop – 2014-12-11T10:00:08.623

@matpop that is what I said, MD5 can only tell you that the file is correct, it cannot say that the file is safe. I think I used the wrong word in my last comment – Raestloz – 2014-12-11T10:11:55.200

@Raestloz You may admit however that this answer doesn't provide any "explanation and context" and there are better answers here that got far fewer upvotes. – matpop – 2014-12-11T10:26:01.640

@matpop well, our discussion is getting a bit too long :D so I'll end it here. My personal belief is that this answer is the best because instead of trying to improve the checksum's security (like, say, having the checksum delivered securely through other channels, provide mirrors, etc etc) that other answers do, this one highlights only the core issue: that checksum can't help determine a data's safety. To me, trying to secure the checksum is pointless, it provides false sense of security. The only way to know is to check the file itself – Raestloz – 2014-12-12T02:37:40.967

@Raestloz Not to have the last word, seriously. Don't hate me, allow me one last comment, too. 1. It seems you keep putting authentication and "malware detection" on the same level; the first one is really possible with (signed) hash sums! 2. One does not simply (!) say that the only aim of hashes is to guarantee that you have the same file as the server; the fact that hashes are most of the times provided without signature does not make such statement valid in general. It's not "really as simple as that". Cheers. NRN – matpop – 2014-12-12T14:35:11.967

Well, I guess you’ll all be delighted to know my answer was specifically about checksums next to download links (or in a .*sum file), which I believe is what the question is about. It’s certainly not about Authenticode and the like. ;) – Daniel B – 2014-12-12T15:28:36.780

@DanielB, if I know the MD5 checksum of, say, a Firefox download, can I download that file from untrusted sources (instead of the official website) and be assured that, if its checksum matches the checksum of the same file on the official Firefox website, I don't need to worry? I am assuming the Firefox website is not hacked. – Number945 – 2019-09-06T05:44:25.087

@BreakingBenjamin Generally, yes. MD5 and SHA1 however are not suitable for this anymore. – Daniel B – 2019-09-06T07:28:22.390

The solution used by some package management systems such as dpkg is to sign the hash: use the hash as input to one of the public key signing algorithms. See http://www.pgpi.org/doc/pgpintro/#p12

If you have the public key of the signatory, you can verify the signature, which proves the hash is unmodified. This just leaves you with the problem of getting the right public key in advance, although if someone tampers with the key distribution once, they also have to tamper with everything you might verify with that key; otherwise you'll spot that something strange is going on.

pjc50

Posted 2014-12-08T07:24:24.453

Reputation: 5 786

Your assumption is correct. There is an exception, though: when the server providing the file and the page showing the hash are not managed by the same entity. In that case the software developer may want to say "hey people, download this from that place, but only trust it if hash = xxxx". (This might be useful for CDNs, for example.) I guess this was the reason someone did it in the first place. Then others just followed, thinking how cool it would be to show the hash, without considering how little use it is when both the file and the hash are in the same location.

Having said this, it is worth what it is. Don't assume too much about security, as others have already stated. If and only if you can absolutely trust the original hash, then the file is good. Otherwise an attacker with enough motivation and knowledge can tamper with both the file and the hash, even if these are on different servers and managed by different entities.

nsn

Posted 2014-12-08T07:24:24.453

Reputation: 268

Sometimes the checksums are provided securely, but the download is not. Since MD5 is broken, the security MD5 checksums provide is weaker than that of more secure checksums, but before MD5 was broken, a securely provided MD5 (e.g. one that was signed with PGP or GPG or Gatekeeper, or fetched over HTTPS) that matched the MD5 of the download was strong evidence that the download received was the one the server was making available.

I have been writing about the lamentable lack of secure checksums for years, here.

Users shouldn't download untrusted executables over untrusted networks and run them, because of the risk of MITM attacks. See, e.g. "Insecurities within automatic update systems" by P. Ruissen, R. Vloothuis.

2014 Addendum: No, it's NOT wrong "that checksums posted on web pages are used to detect malicious modifications," because this IS a role they can perform. They do help protect against accidental corruption, and if served over HTTPS or with a verified signature (or better yet, both) help protect against malicious corruption! I have obtained checksums over HTTPS and verified that they matched HTTP downloads many times.

Nowadays, binaries are often distributed with signed, automatically verified hashes, yet even this is not perfectly secure.

Excerpt from above link: "The KeRanger application was signed with a valid Mac app development certificate; therefore, it was able to bypass Apple’s Gatekeeper protection." ... "Apple has since revoked the abused certificate and updated XProtect antivirus signature, and Transmission Project has removed the malicious installers from its website. Palo Alto Networks has also updated URL filtering and Threat Prevention to stop KeRanger from impacting systems. Technical Analysis

The two KeRanger infected Transmission installers were signed with a legitimate certificate issued by Apple. The developer listed this certificate is a Turkish company with the ID Z7276PX673, which was different from the developer ID used to sign previous versions of the Transmission installer. In the code signing information, we found that these installers were generated and signed on the morning of March 4."

2016 Addenda:

@Cornstalks: Re. your comment below: Wrong. As currently noted at the collision attack Wikipedia article you link to, "In 2007, a chosen-prefix collision attack was found against MD5" and "the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value." Thus, even if the MD5 is provided securely and an attacker can't modify it, an attacker still CAN use a chosen-prefix collision attack with a chosen-prefix containing malware, which means MD5 is NOT secure for crypto purposes. This is largely why US-CERT said MD5 "should be considered cryptographically broken and unsuitable for further use."

A couple more things: CRC32 is a checksum. MD5, SHA, etc. are more than checksums; they're intended to be secure hashes. That means they're supposed to be very resistant to collision attacks. Unlike a checksum, a securely communicated secure hash protects against a man-in-the-middle (MITM) attack where the MITM is between the server and the user. It doesn't protect against an attack where the server itself is compromised. To protect against that, people typically rely on something like PGP, GPG, Gatekeeper, etc.
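As a rough illustration of the checksum versus secure hash distinction, the Python sketch below relies on a well-known property of CRC32: for equal-length messages, the CRC of their XOR can be predicted from the individual CRCs alone. The messages and the `xor_bytes` helper are made up for the example, and SHA-256 stands in for "a secure hash"; this is not an attack on any real download.

```python
import hashlib
import zlib


def xor_bytes(*blocks: bytes) -> bytes:
    """XOR several equal-length byte strings together."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def sha_int(data: bytes) -> int:
    """SHA-256 digest as an integer, so digests can be XORed."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


a = b"pay Alice  10 USD"
b = b"pay Bob    10 USD"
c = b"pay Carol 999 USD"
combined = xor_bytes(a, b, c)

# CRC32 is affine over GF(2): the CRC of the XOR follows from the three
# individual CRCs without ever hashing the combined message.
predicted_crc = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
crc_is_predictable = zlib.crc32(combined) == predicted_crc

# The same shortcut does not work for a cryptographic hash.
sha_is_predictable = sha_int(combined) == sha_int(a) ^ sha_int(b) ^ sha_int(c)
```

That kind of algebraic structure is what lets someone steer a CRC to a chosen value, and it is exactly what a secure hash is designed not to have.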

Matthew Elvey

Posted 2014-12-08T07:24:24.453

Reputation: 419

I like this answer because it highlights a fundamental part of a checksum - it's simply one metric, of many, to check the validity of a file's contents. If the network itself is untrusted, it's not that unfeasible to imagine one replacing the MD5-hashes and patching binaries on the fly (as we've already seen on some Tor exit nodes)... Of course then, MD5 provides no protection against deliberately modified files because you're already placing your trust in the provider of said files to begin with. – Breakthrough – 2014-12-09T16:36:53.563

MD5 isn't totally broken: the attack on it is a collision attack, not a preimage attack (which would be much, much worse). If the MD5 is provided securely and an attacker can't modify it, then an attacker can't use a collision attack (and must use a preimage attack), which means MD5 is still pretty secure for that purpose. MD5 is worth being phased out because of its collision vulnerability, but it doesn't have a (known) preimage vulnerability so it's not totally broken. Just half broken.

– Cornstalks – 2014-12-09T16:53:39.957

+1! But... Is a signed hash really just as secure (trustable) as an unsigned hash fetched over https (ssl/tls)? I think it's still preferable that the hash itself is signed anyway... – matpop – 2014-12-10T17:46:01.477

This is really a problem. Showing checksums on the same site as the file to download is insecure. A person who can change the file can also change the checksum. The checksum should be shown through a completely separate system, but this is hardly feasible: how would you tell the user, in a safe way, where the checksum can be found?

A possible solution is the use of signed files.

(BTW: MD5 is unsafe anywhere and shouldn't be used anymore.)

marsh-wiggle

Posted 2014-12-08T07:24:24.453

Reputation: 2 357

This is the precise reason posted checksums often carry a disclaimer saying "This cannot protect against malicious modification of the file". So, the short answer is "they can't provide any protection whatsoever against a deliberately altered file" (although, if the page is delivered over HTTPS, HTTPS itself protects against modification; if the file isn't delivered over HTTPS but the checksum is, then that might help some, but isn't a common case). Whoever told you that checksums posted on web pages are used to detect malicious modifications was wrong, because this is not a role they can perform; all they do is help protect against accidental corruption, and lazy malicious corruption (if someone doesn't bother to intercept the page giving you the checksum).

If you want to protect against deliberate modification, you need to either keep people from messing with the checksum, or make it impossible for anyone else to generate a valid checksum. The former can involve giving it out in person or similar (so the checksum itself is trusted); the latter goes to digital signature algorithms (where you need to securely get your public key to the downloader; in TLS, this is done by ultimately trusting certificate authorities directly and having them verify everyone else; it can also be done over a web of trust, but the point is that something has to be securely transferred at some point, and just posting something on your site isn't enough).

cpast

Posted 2014-12-08T07:24:24.453

Reputation: 2 279

2Hashes can protect against malicious alteration if one knows via some independent source what the expected hash of a trustworthy version of a file should be. The value of having the web site list the hash values of its files doesn't lie in letting people who download files from a site check the hash of the downloaded file against the same site, but rather in letting people who know from some other source the hash of the file they want, know whether the file in question will match it before they download it. BTW, one thing I'd like to see... – supercat – 2014-12-08T16:56:33.667

...would be a form of URL/URI that included an expected hash value (probably SHA rather than MD5), and would specify that a browser should only accept a file if the hash matches what's specified. In cases where the same large file will need to be accessed by many people, giving all of those people a URL via https:// but having them download the file from a proxy could be more efficient than having them all use https:// directly from the source. – supercat – 2014-12-08T16:59:52.737

@supercat That's what I meant by "keep people from messing with the checksum" -- something must be securely transferred, and if that's the checksum then the checksum can help protect against maliciously tampering with the file. – cpast – 2014-12-08T18:34:36.157

An MD5 checksum transmitted via some path other than a file itself would provide protection against tampering unless the file was deliberately created to facilitate such tampering. By contrast, something like CRC32 would provide almost no protection against tampering even if the original source of the file was trustworthy and the CRC32 was delivered securely. – supercat – 2014-12-08T18:38:51.983

How can MD5 checksums provide any protection against deliberately altered files if there is no way of knowing if the checksum itself has been compromised?

You are entirely correct. The goal, then, would be to make your "if" wrong — if we know that a secure cryptographic hash of a file isn't compromised, then we know that the file isn't compromised either.

For example, if you post a hash of a file on your website, and then link to a copy of the file on a third-party mirror server — common practice in old-fashioned free software distribution — your users can be protected against some types of attacks. If the mirror server is malicious or compromised, but your website is okay, the mirror won't be able to subvert your file.

If your website uses HTTPS, or you sign the hash with gpg, your file can also be (mostly) protected from network attackers like malicious Wi-Fi hotspots, rogue Tor exit nodes, or NSA.

Matt Nordhoff

Posted 2014-12-08T07:24:24.453

Reputation: 153

1Regarding gpg: remember that this has similar problems if you don't entirely trust the public key hasn't been replaced by a compromised one and the content signed with the private key corresponding to that. – David Spillett – 2014-12-08T10:54:46.750

How can MD5 checksums provide any protection against deliberately altered files if there is no way of knowing if the checksum has not been compromised either?

This is a really good question. In general, your assessment of MD5 manipulation is spot on. But I believe the value of MD5 checksums on downloads is superficial at best. Perhaps after you download a file you can check the MD5 you have against a website, but I tend to see side-by-side MD5 storage as a “receipt” or something that is nice to have but not reliable. As such, I have generally never cared about MD5 checksums from downloaded files, but I can speak from my experience creating ad-hoc server-based MD5 processes.

Basically what I have done when a client wants to sweep a file system for MD5 checksums is to have them be generated into CSV files that maps filename, path, MD5—and other sundry file info—into a structured data format that I then have ingested into a database for comparison and storage.

So using your example, while an MD5 checksum might sit next to a file in its own text file, the authority record MD5 checksum would be stored in a non-connected database system. So if someone somehow hacked into a fileshare to manipulate data, that intruder would not have any access to the MD5 authority records or the connected history.

Recently I discovered a nice piece of library archival software called ACE Audit manager, which is basically a Java application designed to sit and watch a filesystem for changes. It logs changes via MD5 changes. And it operates on a similar philosophy as my ad-hoc process (store the checksums in a database), but it takes it a step further by creating an MD5 checksum of MD5 checksums, which is known as a hash tree or Merkle tree.

So let’s say you have 5 files in a collection. Those 5 files in ACE Audit manager would then get another—let’s call it “parent”—checksum that is a hash generated from the 5 MD5 checksums of each file. So if someone were to tamper with just one file, the hash for the file would change and so would the hash for the whole “parent” collection.
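The "parent" checksum idea can be sketched in a few lines of Python. This is only a one-level illustration of the concept, not ACE Audit manager's actual scheme: it assumes SHA-256 rather than MD5, in-memory byte strings instead of files on disk, and the hypothetical helper names `file_hash` and `parent_hash`.

```python
import hashlib
from typing import Iterable


def file_hash(content: bytes) -> bytes:
    """Hash of a single file's contents."""
    return hashlib.sha256(content).digest()


def parent_hash(child_hashes: Iterable[bytes]) -> str:
    """Hash of the child hashes, fed in a fixed order."""
    h = hashlib.sha256()
    for child in child_hashes:
        h.update(child)
    return h.hexdigest()


# Five files in a collection (contents are placeholders).
files = {f"file{i}.txt": f"contents of file {i}".encode() for i in range(1, 6)}
names = sorted(files)  # fixed order so the parent hash is reproducible

before = parent_hash(file_hash(files[name]) for name in names)

# Tamper with just one file.
files["file3.txt"] = b"tampered contents"
after = parent_hash(file_hash(files[name]) for name in names)

collection_changed = before != after
```

Only the single parent value then has to be kept somewhere the intruder cannot reach; any change to any of the five files shows up in it.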

In general, the way you need to look at MD5 checksums and related integrity hashes is this: unless they are connected to some non-direct storage for the MD5 hashes themselves, they can be corrupted. And their value as a long-term data integrity tool is equivalent to a cheap lock that comes “free” on a new piece of luggage; if you are serious about locking your luggage you will get a lock that cannot be opened in 5 seconds with a paperclip.

JakeGould

Posted 2014-12-08T07:24:24.453

Reputation: 38 217

"MD5 checksum of MD5 checksums" is known as a Merkle Tree. – pjc50 – 2014-12-08T10:19:14.223

@pjc50 Thanks! Edited the answer to reference that. – JakeGould – 2014-12-08T19:05:31.580

You can't modify the MD5 checksum without also modifying the file. If you download the file, then download the hash, and then your computation of the file's hash doesn't match what is given, either the hash or the file is wrong or incomplete.

If you want to "tie" the file to something external, such as author, machine, etc. it needs to be signed, using a PKI-type process with certificates. The file author, etc. can sign the file with his/her private key, and you can verify signatures with the public key, which should be publicly available, itself signed by a CA both you and the author trust, and downloadable, preferably from multiple locations.

Modifying the file would make the signature invalid, so this can be used to verify file integrity too.

LawrenceC

Posted 2014-12-08T07:24:24.453

Reputation: 63 487

Hashes indicate whether your version of the file (the "download") differs from the server's version. They offer no guarantee to the authenticity of the file.

Digital signatures (asymmetric encryption + hash function) can be used to verify that the file has not been modified by anyone who does not have the corresponding private key.

The file's creator hashes the file and encrypts the hash using their (secret) private key. That way, anyone with the corresponding (non-secret) public key can verify that the hash matches the file, but while the file contents can be modified, nobody can replace the corresponding hash with one that matches the file (after hash is decrypted using the public key) - unless they manage to brute-force the private key, or gain access to it somehow.

What's to stop Mr "A.Hacker" from simply modifying the file, then signing it with their own private key?

In order to validate the file, you need to compare its hash to the one that you obtained by decrypting the associated digital signature. If you think the file is from "I.M.Awesome", then you decrypt the hash using his key and the hash does not match the file, since the hash was encrypted using A.Hacker's key.

Digital signatures hence allow one to detect both accidental and malicious changes.
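Here is a toy sketch of the hash-then-sign idea in Python. Everything about the keys is an assumption made for illustration: it uses textbook RSA with no padding, tiny moduli built from known Mersenne primes, and hypothetical helper names. It is nowhere near secure and is not how GnuPG or any real signing tool is implemented; it only shows why A.Hacker's signature fails against I.M.Awesome's public key.

```python
import hashlib

E = 65537  # public exponent shared by both toy keys


def make_toy_key(p: int, q: int):
    """Return (modulus, private_exponent) for textbook RSA with primes p, q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return n, d


def digest_as_int(data: bytes, modulus: int) -> int:
    """SHA-256 digest reduced modulo the key's modulus (toy simplification)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def sign(data: bytes, private_exponent: int, modulus: int) -> int:
    """'Encrypt' the hash with the private key."""
    return pow(digest_as_int(data, modulus), private_exponent, modulus)


def verify(data: bytes, signature: int, modulus: int) -> bool:
    """'Decrypt' the signature with the public key and compare to the hash."""
    return pow(signature, E, modulus) == digest_as_int(data, modulus)


awesome_n, awesome_d = make_toy_key(2**61 - 1, 2**89 - 1)
hacker_n, hacker_d = make_toy_key(2**107 - 1, 2**127 - 1)

original = b"release-1.0 contents"
tampered = b"release-1.0 contents + malware"

good_sig = sign(original, awesome_d, awesome_n)
hacker_sig = sign(tampered, hacker_d, hacker_n)

genuine_ok = verify(original, good_sig, awesome_n)
tampered_with_old_sig = verify(tampered, good_sig, awesome_n)
tampered_with_hacker_sig = verify(tampered, hacker_sig, awesome_n)
fooled_by_wrong_key = verify(tampered, hacker_sig, hacker_n)
```

Only `genuine_ok` and `fooled_by_wrong_key` come out true. The last one is the catch: if you are tricked into using A.Hacker's public key, the tampered file verifies perfectly.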

But how do we get I.M.Awesome's public key in the first place? How can we ensure that when we obtained his/her key, it wasn't actually A.Hacker's key being served by a compromised server or a man-in-the-middle attack? This is where certificate chains and trusted root certificates come in, neither of which are perfectly secure [1] solutions to the problem, and both of which should probably be well explained on Wikipedia and on other questions on SO.

[1] Upon inspection, the root certificates that shipped with the Microsoft OS on my work PC include a certificate from the U.S. Government. Anyone with access to the corresponding private key cough NSA cough can therefore serve content to my web browser that passes the "is there a padlock in the address bar" check. How many people will actually bother to click on the padlock and see whose key-pair is being used to "secure" the connection?

Mark K Cowan

Posted 2014-12-08T07:24:24.453

Reputation: 458

It only takes one person who checks the certificate chain to cause a scandal. If you think the NSA would use a government CA key for MITM rather than using one stolen from a privately held or -- even better -- foreign certificate authority (thus providing plausible deniability), I have a bridge to sell you. – Charles Duffy – 2014-12-08T21:04:38.997

I was suggesting the possibility that it could be used to target a MITM at a particular user. As to whether it's actually likely, that's for the tin-foil people to debate on – Mark K Cowan – 2014-12-08T21:47:17.407

I'm not questioning whether a targeted MITM is likely. I'm questioning whether being careless enough to use a readily traced and attributed CA key to perform it is likely. Particularly for a sufficiently high-value target, outbound 'net traffic is liable to be recorded in enough detail to include metadata up to and including the public part of the SSL handshake, so even if the user doesn't look, their security staff or automated infrastructure might do so in retrospective analysis. – Charles Duffy – 2014-12-08T21:59:32.277

I have heard this [MD5 checksum] is to allow [...] also for any malicious changes to be easily detected.

Well, you haven't heard totally wrong. Digital signature is actually what's used to detect malicious changes. For some important reasons, hashing is a fundamental part of digital signature, in that only the hash is actually signed, not the entire original file.

That being said, if the source doesn't provide the signature of the hash and a trusted way to verify it, then you're right, no protection is given against deliberately altered files, but the hash is still useful as a checksum against accidental corruption.

Here's a real world example that may clarify the whole thing. The following passage is particularly significant on the subject:

For older archived CD releases, only MD5 checksums were generated [...] For newer releases, newer and cryptographically stronger checksum algorithms (SHA1, SHA256 and SHA512) are used

matpop

Posted 2014-12-08T07:24:24.453

Reputation: 101

speed

Often people find it is much faster to (a) download a large file from some untrusted "nearby" content delivery network (CDN), mirror site, torrent peers, etc. and also download the corresponding short checksum file (often SHA256; older software often used MD5) from a few trusted sources. They find it unbearably slow to (b) download the entire large file directly from a trusted source.

validation

Often that person finds that everything validates -- the sources (trusted and untrusted) agree on the same checksum, and running shasum (or md5sum) with any of those short checksum files (it doesn't matter which one, when they are all identical) indicates that the large file has a matching checksum.

modification

You are right that when Mallory maliciously alters a large file sitting on some download site, it would be easy for Mallory to also maliciously alter the checksum for that file on the same download site so that shasum (or md5sum) run on that malicious checksum file would seem to validate the large file. But that checksum file is not the (only) one the downloader should use for validation.

When the downloader compares that malicious checksum file to the checksum files downloaded from trusted sources, if the original checksum slips through even one time, then the downloader will see that everything does not validate, and will know that something has gone wrong.
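A rough Python sketch of that workflow, assuming SHA-256 checksums given as hex strings and a file-like object for the large download. The function names are hypothetical, and fetching the checksums from the trusted sources is left out: they are simply passed in as strings.

```python
import hashlib
import io
from typing import BinaryIO, Iterable


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a large file in chunks so it never has to fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def validate(stream: BinaryIO, checksums: Iterable[str]) -> bool:
    """True only if every source agrees AND the file matches that checksum."""
    expected = {checksum.strip().lower() for checksum in checksums}
    if len(expected) != 1:
        return False  # sources disagree (or none were given)
    return sha256_of_stream(stream) == expected.pop()


download = b"large file from an untrusted mirror"
good = hashlib.sha256(download).hexdigest()

# All sources agree and the file matches.
everything_validates = validate(io.BytesIO(download), [good, good.upper(), good + "\n"])

# Mallory altered the file and the checksum on one site only.
altered = b"large file from an untrusted mirror, altered"
mallory = hashlib.sha256(altered).hexdigest()
sources_disagree = validate(io.BytesIO(altered), [mallory, good])
file_mismatch = validate(io.BytesIO(altered), [good])
```

With a real download you would pass `open(path, "rb")` instead of `io.BytesIO(...)`. Both altered cases return false: either the sources disagree, or the trusted checksum does not match the file.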

As cpast has said before, if a good cryptographic checksum is transmitted over a trusted connection, it can provide security (which is derived from the trusted connection).

As supercat has said before, the checksum files on one site don't help people who download large files from the same site and in the same way that they download the checksum files -- they help people who want to download files from some other site.

"In terms of security, cryptographic hashes such as MD5 allow for authentication of data obtained from insecure mirrors. The MD5 hash must be signed or come from a secure source (an HTTPS page) of an organization you trust." -- https://help.ubuntu.com/community/HowToMD5SUM

Cryptographic checksums are an important part of practical public-key signatures (as implemented in GnuPG and other OpenPGP-compliant software). Public-key signatures have some advantages over the checksums alone.

David Cary

Posted 2014-12-08T07:24:24.453

Reputation: 773