3

This is a broader question, but here is a concrete example:

From apache.org:

File hashes are used to check that a file has been downloaded correctly. They do not provide any guarantees as to the authenticity of the file.

I don't understand this part: 'They do not provide any guarantees as to the authenticity of the file.'

The checksum used is from a trusted HTTPS source (e.g. this one).

How can a file not be authentic if it matches a checksum from a trusted HTTPS source?

Or am I missing something, and I still need to validate with a GPG key?

Glorfindel
David
  • See https://security.stackexchange.com/questions/196648/why-verify-a-file-firmware-downloaded-online-against-a-checksum https://security.stackexchange.com/questions/189000/how-to-verify-the-checksum-of-a-downloaded-file-pgp-sha-etc https://security.stackexchange.com/questions/186185/why-dont-websites-provide-a-checksum-of-their-downloadable-files https://security.stackexchange.com/questions/43332/how-can-i-check-the-integrity-of-the-downloaded-files plus more linked by those. – dave_thompson_085 Jun 29 '20 at 02:43

4 Answers

2

You're conflating two things, which is why you are confused about them. Let's pick both HTTPS and SHA-512 apart and then things will be much clearer.

What does a hash offer me?

A hash algorithm is a way for a program to take some arbitrarily large input and return a (usually small) fixed-size output. For example, a file can contain several GB of data, but the output of SHA-512 with that file as input is still 512 bits long.

This makes it very easy to check whether or not two files are identical, without checking every bit of the message itself (since that is essentially the same as transmitting the message itself again).

As such, if you downloaded a file and the hash of that file, you can calculate the hash yourself and then compare it to the published hash. You can be reasonably sure (see footnote 1) that the file is the same as the original if the hashes match.

However, this alone doesn't provide any authenticity. You can't know who generated the hash, or who published it.
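The check described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard `hashlib` module; the function names `sha512_of_file` and `matches_published_hash` are made up for this example, and the published hash is assumed to be a hex string copied from the download page.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read in chunks so that multi-GB files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the locally computed digest with the published one."""
    # Normalise whitespace and case, since published hashes vary in format.
    return sha512_of_file(path) == published_hex.strip().lower()
```

A `True` result only tells you that your copy is identical to whatever file the hash was computed from. It says nothing about who computed that hash.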

What does HTTPS offer me?

HTTPS is based on TLS, which in turn uses different ciphers to satisfy confidentiality (nobody else can read your data), integrity (nobody else can modify your data) and authenticity (the source of the data is who you believe it is).

As you may expect, these cipher suites use hash algorithms like SHA-1 or, better yet, SHA-2 to verify integrity. Public-key cryptography such as RSA is used to authenticate the server.

So if TLS offers all of this, why might you want to still verify a download? The reasons are two-fold:

  1. Not everyone might download via HTTPS. The practice of releasing a checksum together with a file predates the widespread use of HTTPS. Furthermore, the storage media you might copy your data to might be unreliable (think optical disks, etc.), and as such having a checksum already at hand that you can assume to be genuine is a good idea.

  2. Attackers may have compromised the server. That means they could have replaced the legitimate files with malicious ones, and updated the SHA-512 checksums to match. However, if the releases were signed with GPG and the GPG private keys were not on the compromised web servers, everyone who verifies their download via GPG would be notified that the signature is invalid.

Do I need to do all of this?

It's up to your threat model. If you judge the risk of Apache's web servers being compromised to be too low to justify the extra steps, then feel free to skip them.


1 It is possible for two different inputs to generate the same output hash. In fact, since there are only 2^160 different SHA-1 hashes, but far more possible inputs than that, you can be sure that for any given SHA-1 hash there are a great many other inputs that generate the same hash. However, finding such an input for a given hash is very difficult and not feasible today.
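To make the footnote concrete, here is a small sketch that shrinks the problem: it keeps only the first 16 bits of each SHA-1 digest, so there are just 2^16 possible values and two different inputs with the same truncated hash turn up almost immediately. The 16-bit truncation and the function names are assumptions made purely for illustration; with the full 160 bits the same search would not finish in any practical time.

```python
import hashlib


def truncated_sha1(data, bits=16):
    """Return the first `bits` bits of the SHA-1 digest as an integer."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def find_collision(bits=16):
    """Find two different inputs whose truncated hashes are equal."""
    seen = {}  # truncated hash -> first input that produced it
    counter = 0
    while True:
        msg = str(counter).encode()
        h = truncated_sha1(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
        counter += 1


first, second = find_collision()
print(first, second, hex(truncated_sha1(first)))
```

By the pigeonhole principle the loop must stop after at most 2^16 + 1 inputs, because by then more inputs have been tried than there are possible truncated values.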

0

Is a SHA checksum enough to verify integrity and authenticity?

No, a SHA checksum only verifies integrity.

How can a file not be authentic if it matches a checksum from a trusted HTTPS source?

Just because a file is hosted on an official domain does not automatically mean that the file is the correct one. There have been examples in the past where download files have been tampered with directly on the server. In such a case, you would indeed be downloading a file from the official domain, but the file was not the original one (perhaps it was modified to include malware).

Or am I missing something, and I still need to validate with a GPG key?

Yes, if you want to make sure that the file was actually created by an entity that you trust, and not just that it has not been corrupted on the way to you, then you need to validate its signature with a key.

user1301428
0

Because the file and its checksum are generally available for download from the same site, a malicious attacker who replaced one could just as easily replace both.

A digital signature, on the other hand, could not be mimicked as long as the private key is carefully held.
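The point about replacing both can be illustrated with a toy model. In this sketch a "site" is just a Python dict holding a file and its checksum; the names `publish` and `checksum_ok` and the sample byte strings are invented for the example and do not correspond to any real download infrastructure.

```python
import hashlib


def sha512_hex(data):
    """Hex SHA-512 digest of a byte string."""
    return hashlib.sha512(data).hexdigest()


def publish(data):
    """Model a download site as a dict with a file and its checksum."""
    return {"file": data, "sha512": sha512_hex(data)}


def checksum_ok(site):
    """The check a downloader performs: does the file match the checksum?"""
    return sha512_hex(site["file"]) == site["sha512"]


genuine = publish(b"genuine release")

# An attacker with write access to the site swaps the file AND recomputes
# the checksum, which requires no secret at all.
compromised = publish(b"release with malware")

# An attacker who can only change the file is caught by the checksum.
file_only = {"file": b"release with malware", "sha512": genuine["sha512"]}

print(checksum_ok(genuine), checksum_ok(compromised), checksum_ok(file_only))
```

The compromised site passes the check exactly like the genuine one, because computing a checksum needs no key. Producing a valid signature would need the private key, which the attacker does not have in this scenario.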

gowenfawr
0

Authenticity means confirmation that the file comes from a particular person or organization. To allow authenticity to be checked, an HMAC can be used, for instance. The owner of a secret key generates an HMAC over the file. You download the file, take that same key, and generate the HMAC yourself. If it matches, this shows not only that the file was not modified, but also that it comes from someone who holds the key.

A SHA-512 hash can only show whether the file was modified or not. It does not contain any information about who created the file.
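A minimal sketch of the HMAC idea, using Python's standard `hmac` module. The key and the function names are placeholders for illustration. Note the assumption built into HMAC: publisher and verifier must share the same secret key, so anyone who can verify a tag could also create one. That is why public downloads are usually protected with public-key signatures such as GPG rather than with an HMAC.

```python
import hashlib
import hmac


def make_tag(key, data):
    """Compute an HMAC-SHA-512 tag over `data` with the secret `key`."""
    return hmac.new(key, data, hashlib.sha512).hexdigest()


def verify_tag(key, data, tag):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


key = b"shared-secret-key"  # hypothetical key known to both parties
release = b"genuine release"
tag = make_tag(key, release)

print(verify_tag(key, release, tag))                  # file and key match
print(verify_tag(key, b"release with malware", tag))  # file was modified
print(verify_tag(b"attacker-guess", release, tag))    # wrong key
```

Unlike a plain SHA-512 checksum, the tag cannot be recomputed by an attacker who replaces the file but does not know the key.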

mentallurg