17

Alternatively, the question could be asked: does issuing a checksum for a file we sign anyway just duplicate work?

Use case: Firmware sent to an IoT device. We sign it, and form a separate checksum for it.

My understanding is that this is unnecessary, since the signing process guarantees data integrity and data authenticity whereas a separate checksum only guarantees data integrity. Is this correct? Can I just not send and store this additional checksum?

Related questions that don't answer this:

Additional research that I think supports the answer being yes:

kmfsousa
  • 1
    By checksum, do you mean a cryptographic hash (like SHA256) or something like CRC32? – nobody May 05 '22 at 17:50
  • In our case I mean CRC32. We also use MD5 for one thing, but not SHA256 anywhere. We're always signing it, though, so would doing any of this in addition bring any benefit? – kmfsousa May 05 '22 at 18:42
  • 10
    CRCs do have the advantage that they are significantly less expensive to compute than verifying a digital signature. Thus, they may be useful for catching non-malicious transmission errors early on devices with limited computing power. – nobody May 05 '22 at 20:33
  • 2
    A tiny point... when you say you sign the firmware, I believe what you actually do is compute a (cryptographic) hash of that firmware file, then sign the hash. You could do the same with the checksum; checksums are just less suited to protecting against tampering. See here https://security.stackexchange.com/questions/194600/checksum-vs-hash-differences-and-similarities – user1532080 May 06 '22 at 11:43
  • I might be wrong but my understanding was this: **Safety** = protection vs. **unintentional errors**, e.g. using a **checksum** is different from **Security** = protection vs. **intentional manipulation**, e.g. using **signatures**. – csstudent1418 May 08 '22 at 11:34

4 Answers

20

I would say that you are completely correct, but also not correct at the same time.

From the IoT device's point of view, I agree that checking the signature does everything the checksum does, and more. There is no reason to send the checksum and the signature to the device; just send the signature.
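To illustrate why the signature check already covers integrity, here is a minimal hash-then-sign sketch using textbook RSA with Python's built-in integers. Everything in it is an assumption for illustration: the toy key built from two small Mersenne primes, the missing padding, and the names `sign` and `verify`. It is not secure and is not what a real device would run; a real deployment would use a vetted signature library.

```python
import hashlib

# Toy parameters, for illustration only: two known Mersenne primes give a
# roughly 216-bit modulus, far too small for real use, and there is no padding.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Vendor side: sign the hash of the firmware with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Device side: recompute the hash and compare it with the signed one."""
    return pow(signature, E, N) == digest_as_int(data)


firmware = b"\x7fFIRMWARE v1.2 image bytes"
signature = sign(firmware)

corrupted = bytearray(firmware)
corrupted[3] ^= 0x01  # a single flipped bit, as a transmission error might cause

print(verify(firmware, signature))          # intact image
print(verify(bytes(corrupted), signature))  # corrupted image
```

Because verification recomputes the hash of whatever arrived, an accidental bit flip makes the check fail just as deliberate tampering would.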


However, from a human perspective, checksums are much easier to work with than signatures. For example:

  • Someone downloads the file and wants to check integrity on the downloaded file.
  • The file gets moved around within your network, and you want to check it again before pushing it to the device.

You could do these checks with signatures, but there are a number of user-friendliness obstacles:

  • You need a signature verifier that understands the specific signature format. For example, Windows binaries, signed Java JARs, and Linux packages all have different signature formats and require different tools to verify.
  • You need to finagle with importing the right root CA cert or GPG key into the signature verifier.

By comparison, checksums are much easier: you just hash the file and check that it matches the value on the vendor's website.
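As a rough sketch of that manual check, assuming the vendor publishes a SHA-256 digest as a hex string (the function names and the demo file are hypothetical):

```python
import hashlib
import hmac
import os
import tempfile


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local digest with the value copied from the vendor's page."""
    return hmac.compare_digest(sha256_hex(path), published_hex.strip().lower())


# Demo with a throwaway file standing in for the downloaded firmware.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example firmware image")
    demo_path = tmp.name

published = hashlib.sha256(b"example firmware image").hexdigest()
ok = matches_published(demo_path, published)   # digest matches
bad = matches_published(demo_path, "0" * 64)   # digest does not match
os.remove(demo_path)

print(ok, bad)
```

On the command line the same check is usually just `sha256sum` followed by comparing the output by eye.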

In addition, checksums confirm that you have the right version of the file, which signatures do not give you.


Finally, I dispute this:

"a separate checksum only guarantees data integrity."

If you are comparing your locally-computed checksum against, for example, the checksum value on the vendor's website, then the checksum is in fact authenticated by the HTTPS certificate on the website.

Mike Ounsworth
  • Thank you! For us it's simply a matter of having the right tooling available to verify the checksum on each user's local computer, which we can do. So the ease of a checksum is a minimal benefit compared with the additional overhead of needing to store and transmit it. – kmfsousa May 05 '22 at 22:59
  • 1
    "the checksum is in fact authenticated by the HTTPS certificate on the website": no it's not. Anyone with access to the web content (e.g. through some sort of remote vulnerability) could modify the checksum displayed online, and it would still be served with a valid HTTPS certificate. – larsks May 06 '22 at 02:58
  • 5
    @larsks We're arguing semantics; it might be a relatively weak form of authentication, but it's still authentication. – Mike Ounsworth May 06 '22 at 14:42
  • Aren't HTTPS certs usually authenticating just the domain? Are EV certs common outside of big companies? The point is valid though, IMO. – Gruber May 08 '22 at 11:06
  • Also I often cross check hashes on virustotal. Am I the only one sporting a nice tinfoil hat? – Gruber May 08 '22 at 11:09
  • @kmfsousa the checksum is not a minimal benefit; it's a significant benefit for catching unintended transmission errors, especially if users need to sneaker-net your firmware to an isolated machine, or if your customers need to validate that the signature was transferred correctly. (The company I work for provides sha256 hashes of all deliverables and of the signatures for those deliverables because all of our customers use isolated networks and machines, and this hashing has caught lots of copying/duplicating/transmission errors.) – Randall May 08 '22 at 15:08
  • 1
    The HTTPS cert authenticates that you're talking to the server you think you are, and the connection provides a guarantee that the file and checksum will not be tampered with during transmission. The biggest benefit of publishing checksums (more precisely, secure digests), is to allow people to verify that mirrors are serving exactly the same thing as you published. Further, a signature also requires the same secure digest, as the download itself is too large for signing algorithms. – OrangeDog May 08 '22 at 15:40
  • @OrangeDog: I've often thought it would be useful to have a protocol category for authenticated non-confidential communications, perhaps using a hash encoded within the URL (if the URL is received from an https:// connection, and the delivered content is consistent with the hash contained therein, one could trust the content without having to know or care who actually served it). Something like a two-hour video or other random-seek files could be accommodated by having the URL encode a hash of a header, which then contained e.g. a list of 3600 hashes of two-second portions of the file. – supercat Jul 01 '22 at 17:31
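The header-of-hashes idea in the last comment could be sketched roughly as follows. The chunk size, the function names, and the way the trusted hash reaches the client are all assumptions made for the example, not an existing protocol.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for the demo; a real deployment would use far larger chunks


def build_header(data: bytes) -> bytes:
    """Concatenate the SHA-256 digest of every fixed-size chunk."""
    return b"".join(
        hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
        for i in range(0, len(data), CHUNK_SIZE)
    )


def verify_chunk(header: bytes, trusted_header_hex: str, index: int, chunk: bytes) -> bool:
    """Check the header against the trusted hash, then one chunk against the header."""
    if hashlib.sha256(header).hexdigest() != trusted_header_hex:
        return False
    expected = header[32 * index:32 * (index + 1)]
    return len(expected) == 32 and hashlib.sha256(chunk).digest() == expected


content = b"abcdefghijkl"
header = build_header(content)
trusted_hex = hashlib.sha256(header).hexdigest()  # this value would travel in the URL

print(verify_chunk(header, trusted_hex, 1, b"efgh"))  # genuine second chunk
print(verify_chunk(header, trusted_hex, 1, b"efgX"))  # altered second chunk
```

Only the short header hash has to come from a trusted channel; the header and the chunks can then be fetched from any mirror and checked in any order.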
8

It's not uncommon to have both.

For example, the developers of the Tails operating system sign their ISOs using GPG. See https://tails.boum.org/install/expert/index.en.html#verify for more info.

But because the process of verifying a GPG signature is unfamiliar to many users, Tails also provides a web-based tool that lets users verify the integrity of Tails ISOs using a checksum hash that is fetched from the Tails web site via HTTPS. See https://tails.boum.org/contribute/design/download_verification/ for more info. (Full disclosure: I am the developer of this tool.)

Bear in mind that a digital signature, where the private signing key is stored offline, is generally more secure than a checksum hash posted on the developer's web site. If the site is hacked, the hacker can replace the ISO with a malicious file, then simply update the checksum hash posted on the site to match that of the malicious ISO. By contrast, if the private signing key is stored offline, the attacker has no way to sign the malicious ISO using the developer's private key.

Related: What's the point of providing file checksums for verifying downloads?

mti2935
3

Checksums are typically easier to compute. If you're validating the signature on a given device anyway, that won't matter. But sometimes you may want to validate the signature on one device and check for errors on a weaker one. Checksums are also simpler. If I am using a message to flash my IoT device's firmware, I may want the firmware itself to verify the checksum before it bricks itself because the software made an endianness error while passing the data into my firmware buffers.
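A small sketch of that pre-flash check, using CRC32 from Python's `zlib` as a stand-in for whatever the device firmware would actually run (the block layout and function names are made up for the example):

```python
import zlib


def crc32_of_blocks(blocks) -> int:
    """Fold a CRC32 over the image as it arrives, block by block."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)
    return crc


def safe_to_flash(blocks, expected_crc: int) -> bool:
    """Refuse to flash unless the received image matches the expected CRC."""
    return crc32_of_blocks(blocks) == expected_crc


image = [b"\x01\x02\x03\x04", b"\x05\x06\x07\x08"]
expected = zlib.crc32(b"".join(image))

# Simulate an endianness bug: each 4-byte word arrives byte-swapped.
swapped = [block[::-1] for block in image]

print(safe_to_flash(image, expected))
print(safe_to_flash(swapped, expected))
```

The running CRC needs only a few bytes of state, which is why this kind of check fits on hardware that could not afford a signature verification.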

Also, some checksums are more powerful than merely checking for errors. It all depends on the particular checksum, but some of them also permit correction of errors. As an example, Reed-Solomon codes can correct many small errors, something that can be difficult with signatures. Turbo Codes and Low Density Parity Check Codes get used in environments where lots of errors can be expected and re-sending is expensive.
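Reed-Solomon is too long to show here, so as a much simpler stand-in, this sketch uses a Hamming(7,4) code to show what "correcting" an error means: three parity bits protect four data bits, and any single flipped bit can be located and repaired. The function names are just for the example.

```python
from itertools import product


def encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Repair at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error was detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Flip every possible single bit of every codeword and check that it is repaired.
all_recovered = True
for data in product([0, 1], repeat=4):
    word = encode(data)
    for i in range(7):
        damaged = list(word)
        damaged[i] ^= 1
        if decode(damaged) != list(data):
            all_recovered = False

print(all_recovered)
```

A signature can only say "this does not verify"; a code like this one can also say which bit was wrong, at the cost of extra redundancy.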

Cort Ammon
  • With Reed-Solomon you could sign the file before the error correction, and verify after reconstructing the file. – ThoriumBR May 06 '22 at 19:15
2

Checksums and signatures have different purposes.

Checksums are good for verifying data integrity. They verify that you have received what you expected to receive. I would not recommend CRC32 or MD5 for verifying file integrity; use SHA256 (sha256sum on Linux, shasum -a 256 on OSX) instead. Collisions are practical to produce for both CRC32 and MD5, and CRC32 is particularly vulnerable to brute force (we were cracking CRC32 back in the 2000s to make cheats for GunZ!), so neither is reliable for verifying file integrity, IMO.

Signatures are useful for verifying data integrity and that the file originated from a given source. They are useful when you need to ensure not only that the data was not modified, but also that it came from a specific individual or group of individuals. An example use would be the distribution of software from an app store: signing the app before you send it to your app store ensures Acme App Store hasn't modified the app contents, and Acme App Store can be sure that the binary came from you, not just someone who guessed your API keys for uploading.

The caveat with signatures is that they require some shared key material, and often those keys can expire. Verifying the signature of a file is also more tricky. Checksums only require a simple file be hosted somewhere, and verifying the checksum can be done visually or by comparing two files.

If your firmware is being flashed onto an IoT device before shipping it out, I think a checksum is probably fine - although, if it's flashed on an IoT device (and you control that flashing process), you probably don't even need data integrity.

If the device is reaching out to your website directly, and you're hosting the patches, a checksum is probably fine. However, if you're using an intermediary, you probably want a signature.

You are correct, though, that if a machine is doing all the work, you probably don't need to have both a checksum and a signature, as the signature will perform the checksum for you. However, as other commenters have mentioned, it can be useful to have a checksum if humans will be involved, as verifying a sha256sum is typically a lot easier for a human than verifying that a binary is signed correctly.

Dan
  • CRC32 doesn't require brute force. Given a file's size, the XOR of its present and desired CRC values, and the locations of any 32 individual bits within the same ~511MB chunk of the file, one can determine which of the bits would need to be flipped to change the CRC to the desired value without even having to look at the original file. – supercat Jul 01 '22 at 17:36
  • The biggest practical problem with using CRC32 to guard a file is that if a file format contains sections which are guarded by CRC32, one can modify any or all of the bits in such a section, provided one updates that section's CRC32 suitably; the changes made to the section data and its local CRC will cancel out in the computation of the overall file's CRC. – supercat Jul 01 '22 at 17:39
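A related property that can be checked in a few lines: for a fixed length, CRC32 is affine over XOR, so anyone who knows the old CRC and the bit changes they want to make can predict the new CRC without seeing the file. This is only a demonstration of that property with made-up data, not the bit-selection procedure described in the comments above.

```python
import zlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


original = b"version=1.0;admin=0;"
tampered = b"version=1.0;admin=1;"

# The attacker's view: the original CRC, the length, and the desired bit flips.
delta = xor_bytes(original, tampered)
zeros = bytes(len(original))
predicted = zlib.crc32(original) ^ zlib.crc32(delta) ^ zlib.crc32(zeros)

print(predicted == zlib.crc32(tampered))
```

No comparable shortcut is known for SHA-256, which is one reason a CRC is fine against accidents but offers nothing against deliberate changes.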