
It's pretty clear that GPG signing and GitHub SSH keys serve different purposes.

But what is the use of GPG signing commits? Is it just that you know that commit abcd1234 was signed by Alice's key? Is there anything else that GPG signing a commit is useful for?

Wayne Werner

1 Answer


With git, a cryptographic signature at the tip of the branch provides strong integrity guarantees of the entire history of that branch going backwards, including all metadata and all contents of the repository, all the way back to the initial commit. This is possible because git records the hash of the previous commit in each next commit's metadata, creating a cryptographic chain of records in which no earlier record can be altered without changing every hash that follows it. If you can verify the cryptographic signature at the tip of the branch, you effectively verify that branch's entire history.
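To make the chaining concrete, here is a minimal Python sketch of it. It is a toy model, not git's implementation: the commit bodies are simplified (no author or committer lines), every commit points at git's empty tree, and the helper names `git_object_id` and `build_history` are made up for this illustration.

```python
import hashlib
from typing import List, Optional

# Well-known id of git's empty tree, used here as a stand-in for real content.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' plus a NUL byte, followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def build_history(messages: List[str], tamper_at: Optional[int] = None) -> List[str]:
    """Return the commit ids of a linear history, oldest first.

    Simplification: real commits also carry author and committer lines.
    """
    ids: List[str] = []
    parent: Optional[str] = None
    for index, message in enumerate(messages):
        if index == tamper_at:
            message += " (tampered)"
        lines = [f"tree {EMPTY_TREE}"]
        if parent is not None:
            # Each commit records the id of the commit before it.
            lines.append(f"parent {parent}")
        body = "\n".join(lines) + "\n\n" + message + "\n"
        parent = git_object_id("commit", body.encode())
        ids.append(parent)
    return ids


messages = ["initial commit", "add feature", "fix bug"]
honest = build_history(messages)
forged = build_history(messages, tamper_at=0)

# Changing the oldest commit changes every id after it, including the tip.
print(honest[-1] == forged[-1])  # False
```

Since a signature covers the tip's id, a change anywhere further back would show up as a tip that no longer matches what was signed.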

For example, let's take a look at linux.git, where the latest tag at the time of writing, v5.5-rc4, is signed by Linus Torvalds. (This is slightly different from your question because you asked about signing commits, while we're looking at signed tags -- but in every practical sense this is exactly the same.)

$ git cat-file -p v5.5-rc4
object fd6988496e79a6a4bdb514a4655d2920209eb85d
type commit
tag v5.5-rc4
tagger Linus Torvalds <torvalds@linux-foundation.org> 1577662169 -0800

Linux 5.5-rc4
-----BEGIN PGP SIGNATURE-----

iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4JNtkeHHRvcnZhbGRz
...
=rNsn
-----END PGP SIGNATURE-----

The entire contents of this tag are signed, so this tells us that when Linus signed the tag, the "object hash" on his system was fd6988496e79a6a4bdb514a4655d2920209eb85d.

But what exactly is that "object hash"? What are the contents that are hashed here? We can find out by asking git to tell us more about that object:

$ git cat-file -p fd6988496e79a6a4bdb514a4655d2920209eb85d
tree 30efc38ed890f113f9e1f0cbc801bab7090365eb
parent a99efa00891b66405ebd25e49868efc701fe1546
author Linus Torvalds <torvalds@linux-foundation.org> 1577662156 -0800
committer Linus Torvalds <torvalds@linux-foundation.org> 1577662156 -0800

Linux 5.5-rc4

The above contents in their entirety (slightly differently formatted) are what give us the SHA-1 hash fd6988496e79a6a4bdb514a4655d2920209eb85d. So, thus far, we have unbroken cryptographic attestation from Linus's PGP signature to four important pieces of information about his git repository:

  • information about the state of his source code (tree)
  • information about the previous commit in the history (parent)
  • information about the author of the commit and the committer, who are one and the same in this particular case
  • information about the date and time when the commit was made
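As a rough sketch of the "slightly differently formatted" part: git hashes the commit text with a short header ("commit", the body length, and a NUL byte) in front of it. The Python below rebuilds the body from the output shown above. The helper names are my own, and I am assuming the stored object contains exactly these lines with a single trailing newline; if that holds, the printed id should match the one in the tag, but do compare it yourself rather than take my word for it.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' plus a NUL byte, followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, identity: str, message: str) -> bytes:
    """Assemble a single-parent commit body.

    Assumption: author and committer lines are identical, as in the example.
    """
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {identity}\n"
        f"committer {identity}\n"
        f"\n"
        f"{message}\n"
    ).encode()


IDENTITY = "Linus Torvalds <torvalds@linux-foundation.org> 1577662156 -0800"
body = commit_body(
    "30efc38ed890f113f9e1f0cbc801bab7090365eb",
    "a99efa00891b66405ebd25e49868efc701fe1546",
    IDENTITY,
    "Linux 5.5-rc4",
)
# Compare this against the "object" line of the signed tag.
print(git_object_id("commit", body))
```

Changing any field (the tree, the parent, a timestamp, the message) produces a different id, which is why signing the id vouches for all of them at once.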

Let's take a look at the tree line -- what contents were hashed to arrive at that checksum? Let's ask git:

$ git cat-file -p 30efc38ed890f113f9e1f0cbc801bab7090365eb
100644 blob 196ca317bd1f24ad57ad9ded549bc0b7994d8111    .clang-format
100644 blob 43967c6b20151ee126db08e24758e3c789bcb844    .cocciconfig
100644 blob a64d219137455f407a7b1f2c6b156c5575852e9e    .get_maintainer.ignore
100644 blob 4b32eaa9571e64e47b51c43537063f56b204d8b3    .gitattributes
100644 blob 72ef86a5570d28015d0ccb95ccd212bf8820c1c2    .gitignore
100644 blob a7bc8cabd157b89adbc2d8f6c4d6e3c88b14e3cd    .mailmap
100644 blob da4cb28febe66172a9fdf1a235525ae6c00cde1d    COPYING
100644 blob 9602b0fa1c958da4b605127c2b8da0628efeb34e    CREDITS
040000 tree b239b5eaacf2619bf564d152661ee425c4ee76f9    Documentation
...
040000 tree 82dd6833b421625c7b15d4d1c26964fc65874eb4    virt

This is the entirety of the top-level Linux kernel directory contents. The blob entries are SHA-1 hashes of the actual file contents in that directory (each prefixed with a short git object header), so these are straightforward. Subdirectories are represented as other tree entries, which in turn consist of blob and tree records, going all the way down to the last sublevel, which will only contain blobs.
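One detail worth knowing if you try to reproduce a blob hash by hand: git does not hash the bare file bytes. It hashes "blob", the content length, and a NUL byte, followed by the content. A small Python sketch (the function name `blob_id` is mine):

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Compute a git blob id: SHA-1 of b'blob <size>' + NUL + data."""
    header = b"blob " + str(len(data)).encode() + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# The empty file has a well-known blob id in git.
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# A plain SHA-1 of the same bytes gives a different value.
print(blob_id(b"") == hashlib.sha1(b"").hexdigest())  # False
```

This is why running sha1sum on a file will not give you the value shown next to it in the tree listing.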

So, tree 30efc38ed890f113f9e1f0cbc801bab7090365eb in the commit record is a checksum of other checksums, and it allows us to verify that each and every file in linux.git is exactly the same as it was on Linus Torvalds' system when he created the commit. If any file were changed, the tree checksum would be different and the repository would no longer verify, because the recomputed hash would not match the one recorded in the commit.
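Here is a small Python sketch of that "checksum of checksums" idea. It handles only a flat directory of regular files (mode 100644, no subdirectories), and the file names and contents are invented for the example; the helper names are mine as well.

```python
import hashlib
from typing import Dict


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>' plus a NUL byte, followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def flat_tree_id(files: Dict[str, bytes]) -> str:
    """Hash a directory of regular files the way git stores a tree.

    Each entry is '<mode> <name>', a NUL byte, then the raw 20-byte blob id.
    Simplification: no subdirectories, symlinks, or executable files.
    """
    body = b""
    for name in sorted(files):  # git keeps tree entries sorted by name
        blob = git_object_id("blob", files[name])
        body += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(blob)
    return git_object_id("tree", body)


# Hypothetical contents, not the real kernel files.
original = {"COPYING": b"license text\n", "CREDITS": b"names\n"}
modified = {"COPYING": b"license text\n", "CREDITS": b"names!\n"}

# A one-byte change in one file changes the hash of the whole tree.
print(flat_tree_id(original) == flat_tree_id(modified))  # False

# Sanity check: the empty tree has a well-known id in git.
print(flat_tree_id({}))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

In a real repository a subdirectory contributes its own tree id as an entry, so a change deep in the hierarchy propagates upward through every enclosing tree to the top-level one named in the commit.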

Finally, if we look at the object mentioned in parent a99efa00891b66405ebd25e49868efc701fe1546, we will see that it is a hash of another commit, containing its own tree and parent records. If we cared to, we could walk each commit all the way back to the beginning of Linux git history, but we don't need to do that -- verifying the checksum of the latest commit is sufficient to provide us all the necessary assurances about the entire history of that tree.

So, if we verify the signature on the tag and confirm that it matches the key belonging to Linus Torvalds, we will have strong cryptographic assurances that the repository on our disk is exactly, byte-for-byte the same as the repository on the computer belonging to Linus Torvalds -- with all its contents and its entire history going back to the initial commit.
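If you want to script that check, one possible approach is to ask git for GnuPG's machine-readable status output (git verify-tag --raw prints it to standard error) and look for a VALIDSIG line carrying the fingerprint you trust. The Python below is only a sketch: the function names are mine, the fingerprint in the sample is a made-up placeholder, and you still have to obtain the real fingerprint from a source you trust.

```python
import subprocess
from typing import List

STATUS_PREFIX = "[GNUPG:] "


def valid_signature_fingerprints(status_output: str) -> List[str]:
    """Collect the fingerprints reported on GnuPG VALIDSIG status lines.

    Note: the first VALIDSIG field is the fingerprint of the key that made
    the signature, which may be a subkey rather than the primary key.
    """
    fingerprints = []
    for line in status_output.splitlines():
        if not line.startswith(STATUS_PREFIX):
            continue
        fields = line[len(STATUS_PREFIX):].split()
        if len(fields) >= 2 and fields[0] == "VALIDSIG":
            fingerprints.append(fields[1])
    return fingerprints


def tag_signed_by(repo_path: str, tag: str, trusted_fingerprint: str) -> bool:
    """Return True if git accepts the tag signature and it was made by the given key."""
    result = subprocess.run(
        ["git", "-C", repo_path, "verify-tag", "--raw", tag],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False
    return trusted_fingerprint in valid_signature_fingerprints(result.stderr)


# Made-up status output with a placeholder fingerprint, for illustration only.
SAMPLE = (
    "[GNUPG:] NEWSIG\n"
    "[GNUPG:] VALIDSIG 0123456789ABCDEF0123456789ABCDEF01234567 "
    "2019-12-29 1577662169 0 4 0 1 8 00 "
    "0123456789ABCDEF0123456789ABCDEF01234567\n"
)
print(valid_signature_fingerprints(SAMPLE))
```

Calling something like tag_signed_by("linux", "v5.5-rc4", fingerprint) requires a local clone and the signer's public key in your keyring; the directory name here is just an example.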

Since you asked specifically about signing commits as opposed to signing tags, I have to mention that it's generally a good practice to PGP-sign commits, particularly in environments where multiple people can push to the same repository branch. Signed commits provide easy forensic proof of code origins (e.g. without commit signing Alice can fake a commit to pretend that it was actually authored by Bob). It also allows for easy verification in cases where someone wants to cherry-pick specific commits into their own tree without performing a git merge.

If you are looking to get started with git and PGP signatures, I can recommend my guide here: https://github.com/lfit/itpol/blob/master/protecting-code-integrity.md

Important note: sha1 is not considered sufficiently strong for hashing purposes these days, and this is widely acknowledged by the git development community. Significant efforts are under way to migrate git to stronger cryptographic hashes, but they require careful planning and implementation in order to minimize disruption to various projects using git. To my knowledge, there are no effective attacks against sha1 as used by git, and git developers have added further precautions against sha1 collision attacks in git itself, which helps buy some time until stronger hashing implementations are ready to go.

mricon
  • So the short answer is that gpg signing a commit guarantees that it actually comes from Alice? It may be a good idea to add a small summary at the bottom of your answer. –  Dec 30 '19 at 22:54
  • @MechMK1 well, that's not the question, though. The question was "what else" is it good for, and the short answer for that is in bold. – mricon Dec 30 '19 at 23:00