How secure is a git commit hash (sha1)?

Question

Consider the following scenario:

Someone, using an unmodified, trustworthy build of git, runs git clone followed by git checkout of some commit hash (the 40-character hexadecimal string).

To clarify, assume Bob does the following on a secure machine:

$ git clone https://secureserver.com/HonestEve/project.git
$ git rev-parse HEAD

which prints

5b3469eccbd7849d760f63af8537940c97c1d1bf

Bob goes over the just checked out code and verifies that it is indeed what he wants/needs.

Then at a later date, Bob does on another secure machine (or at least, in an empty directory):

git clone https://malicious.server.com/HonestEve/project.git

or probably he uses the same url, but somehow gets connected to a malicious git server. And then proceeds with:

git checkout 5b3469eccbd7849d760f63af8537940c97c1d1bf

Then the question is, is it possible (ignoring any SHA1 collisions or weaknesses) that the resulting source tree is different from what he got the first time?

Here you may assume that the git client that Bob uses is the normal, most wide-spread git client that comes with most operating systems; some recent version. Considering how this git client functions, would an attacker be able to insert malicious code into this checkout? Note that we assume the following:

1) There are no collisions possible (any given SHA1 can only be produced by the exact same input).
2) The attacker can make the victim's git client connect to their own malicious server and has full control over that server, but not over the client or the machine that the client is running on.
3) The attacker knows the commit hash that the victim will use (far) in advance.

If I am correct, the attacker cannot do this if the client recalculates the sha1 locally from the received content and verifies that the resulting checkout is what was expected. But that seems very inefficient. A sha1 is large (160 bits), but it seems to be intended only to avoid accidental collisions(?).
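
To make "recalculating the sha1 locally" concrete, here is a minimal Python sketch of how git derives an object id: the SHA-1 of a short header (object type, content length, NUL byte) followed by the content. The helper name git_object_id is made up for this illustration and is not part of git.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 id git gives an object of this type and content."""
    # Header is "<type> <length in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # Same id that `echo 'hello world' | git hash-object --stdin` reports.
    print(git_object_id("blob", b"hello world\n"))
    # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

A client that recomputes ids this way for everything it receives would notice any changed byte, because the changed object would no longer hash to the id under which it is referenced.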

Surely a git checkout <sha1> will simply checkout whatever is listed under the given sha1 without any recalculation(?). However, perhaps the client verifies the correctness of the sha1's during cloning / fetching? If so, that would be paranoia and a clear effort to stop malicious attacks like the above. I am not sure that git does this.

Can anyone shine a light on this?

I'm not quite following what you are asking. Are you saying that the victim is going to checkout some git commit with a given hash, and the attacker wants to somehow get malicious code in that particular git hash? Also, in what way is this malicious commit supposed to impact the user? — Conor Mancone, Feb 05 '20 at 19:28
*Surely a git checkout will simply checkout whatever is listed under the given sha1 without any recalculation(?).* well, that depends on the implementation. Since `git` is open source, what does the source code say? — Marcus Müller, Feb 05 '20 at 21:09
"where we assume the following? 1) There are no collisions possible" But that assumption is false. SHA-1 is broken. — Joseph Sible-Reinstate Monica, Feb 06 '20 at 01:14
@JosephSible-ReinstateMonica Just humor me. Besides, it is only broken for collisions, but not in a way that malicious code could be inserted. Anyway, I don't want SHA1's weakness to be taken into account in this discussion. — Carlo Wood, Feb 06 '20 at 13:33
@MarcusMüller You are right of course... but still git will have an intent. If it doesn't have the intent to prevent this kind of attack then it would be highly unlikely that any recalculation is done, because it would just make things slower and not fulfill any goal. I do not know if the developers of git wanted to avoid this or not. Nor do I know what git's source code does, that is why I am asking here :/. — Carlo Wood, Feb 06 '20 at 13:36
@ConorMancone Yes, that sounds about right. The code that is checked out will be compiled and used (executed). If the code isn't exactly what the victim expected, with not a single byte changed, then the attack succeeded imho. It doesn't matter what the attacker's intent is, that could be anything. — Carlo Wood, Feb 06 '20 at 13:39
"it is only broken for collisions, but not in a way that malicious code could be inserted" That's wrong too. — Joseph Sible-Reinstate Monica, Feb 06 '20 at 14:23
@CarloWood Am I reading correctly? You're setting aside the possibility that the attacker has carefully crafted a fake source-tree that still hashes to the original, requested value. Instead, they deliver up a fake source-tree, knowing that it will hash to some unrelated value. So essentially what you are asking is whether the git client being used to retrieve the source-tree will re-compute the hash of what was actually delivered and compare it with the hash that was asked for? Is that right? — TripeHound, Feb 06 '20 at 15:14
"is it possible (ignoring any SHA1 collisions or weaknesses)" - Well no. If you ignore the possibility of SHA1 collisions then it is not possible to get a SHA1 collision. — user253751, Feb 06 '20 at 16:47
Indeed it sounds like this really just boils down to "does git verify commit hashes on checkout?". The answer to that question will potentially vary from git client to git client, and there **are** many git clients. There isn't just one "git". Therefore, I'm marking this as "needs more details". — Conor Mancone, Feb 06 '20 at 18:31
@TripeHound Yes, that is right, although I assume that nothing is re-checked on every checkout, but rather when cloning and fetching, if at all. — Carlo Wood, Feb 06 '20 at 21:47
@ConorMancone In that case, could you point to any git client that does not verify it? — Carlo Wood, Feb 06 '20 at 21:47
@ConorMancone I disagree. When people refer to `git` without naming an individual client, it's a safe bet that they mean https://github.com/git/git — Joseph Sible-Reinstate Monica, Feb 06 '20 at 22:44
I just tried to hack a repository myself - just altered a blob without changing its name, and when I tried to clone that the client gave an error and didn't even create a directory. — Carlo Wood, Feb 06 '20 at 22:46
Sounds like you answered your question yourself! You can always write an answer — Conor Mancone, Feb 07 '20 at 01:17
@ConorMancone I am not convinced I did: it is possible that the server detected the problem itself and did not send the blob (or under a different name). The error that the client gave was that the blob was "missing" - aka, it did not receive it. If it was the client that detected the mismatch in hash I'd have expected a different error message. — Carlo Wood, Feb 07 '20 at 14:21
As a side note, I found this: https://git-scm.com/docs/hash-function-transition/ saying "Git v2.13.0 and later subsequently moved to a hardened SHA-1 implementation by default, which isn’t vulnerable to the SHAttered attack." This is the SHA1 library that is published here: https://github.com/cr-marcstevens/sha1collisiondetection — Carlo Wood, Feb 07 '20 at 15:52

ghost43 · Answer 1 · 2020-06-27T21:16:51.453

Bob goes over the just checked out code and verifies that it is indeed what he wants/needs.

Then at a later date, Bob does on another secure machine (or at least, in an empty directory):
git clone https://malicious.server.com/HonestEve/project.git
or probably he uses the same url, but somehow gets connected to a malicious git server. And then proceeds with:
git checkout 5b3469eccbd7849d760f63af8537940c97c1d1bf
Then the question is, is it possible (ignoring any SHA1 collisions or weaknesses) that the resulting source tree is different from what he got the first time?

Yes it is possible. This is not safe to do.

Mallory (the owner of malicious.server.com) can create a branch, named 5b3469eccbd7849d760f63af8537940c97c1d1bf, with arbitrary changes to the code/repository.

When Bob executes git checkout 5b3469eccbd7849d760f63af8537940c97c1d1bf, git will display a warning that the ref is ambiguous but will ultimately prefer the branch over the commit. (Note that if this runs in, e.g., a build script, you will likely miss the warning.)

I have tested this just now on git 2.25.1 (on Ubuntu 20.04).

I have also tested pushing such a branch to GitHub - but that failed. GitHub seems to have a custom check that ensures that branches/tags cannot have a 40-hex character name:

[...]
remote: error: GH002: Sorry, branch or tag names consisting of 40 hex characters are not allowed.
remote: error: Invalid branch or tag name [...]

However, this check is specific to GitHub. And even when pulling from GitHub, note that you now need to trust not only git but GitHub too.

The solution to this specific attack vector is to tell git that the ref you want is a commit hash:

git checkout 5b3469eccbd7849d760f63af8537940c97c1d1bf^{commit}
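
In a build script, one way to make sure the suffix is never forgotten is to build the argument programmatically. The following Python sketch is only an illustration: pinned_commit_rev and checkout_command are hypothetical helper names, and insisting on exactly 40 lowercase hex characters is a conservative assumption made here, not a git requirement. It only constructs the command and does not run git.

```python
import re
from typing import List

# Assumption for this sketch: only full, lowercase 40-hex SHA-1 ids are accepted.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def pinned_commit_rev(commit_hash: str) -> str:
    """Return '<hash>^{commit}' so the argument cannot be taken as a branch name."""
    if not FULL_SHA1.fullmatch(commit_hash):
        raise ValueError(f"not a full 40-hex commit hash: {commit_hash!r}")
    return commit_hash + "^{commit}"


def checkout_command(commit_hash: str) -> List[str]:
    """Build the argv list for the checkout; the caller decides how to run it."""
    return ["git", "checkout", pinned_commit_rev(commit_hash)]


if __name__ == "__main__":
    print(checkout_command("5b3469eccbd7849d760f63af8537940c97c1d1bf"))
```

Rejecting anything that is not a full hash up front also keeps abbreviated hashes and branch names from slipping into the script by accident.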

Thanks for pointing out this vulnerability. But it doesn't answer my question on whether or not the git client recalculates sha1 and thus verifies the checked out code (in case you actually check it out by hash - ie, by adding ^{commit}). — Carlo Wood, Jun 26 '20 at 19:43
@CarloWood the client does not trust the server and will recalculate hashes on all objects it receives specifically to counter an attack like the one you describe. See this thread: https://lore.kernel.org/git/20190828234706.GB25355@sigill.intra.peff.net/ — mricon, Jun 28 '20 at 13:15
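
The idea in the comment above can be illustrated with a simplified Python model. It is a sketch, not git's actual code: it assumes the client names every received object by hashing it locally, and the helper names (object_id, store_received, missing), the file name and the author line are all invented for the example.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

GitObject = Tuple[str, bytes]  # (object type, raw content)


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over the header '<type> <length>', a NUL byte, then the content."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def store_received(objects: Iterable[GitObject]) -> Dict[str, GitObject]:
    """Index objects under locally computed ids; the sender has no say in the names."""
    return {object_id(t, p): (t, p) for t, p in objects}


def missing(store: Dict[str, GitObject], wanted: Iterable[str]) -> List[str]:
    """Return the wanted ids that are absent from the local store."""
    return [oid for oid in wanted if oid not in store]


# What Bob reviewed: one file, a tree pointing at it, a commit pointing at the tree.
honest_blob = b"print('hello')\n"
blob_id = object_id("blob", honest_blob)
# A tree entry is: mode, space, file name, NUL, then the blob id as 20 raw bytes.
tree = b"100644 hello.py\x00" + bytes.fromhex(blob_id)
tree_id = object_id("tree", tree)
commit = (
    f"tree {tree_id}\n"
    "author Example Author <author@example.com> 0 +0000\n"
    "committer Example Author <author@example.com> 0 +0000\n"
    "\n"
    "Initial commit\n"
).encode("ascii")
commit_id = object_id("commit", commit)
wanted = [commit_id, tree_id, blob_id]

# An honest server sends the real objects: nothing is missing.
honest_store = store_received([("commit", commit), ("tree", tree), ("blob", honest_blob)])
print(missing(honest_store, wanted))  # []

# A malicious server swaps the file content but keeps the commit and tree.
evil_blob = b"print('pwned')\n"
evil_store = store_received([("commit", commit), ("tree", tree), ("blob", evil_blob)])
print(missing(evil_store, wanted) == [blob_id])  # True
```

In this model the tampered blob ends up stored under a different id, so the blob referenced by the reviewed commit shows up as missing rather than silently replaced. Rewriting the tree or the commit to point at the new blob would change their ids as well, so the result could no longer be the commit Bob asked for.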
