Consider the following scenario:
Someone, using an untampered version of git, issues a git clone followed by a git checkout of some commit hash (the 40-character hexadecimal string).
To clarify, assume Bob does the following on a secure machine:
$ git clone https://secureserver.com/HonestEve/project.git
$ git rev-parse HEAD
which prints
5b3469eccbd7849d760f63af8537940c97c1d1bf
Bob goes over the freshly checked-out code and verifies that it is indeed what he wants/needs.
Then, at a later date, Bob runs the following on another secure machine (or at least in an empty directory):
$ git clone https://malicious.server.com/HonestEve/project.git
or, more likely, he uses the same URL but somehow gets connected to a malicious git server. He then proceeds with:
$ git checkout 5b3469eccbd7849d760f63af8537940c97c1d1bf
The question is: ignoring any SHA-1 collisions or weaknesses, is it possible that the resulting source tree differs from what he got the first time?
Assume that Bob uses the standard, most widespread git client that ships with most operating systems, in some recent version. Considering how this git client works, would an attacker be able to insert malicious code into this checkout? Note that we assume the following:
1) No collisions are possible (any given SHA-1 can only be produced by the exact same input).
2) The attacker can make the victim's git client connect to their own malicious server and has full control over that server, but not over the client or the machine that the client is running on.
3) The attacker knows the commit hash that the victim will use (far) in advance.
If I am correct, the attacker cannot do this as long as the client recalculates the SHA-1 locally from the original input to verify that the resulting checkout is what was expected. But that seems very inefficient. A SHA-1 is large (160 bits), but it seems to be intended only to avoid accidental collisions(?).
Surely a git checkout <sha1>
will simply check out whatever is stored under the given SHA-1 without any recalculation(?). However, perhaps the client verifies the correctness of the SHA-1s during cloning / fetching? If so, that would be paranoia and a clear effort to stop malicious attacks like the one above. I am not sure that git does this.
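To make concrete what I mean by "recalculating the SHA-1 locally", here is a minimal Python sketch of how, as far as I understand it, git derives the ID of a single object: it hashes a small header (object type, a space, the content length, a NUL byte) followed by the raw content. The function names git_object_id and matches are my own and purely illustrative. The sketch only shows what such a check would look like for one blob. It says nothing about whether or when the real client performs it, which is exactly what I am asking.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to this content.

    Assumption: the ID is SHA-1 over "<type> <length>\\0" + content,
    which is how loose objects are named in SHA-1 repositories.
    """
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def matches(expected_id: str, obj_type: str, content: bytes) -> bool:
    """Hypothetical client-side check: does the content hash to the ID?"""
    return git_object_id(obj_type, content) == expected_id


# What Bob reviewed the first time, and what a malicious server might send.
original = b"hello world\n"
tampered = b"hello w0rld\n"

oid = git_object_id("blob", original)
print(oid)
print(matches(oid, "blob", original))   # content is unchanged
print(matches(oid, "blob", tampered))   # content was modified by the server
```

If the client performed this kind of check for the commit, and for every tree and blob reachable from it, then under assumption 1 the server could not substitute different content for a known commit hash.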
Can anyone shed some light on this?