6

I have some data and want to prove its integrity over time, i.e. prove that a certain state of the data existed at a certain date.

For this reason I commit the data to a git repository that I keep myself (with a copy on Bitbucket).

At regular intervals, I plan to hand over the current commit hash to the person I want to prove the state of the data to.

So if the person wants to check the data I will provide them with the git repository and they can compare the hashes they have to my repository.

Is this enough to make the data auditable?

elsadek
Alex

2 Answers

8

Not against sophisticated agents with lots of resources. SHA-1 (the hash used in git) is no longer considered cryptographically secure against collision attacks. With published methods, a full collision is estimated to cost about $100,000 in EC2 compute time. This is why SHA-1 SSL certificates are considered insecure.

If you ignore the possibility of SHA-1 collision attacks, if you hand someone a commit hash (and later give them access to the repository), they can use that commit hash to verify the following information (which was hashed together in a custom format to create the hash):

  1. The full SHA-1 hash of the current tree (the tree is essentially the file listing of a commit, with file names pointing to the SHA-1 hashes of the file contents)
  2. The full SHA-1 hash of the parent commit(s) (more than one in the case of a merge)
  3. The name and email of the author and of the committer
  4. The timestamp of the commit (according to the committer's local computer)
  5. The commit message

So if you email a person a specific commit hash 43c5ab273a7c897dfd8cad9cef829b3a657f8491 at a certain time, they can later verify that the commit points to a specific tree (4c883b33dbb987e46c211f6dad52b91b208661b4) with a parent commit of cef37643eb57177ac9e4e1f3d0178bc67b381c8e. The timestamp and author/committer information are also covered by the hash, but they shouldn't be trusted as accurate. E.g., anyone can tell their local instance of git that they are whatever name/email they want and push the result to a Bitbucket repository they control.
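To make the hashed fields concrete, here is a minimal Python sketch of how a commit ID is derived: SHA-1 over a `commit <size>\0` header followed by the commit text. The tree and parent hashes are the ones from the example above; the identity, timestamp, and message are hypothetical placeholders, so the result will not be 43c5ab27.... The sketch also ignores optional headers such as GPG signatures and encoding.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL byte + the raw object body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message) -> bytes:
    """Assemble the text of a plain (unsigned) commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # several parents for a merge
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode()


# Hypothetical identity and timestamp: git records whatever the client claims.
who = "Alex Example <alex@example.com> 1486243440 +0000"

body = commit_body(
    tree="4c883b33dbb987e46c211f6dad52b91b208661b4",
    parents=["cef37643eb57177ac9e4e1f3d0178bc67b381c8e"],
    author=who,
    committer=who,
    message="Monthly data snapshot\n",
)
commit_id = git_object_id("commit", body)
print(commit_id)
```

Changing any one of the inputs (tree, parent, identity, timestamp, or message) yields a different commit ID, which is why a previously handed-over hash pins all of them at once.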

Note that if someone only asks you later to show that your repository was in a certain state last month, you can easily go back, edit history, and create a new commit (with a new hash) that appears to show the repository in that state at that time. However, if you had given them the hash of the commit representing the repository's state at that time, this would not be possible (without going through the trouble of generating hash collisions).
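The reason rewritten history is detectable can be sketched in a few lines of Python. This is a simplified model, not git itself: the tree hashes, the identity, and the messages are hypothetical, and each commit has at most one parent. Because every commit embeds its parent's hash, altering an early snapshot changes the ID of every commit after it.

```python
import hashlib

# Hypothetical identity and timestamp used for every commit in this model.
IDENTITY = "Example <example@example.com> 1486243440 +0000"


def commit_id(tree, parent, message):
    """ID of a simplified single-parent commit, hashed the way git hashes commits."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {IDENTITY}", f"committer {IDENTITY}", "", message, ""]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def chain_ids(trees):
    """Commit IDs for a linear history with one commit per tree hash."""
    ids, parent = [], None
    for i, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"snapshot {i}")
        ids.append(parent)
    return ids


# Three made-up tree hashes standing in for three data snapshots.
trees = ["11" * 20, "22" * 20, "33" * 20]
original = chain_ids(trees)

# Rewrite history: replace the first snapshot, keep the later ones unchanged.
tampered = chain_ids(["ff" * 20] + trees[1:])

# Every commit ID differs, including the tip that was handed over.
print(all(a != b for a, b in zip(original, tampered)))
```

So a verifier who holds only the most recent commit hash is, in effect, also holding a commitment to the whole history behind it.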

dr jimbob
  • Ignoring collisions and handing over the hashes regularly, can I also prove the full changelog of the repository? Or only the state at a specific time? – Alex Feb 04 '17 at 21:24
  • I think the answer is yes, because the parent commit hash is hashed as well, so I can iterate back until the root commit. – Alex Feb 04 '17 at 21:25
  • You could use the [Bitcoin blockchain to do secure time-stamping](https://en.wikipedia.org/wiki/Trusted_timestamping) of the hashes. – Jonathan Cross Feb 04 '17 at 21:54
  • @JonathanCross If you know who you want to show the proofs to in the future, it is easier to just show them the hashes in the first place. – kasperd Feb 06 '17 at 01:32
  • 1
    @kasperd I was referring to this part of the answer above: "The timestamp and author/committer information will also be included, but this shouldn't be trusted...". It is a way to establish that the hash existed by that date. – Jonathan Cross Feb 10 '17 at 19:04
  • @JonathanCross even then, the easiest way to show that a hash existed by date X, is to show it to somebody at date X ;) – Bart van Heukelom Oct 17 '18 at 13:25
  • Hi dr jimbob, I've asked a similar question (https://security.stackexchange.com/questions/225411/how-secure-is-a-git-commit-hash-sha1) and hope you can answer that one. It requires intricate knowledge of the inner workings of git. I am not looking for a guarantee, just for practical likeliness. Note that my question is significantly different from this one. I want to entirely ignore collisions and/or SHA1 weaknesses, and solely concentrate on the git protocol and common implementation of the git client. The question comes down to: does a git client actually check a hash when it clone/fetch-es – Carlo Wood Feb 06 '20 at 13:56
1

As dr jimbob explains, git by default is not quite as secure as you'd probably like for this application. However, there is an option for this sort of situation, which is to sign your commits or tags with GPG.

If you do this, then they have a potentially much stronger guarantee that you have authored this work; you are now relying primarily on GPG and the web of trust to do verification, rather than git itself. The usual GPG things apply, e.g. they need to trust your key, you need to keep your key safe, etc.

Also note that if you sign a particular commit, you are verifying the authenticity of the entire commit history leading up to that one. In your case, this should not be a problem, since you're the only committer to the repository, but it's something to keep in mind while you work.

This may still leave you with problems proving the time. That information is stored in a commit, but you control it (most obviously by altering your system clock). So what we have at the end of this is a good guarantee that you committed some particular text at what you claim is a certain time. If you want a guarantee that you haven't post-dated data, you will need to involve an external system that has a trusted clock.

Xiong Chiamiov
  • 2
    Signing a commit helps if you don't have a trusted channel to communicate the commit hash. But I don't see how it helps against an attacker who can create a SHA-1 collision on the tree hash. The signed commit is merely a text document that contains the SHA-1 hash of the tree's root at commit time. If I can make that collide, your signed commit will happily match my compromised tree. – 5gon12eder Feb 06 '17 at 01:37