I was wondering today how well git ensures the integrity of its metadata and I am a bit surprised by what I encountered. I used the following simple setup for testing:
- Two working repositories, called x and y
- A bare repository, called xy.git
So, initially x and y are pushing to and pulling from xy.git and everything works just fine. Now, let's say one of the metadata objects (.git/objects/...) in xy.git becomes corrupted for whatever reason (choose your favorite random incident).
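For anyone who wants to recreate this, the setup and the corruption step can be sketched roughly as follows. This is only an approximation of what I did: the scratch directory, the file name, the commit identity, and the choice to overwrite a blob object with the string "garbage" are all made up for illustration, and the script assumes the pushed object is still stored as a loose file in the bare repository rather than in a pack.

```shell
#!/bin/sh
# Rough sketch of the test setup; names other than x, y and xy.git are hypothetical.
set -eu

work=$(mktemp -d)          # scratch directory (assumption)
cd "$work"

# Bare repository and the first working repository.
git init --quiet --bare xy.git
git clone --quiet xy.git x 2>/dev/null   # hides the "empty repository" warning

# Create one commit in x and push it to the bare repository.
echo "hello" > x/file.txt
git -C x add file.txt
git -C x -c user.name="Tester" -c user.email="tester@example.com" \
    commit --quiet -m "initial commit"
git -C x push --quiet origin HEAD

# Second working repository, cloned while everything is still intact.
git clone --quiet xy.git y

# Locate the loose object that stores file.txt inside the bare repository.
blob=$(git -C x rev-parse HEAD:file.txt)
obj="xy.git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"
[ -f "$obj" ] || { echo "object is not stored loose" >&2; exit 1; }

# Simulate the "random incident": loose objects are read-only, so make the
# file writable and replace its compressed contents with junk.
chmod u+w "$obj"
printf 'garbage' > "$obj"

echo "corrupted $obj in $work"
```

From here, further commits, pushes, pulls and clones can be tried against the damaged xy.git. Which object gets damaged, and how, may well influence when Git notices, so this is one possible variant, not the only one.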
I had assumed that something would break at the next push or pull, but to my surprise, everything appeared to work fine. More commits, more pushing and pulling, no problems. The first time any corruption was reported was when I tried to clone another working repository from the bare repository, which left my clone in an unusable state.
At first I thought this was not that bad because, thanks to git's architecture, in the worst case I could simply discard the bare repository and recreate it with all history from one of my working repositories. But no. Without any notice, the corrupted file had made its way into the working repositories through pulling, making it impossible to clone a new bare repository from them as well.
This does not only happen when I start with a corrupted file in the bare repository; a corrupted file in a working repository can be introduced into the bare one in the same way.
Sure, one might be able to fix this by other means, but I'm still surprised (and a bit concerned) by how easy it appears to be to mess up the repository for everyone working with it, especially since the error can remain unnoticed until the next time someone tries to clone. Shouldn't there be checks against this, somewhere, somehow?
Is anyone here willing to check whether this is reproducible? I experimented with git version 2.7.4.
Any advice on how to check against such corruption is highly welcome.