
Hi, I have created some Hyper-V machines (two Ubuntu and one Windows), exported them, and added them to a remote Git repository.

When I clone the repo on another machine, I can import and run the Linux VMs fine, but the Windows machine will not run, giving me this error:

Microsoft Emulated IDE Controller (instance ID ....) Failed to power on with Error 'The File or Directory is corrupted and unreadable'

Does anyone know why this is happening, or have any suggestions? I have merged my differencing disk into the parent and deleted all snapshots too.
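As a first check (this is also suggested in the comments below), here is a minimal sketch of comparing checksums of an exported disk file before the push and after the clone, to confirm whether the repository round-trip is actually altering the file; the paths are placeholders, not the real file locations:

```python
# Compare checksums of an exported VM disk before pushing and after cloning.
# The paths below are placeholders for the exported VHDX files.
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


original = sha256_of(r"D:\exports\windows-vm\Virtual Hard Disks\windows.vhdx")
cloned = sha256_of(r"C:\repos\vms\windows-vm\Virtual Hard Disks\windows.vhdx")
print("match" if original == cloned else "file changed in transit")
```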

Mark Jones
  • Git isn't designed to deal with huge files like this. What problem are you trying to solve by doing this? Did you take before/after checksums and verify them? – EEAA Mar 08 '16 at 13:25
  • They are clearly different, as the file size has dropped. I want to create a version-controlled server image, so I can instantiate many versions of it (it is part of a cluster) consistently across many environments. – Mark Jones Mar 08 '16 at 13:26
  • Git isn't your tool here. Just store the image file somewhere along with a SHA2, then use that in your application. You can still create versioned images using different file names (a sketch of this approach follows these comments). – EEAA Mar 08 '16 at 13:30
  • OK, maybe Git is not the tool, but neither is a bare file system with different file names. I thought it was recommended practice to version control everything. What version control software are others using for their machine images? – Mark Jones Mar 08 '16 at 13:38
  • Let me guess, you're a software developer who is getting into sysadmin? When all you have is a hammer, everything looks like a nail. Git is *great* for version controlling clear-text, smallish files that can be diffed in a meaningful way. You can't do that with binary files. Every single system I've worked with keeps its disk images on the filesystem, except for EC2, which keeps them in S3. – EEAA Mar 08 '16 at 13:43
  • Ha! No. I wanted to version control VMs and looked to the most obvious tool. So I was wrong; move on and try something else. I have done my share of sysadmin in the past; I would say I am more rusty than just getting into it. I still want to version control my VMs properly, and I don't care if the mechanism for doing that is backed by a file system or a DB or whatever else. – Mark Jones Mar 08 '16 at 14:22
  • This is the sort of thing you use snapshots for. But git? I can't comprehend how such an idea would arise to begin with. It's hardly an "obvious" tool for this. – Michael Hampton Mar 08 '16 at 16:08
  • Just a note: versioning a VM forever, even with snapshots, would be an error if you never merge your files. An example of what I mean: http://serverfault.com/questions/430138/why-vm-snapshots-are-affecting-performance – yagmoth555 Mar 10 '16 at 11:59
  • Unless you're actually trying to version swap space or something else ridiculously machine-specific, you **should** be version-controlling configuration by version controlling configuration-management files (e.g. ansible playbooks, puppet manifests, etc.), and version-controlling data by backing up only those files which are relevant (e.g. database dumps). Most hypervisors support snapshots, which can be useful for development, but realistically you want to be using `packer` for base boxen in production. – Parthian Shot Mar 14 '16 at 14:07
  • OK, so it was a bad idea. TBH, I was approaching it from the Jez Humble "version control everything" stance. I am now looking at FOG or 5nine as a Hyper-V VM management solution. What I wanted was a VM template of a Docker host that I could simply spin up to add more nodes to my Docker swarm. We do not have System Center, and I was looking for a fast, easy, low-cost way to manage updates to the templates. – Mark Jones Mar 14 '16 at 14:37
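A minimal sketch of the "store the image file along with a SHA2" approach EEAA suggests above; the store location, file names, and manifest layout are all made up for illustration:

```python
# Content-addressed image store: copy each image into a store directory
# named by its SHA-256 digest and record name:version -> digest in a manifest.
# The store path, image path, and naming scheme are illustrative only.
import hashlib
import json
import shutil
from pathlib import Path

STORE = Path(r"\\fileserver\vm-images")  # hypothetical image store


def sha256_of(path, chunk_size=1024 * 1024):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def publish(image_path, name, version):
    """Copy an image into the store under its digest and record the version."""
    digest = sha256_of(image_path)
    target = STORE / (digest + ".vhdx")
    if not target.exists():
        shutil.copy2(image_path, target)
    manifest_path = STORE / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[f"{name}:{version}"] = digest
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return digest


publish(r"D:\exports\docker-host.vhdx", "docker-host", "1.0.0")
```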

1 Answer


Git really isn't appropriate for virtual machine disk images. GitHub has even written a storage service (Git LFS) that acts as a sort of proxy for large files, just to keep them out of your Git repo, because Git handles big binary files so poorly. Git's delta mechanism is geared toward plain-text content, and a Git repo carries the full history (with every version of a binary, generally without meaningful delta compression), so repositories storing large binaries get out of hand quickly.

Instead, you should store your disk images on something that supports snapshots (where only changed blocks consume storage), such as ZFS, or LVM snapshots underneath another filesystem; or, as mentioned in the comments above, use the snapshotting mechanism built into your virtualization solution.
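As a rough illustration of the filesystem-snapshot approach, assuming the images live on a ZFS dataset (the dataset name below is hypothetical), snapshotting and rolling back can be scripted around the standard `zfs` commands:

```python
# Version VM disk images with ZFS snapshots instead of git commits.
# Only changed blocks consume space; the dataset name below is hypothetical.
import subprocess
from datetime import datetime

DATASET = "tank/vm-images"  # hypothetical ZFS dataset holding the image files


def snapshot(tag=None):
    """Create a point-in-time snapshot of the image dataset."""
    tag = tag or datetime.now().strftime("%Y%m%d-%H%M%S")
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{tag}"], check=True)
    return tag


def rollback(tag):
    """Roll back to a snapshot (only the most recent one without extra flags)."""
    subprocess.run(["zfs", "rollback", f"{DATASET}@{tag}"], check=True)


def list_snapshots():
    """List existing snapshots of the image dataset."""
    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-o", "name", "-H", "-r", DATASET],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.split()
```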

Shane Madden