Does anyone have experience with using DRBD (protocol C) to sync parts of the datastores of two ESXi hosts for disaster recovery of selected guests?
I have 2-3 guests that should be able to recover from a hardware failure of their host as quickly as possible. Manual intervention is acceptable, and I can tolerate a small amount of data loss.
I'd like to build something like this:
One DRBD VM on each of the two ESXi hosts, replicating their local SAS storage (primary/secondary, i.e. active/passive).
This mirrored storage should be attached to only one ESXi host at a time via iSCSI or NFS, and those guests would run from it so that their vmdks are replicated to the second, "passive" ESXi host. In the event of a hardware failure, the second ESXi host should then attach the DRBD storage and power up those VMs (done manually, of course).
I have found some information about doing this on the net, but what I haven't found is anything about the consistency of the vmdks.
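To make the manual failover concrete, here is a rough sketch of the order of steps I have in mind for the NFS variant. It is a small Python helper that only prints a plan and executes nothing. The resource name r0, the device, the mount point, the IP address, the datastore name and the guest path are all placeholders I made up. The commands are the usual drbdadm, mount, exportfs, esxcli and vim-cmd calls as far as I know them, and I have not tested this sequence.

```python
def failover_plan(resource, device, mount_point, nfs_host, datastore, vmx_files):
    """Return (where, command) pairs for a manual failover; nothing is executed."""
    plan = [
        # On the surviving DRBD VM: take over the replicated device.
        ("drbd-vm", f"drbdadm primary {resource}"),
        ("drbd-vm", f"mount {device} {mount_point}"),
        # Re-export what is listed in /etc/exports (assumed to contain mount_point).
        ("drbd-vm", "exportfs -ra"),
        # On the surviving ESXi host: attach the export as an NFS datastore.
        ("esxi", f"esxcli storage nfs add --host={nfs_host} "
                 f"--share={mount_point} --volume-name={datastore}"),
    ]
    for vmx in vmx_files:
        # Register each protected guest; powering it on stays a manual step.
        plan.append(("esxi", f"vim-cmd solo/registervm /vmfs/volumes/{datastore}/{vmx}"))
    return plan


# Hypothetical values, for illustration only.
plan = failover_plan("r0", "/dev/drbd0", "/srv/drbd", "10.0.0.2",
                     "drbd-ds", ["guest1/guest1.vmx"])
for where, command in plan:
    print(f"[{where}] {command}")
```

Keeping this as a printed checklist rather than an automated script matches the requirement that the failover is triggered by hand.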
While this is of course not meant as a replacement for backups, backup tools for hypervisors usually make sure that the guests' filesystems and databases are quiesced before taking the snapshot or backup.
With this continuous sync this wouldn't be possible though. That's why I wonder if this is even worth doing.
What if the vmdks themselves get damaged because the hardware failure occurs at a bad time? I know DRBD discards writes that aren't complete, but is that sufficient to have a consistent vmdk? By "consistent" I mean "working" from ESXi's point of view, apart from guest filesystem consistency, which of course cannot be guaranteed this way.
I hope that, in the event of a crash, a guest brought up on the second ESXi host would behave as if the VM had simply been shut down ungracefully (with all the drawbacks this might have in other scenarios), but would that really be the case? Couldn't the vmdks as a whole get damaged?
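To illustrate the behaviour I am hoping for, here is a toy model in Python. It is not DRBD code and it simplifies heavily: I assume that with protocol C a write is only acknowledged to the guest once both nodes have stored it, and that writes reach the secondary in the order they were issued. All class and function names are made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    blocks: dict = field(default_factory=dict)  # block number -> data


@dataclass
class SyncPair:
    """Toy model of synchronous replication between two nodes."""
    primary: Node = field(default_factory=Node)
    secondary: Node = field(default_factory=Node)
    acked: list = field(default_factory=list)  # writes the guest saw complete

    def write(self, block, data, crash_before_replication=False):
        self.primary.blocks[block] = data
        if crash_before_replication:
            # The primary host dies before the peer confirms:
            # the guest never receives an acknowledgement for this write.
            return False
        self.secondary.blocks[block] = data
        self.acked.append((block, data))
        return True


def replay(acked):
    """Disk image that results from applying only the acknowledged writes."""
    state = {}
    for block, data in acked:
        state[block] = data
    return state


pair = SyncPair()
pair.write(0, "vmdk descriptor")
pair.write(1, "journal: begin txn")
pair.write(2, "data", crash_before_replication=True)

# The secondary holds exactly the acknowledged writes, nothing half-applied.
assert pair.secondary.blocks == replay(pair.acked)
print(sorted(pair.secondary.blocks))
```

If real DRBD behaves like this model, the secondary would look like a disk after a sudden power loss. Whether that assumption actually holds for vmdks on top of iSCSI or NFS is exactly what I am asking.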
Thank you very much for reading and your thoughts.
Max