
Is an effectively nested Xen Server storage repository a problem for stability?

Consider the graphic showing the data path from the web server to a physical drive (starting on the right):

enter image description here

In the graphic, Xen Server and the DRBD/NFS server VM are hosted on an internal USB flash drive. The hard drives are a local MDRAID10 array. The other VMs would be stored on a storage repository served via NFS from a DRBD device, which itself sits on a VDI attached as a block device on the MDRAID array. VMs are typically Debian. We DO NOT have a dedicated shared storage device (NAS, SAN, etc.).

If performance is not a concern (5-10 low-use users with sporadic web access to a company site from the field), will there be stability concerns from the data path being a storage repository served by another storage repository via NFS on DRBD?

TL;DR

In lieu of dual SANs with supporting networking, we are trying to use DRBD on two servers with local storage to allow for easy manual failover during an outage on the primary server. During an outage the secondary server would become the primary, and the VMs could (theoretically) be fired up almost instantly with very little configuration. The servers even have the same RAM and CPU generation.

Xen seems to have its quirks, and I foresee Xen Server having a problem that takes out the entire server until it gets "fixed", given that Xen is running EVERYTHING. I doubt we would have permanent data loss, but based on my reading, storage repositories disappearing happens more often than I would have thought. Unless all your disaster recovery ducks are in a perfect little row, it could take a while to bring things back up correctly if we had to reinstall Xen from scratch while carefully filling in the holes in our documentation as we went.

With DRBD, though, in the event of a problem we could be running on our backup server very quickly with up-to-date file and VM mirrors. Worst case, we could simply start from scratch on the primary server and not have to worry about "fixing" anything.

Not shown are a couple more VMs that serve files. Their data would be housed on another SR, so only the VM disks themselves, which are rather static, would be served over the path depicted in the graphic.

Damon
  • https://en.wikipedia.org/wiki/W._Heath_Robinson – user9517 Jan 24 '17 at 08:42
  • And I thought it was supposed to have the opposite effect, by consolidating all operations into just one server and then having a backup always on. Perspective is great. – Damon Jan 24 '17 at 08:47

1 Answer

After running this setup in production for 6 months, I can say there do NOT seem to be any stability issues with the VMs running on an SR served from DRBD in a VM.

The biggest issue is that you have two "hosts" to worry about that affect all other VMs: Dom0 and the DRBD server. This effectively adds a second point of failure on the software side (configuration errors, administration errors, bugs, etc.). However, this has not proved to be an issue thus far.

I have not run any benchmarks comparing the performance of VMs on and off DRBD, but we have not had any noticeable performance issues, and the majority of the data served is not served by DRBD; the SR on DRBD only hosts the VM disks.

TL;DR

We had some extra RAM available on the host, so I set the DRBD server VM to use all of the remaining available RAM so that it caches the data it serves.

Bringing up the secondary server is relatively easy. It involves setting the backup server's DRBD resource to primary, mounting the DRBD drive, removing the primary server's SR and re-adding it on the secondary server from the SR on the DRBD drive, then using XenServer's built-in backup and restore to reassociate the metadata with the virtual disks.
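The steps above can be sketched as an ordered dry-run plan. This is only an illustration, not our actual runbook: the resource name `r0`, the device `/dev/drbd0`, and the mount point `/srv/sr` are made-up placeholders, and the SR and metadata steps are left as manual items because the exact commands depend on how the SR was created.

```python
def failover_plan(resource="r0", device="/dev/drbd0", mountpoint="/srv/sr"):
    """Return the manual failover steps, in order, as (description, command) pairs.

    All default values are hypothetical placeholders. A command of None
    marks a step that is done by hand in the XenServer tooling.
    """
    return [
        ("Promote the DRBD resource on the backup server",
         ["drbdadm", "primary", resource]),
        ("Mount the DRBD device so it can be exported again",
         ["mount", device, mountpoint]),
        ("Remove the primary server's SR", None),
        ("Re-add the SR on the secondary server from the DRBD drive", None),
        ("Restore VM metadata with XenServer's built-in backup and restore", None),
    ]


def print_plan(plan):
    """Print the plan without executing anything (dry run)."""
    for number, (description, command) in enumerate(plan, start=1):
        shown = " ".join(command) if command else "(manual step)"
        print(f"{number}. {description}: {shown}")


if __name__ == "__main__":
    print_plan(failover_plan())
```

Keeping the plan as data rather than running the commands directly makes it easy to review the order before touching a production host.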

This means the VMs used during an outage on the primary server are not outdated in any way, because they were actively replicated via DRBD rather than copied by a script. It is rather critical that we keep the metadata up to date to make this work easily.
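Before promoting the backup, it is worth confirming that its local copy really is current. A minimal sketch, assuming a DRBD 8.x style status line as shown in `/proc/drbd` (newer DRBD versions report status differently); the function names and sample lines are hypothetical:

```python
import re


def parse_drbd_status(line):
    """Extract connection state, roles and disk states from one status line.

    Assumes the DRBD 8.x layout, for example:
    ' 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----'
    """
    fields = dict(re.findall(r"\b(cs|ro|ds):(\S+)", line))
    local_role, _, peer_role = fields["ro"].partition("/")
    local_disk, _, peer_disk = fields["ds"].partition("/")
    return {
        "connection": fields["cs"],
        "local_role": local_role,
        "peer_role": peer_role,
        "local_disk": local_disk,
        "peer_disk": peer_disk,
    }


def safe_to_promote(line):
    """True only if the local disk state is UpToDate."""
    return parse_drbd_status(line)["local_disk"] == "UpToDate"


if __name__ == "__main__":
    # Hypothetical line as the backup might report it while the primary is down.
    sample = " 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----"
    print(parse_drbd_status(sample))
    print("safe to promote:", safe_to_promote(sample))
```

If the local disk shows anything other than `UpToDate` (for example `Inconsistent`), promoting it would bring the VMs up on stale or partial data.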

This is used like a RAID and the VMs are still backed up in case of some other failure or corruption.

Damon