
My company recently gave me the task of recovering a fairly old (~2015) Citrix XenServer host. The machine suffered a major electrical fault that caused a hard drive failure and the collapse of the RAID. There are no backups, so the drives were sent in for data recovery, which succeeded. The original machine cannot be booted, and booting from the restored data causes a hard reset while the Citrix Xen logo is showing.

The machine was configured with two RAID 6 arrays: one held the hypervisor and a storage repository, and the other, added later, served as an extension of that storage repository. The extension was added to the LVM volume group of the Xen server.

I've managed to get all the volumes up and running again, with all PVs working, and I obtained a copy of the state.db from the hypervisor, which I searched for all VDIs attached to the original system.

The machine currently has a fair number of logical volumes, all of them active. I suspected that most of them are snapshots of the ~5 virtual machines that were running on the hypervisor. Checking the entries in Xen's database confirmed that (there is an is_a_snapshot entry in the XML).
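For anyone wanting to reproduce the database check, here is a minimal sketch of how the snapshot flags could be pulled out of an XML dump. The layout it assumes (a `table` element named `VDI` whose `row` elements carry `uuid`, `name__label`, `is_a_snapshot` and `snapshot_time` attributes) and the sample data are assumptions for illustration; only `is_a_snapshot` is something I actually confirmed in my state.db, so the names would need adjusting to the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The table/row layout and every attribute name
# except is_a_snapshot are assumptions, not taken from the real state.db.
SAMPLE = """<database>
  <table name="VDI">
    <row ref="OpaqueRef:1" uuid="aaaa-1111" name__label="vm1-disk"
         is_a_snapshot="false" snapshot_time="19700101T00:00:00Z"/>
    <row ref="OpaqueRef:2" uuid="bbbb-2222" name__label="vm1-snap"
         is_a_snapshot="true" snapshot_time="20180312T09:15:00Z"/>
  </table>
</database>"""


def list_vdis(xml_text):
    """Return one dict per VDI row found in the (assumed) VDI table."""
    root = ET.fromstring(xml_text)
    vdis = []
    for table in root.iter("table"):
        if table.get("name") != "VDI":
            continue
        for row in table.iter("row"):
            vdis.append({
                "uuid": row.get("uuid"),
                "label": row.get("name__label"),
                "is_snapshot": row.get("is_a_snapshot") == "true",
                "snapshot_time": row.get("snapshot_time"),
            })
    return vdis


if __name__ == "__main__":
    for vdi in list_vdis(SAMPLE):
        kind = "snapshot" if vdi["is_snapshot"] else "base"
        print(vdi["uuid"], vdi["label"], kind, vdi["snapshot_time"])
```

To run it against a real file, read the file's contents and pass them to `list_vdis` in place of `SAMPLE`.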

So I know which VHD logical volumes belong to which virtual machine, when each snapshot was taken, and the names of the snapshot volumes.

The logical volumes are the hard drives of the virtual machines that were running. Each logical volume that is not a snapshot, i.e. the base drive of a VM, contains a valid partition table and can be mounted, either by specifying a mount offset or by attaching it to a loop device with losetup and running partx on it.
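To illustrate the mount-offset approach, here is a minimal sketch that reads a classic MBR partition table from the first sector of a volume and computes the byte offset of each partition. It assumes 512-byte logical sectors and an MBR (not GPT) layout; `build_demo_mbr` is a hypothetical helper that fabricates a test sector so the example runs without touching a real device.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_mbr(sector0):
    """Return byte offsets and sizes of the primary partitions in an MBR."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for i in range(4):
        # Four 16-byte partition entries start at byte 446.
        entry = sector0[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype == 0 or num_sectors == 0:
            continue  # unused slot
        partitions.append({
            "index": i + 1,
            "type": ptype,
            "offset": start_lba * SECTOR_SIZE,
            "size": num_sectors * SECTOR_SIZE,
        })
    return partitions


def build_demo_mbr():
    """Hypothetical helper: one Linux partition starting at sector 2048."""
    sector = bytearray(512)
    entry = bytearray(16)
    entry[4] = 0x83  # partition type: Linux
    struct.pack_into("<II", entry, 8, 2048, 204800)
    sector[446:462] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for part in parse_mbr(build_demo_mbr()):
        print(part)
```

The resulting `offset` is the value that would go into something like `mount -o ro,loop,offset=...`; against a real volume, the first 512 bytes of the device would be read and passed to `parse_mbr` instead of the demo sector.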

Here's the problem: the non-snapshot volumes, of course, contain ancient data from before the snapshots were taken. However, LVM does not recognise the snapshot logical volumes as snapshots. On the volumes marked as snapshots in the database, I can neither find a partition table nor mount them in any way.

I've done some experiments on a separate test machine: I created a new VG, added an LV, created a partition table on that LV, and formatted the first partition as ext3. I then took a snapshot of that LV and checked with fdisk -l whether the snapshot volume contains a valid partition table, and it does. So a correctly taken snapshot of an LVM logical volume containing a partition table should still show that partition table.

The logical volumes on the dead Xen machine do not behave this way. None of the volumes marked as snapshots in the state.db return a valid partition table, neither with fdisk -l nor with other tools such as guestfish.
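One further check that might narrow this down is to look at the raw leading bytes of a volume instead of asking a partitioning tool. The sketch below classifies the first sector as an MBR, as the start of a VHD image, or as neither. It relies on the VHD format's `conectix` cookie, which dynamic and differencing VHD images carry in a footer copy at the very start of the image; whether that is what sits on my snapshot volumes is an assumption I have not verified. The function names are made up for this example.

```python
VHD_COOKIE = b"conectix"     # cookie at the start of a VHD footer
MBR_SIGNATURE = b"\x55\xaa"  # boot signature at bytes 510-511 of an MBR


def read_first_sector(path, size=512):
    """Read the first bytes of a block device or image file (read-only)."""
    with open(path, "rb") as handle:
        return handle.read(size)


def classify(sector0):
    """Guess what the first sector of a volume holds."""
    if sector0[:8] == VHD_COOKIE:
        return "vhd"
    if len(sector0) >= 512 and sector0[510:512] == MBR_SIGNATURE:
        return "mbr"
    if not any(sector0):
        return "empty"
    return "unknown"


if __name__ == "__main__":
    # Synthetic inputs so the example runs without a real device.
    print(classify(VHD_COOKIE + bytes(504)))   # vhd
    print(classify(bytes(510) + MBR_SIGNATURE))  # mbr
    print(classify(bytes(512)))                # empty
```

Against a real volume this would be `classify(read_first_sector(path))`, with `path` pointing at the logical volume's device node; the file is only ever opened for reading.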

I have tried merging the snapshots back with lvconvert --merge. However, it fails with a note that "--trackchanges" was not used when the snapshots were created. I was not able to find any information about this behaviour.

[TL;DR] So to put it together:

  • Dead Xen with logical volume VHD setup
  • All LVs are active and seem to be intact
  • I got a copy of the state.db that notes which LVs are virtual machines and which are snapshots of them
  • I cannot use the snapshot volumes the way LVM should normally allow
  • I cannot merge the snapshots

So what are my options? Is there a way to merge the LVs back together manually, or can I just copy their contents and concatenate the data in some way? Can I move the VG to another drive, attach it to a new Xen host, and somehow put the VMs back together from the information in the state.db? Or am I chasing a red herring, and Xen does not actually use LVM snapshots for its VM snapshots?

A few notes on the side:

  • Every operation I performed was done on a carbon copy of the original drives' data. The original recovered data has not been touched
  • The operations were done in a Grml live environment
  • The VHDs total more than 8 TB
  • I did not set up the machine, and the person who originally set it up back in 2015 was unable to restore it because they lacked knowledge of LVM
