
I recently carried out a recovery test on a Google Compute Engine VM running CentOS 7. The VM has five partitioned disks managed by LVM. The server has been around for a while, so there are about ten file systems distributed over the disks. The snapshots were all created within one or two seconds of each other in the early hours of the morning.

The recovery appears to have gone well; the df output looks as I would expect.

Was I just lucky?

I would normally expect to have trouble when restoring from ostensibly unsynchronized snapshots.

I'm wondering: do I need to use a "proper" logical-volume-aware backup system to ensure the consistency of restored file systems? Or, failing that, should I ensure that each file system is on its own single disk? Or am I worrying unnecessarily?

Peter Evans
  • I have 30 years' experience designing storage systems, host bus adapters, backup software, etc. for very large systems. Yes, you were lucky. I do not recommend LVM in the cloud unless necessary. I do not recommend taking snapshots of running systems. Features such as creating an image or taking a disk snapshot become more complex. The key features of LVM are, for the most part, not necessary in the cloud. – John Hanley Jan 10 '20 at 00:12

1 Answer


GCP recommends having each file system on its own single disk:

You can save time and get the best performance if you format your persistent disks with a single file system and no partition tables.

From the standard sysadmin perspective this can be seen as a waste of resources, considering that there is a limit on the number of disks you can attach to an instance; however, it makes backups easier to manage.

  • Instances with shared-core machine types are limited to a maximum of 16 persistent disks.
  • For custom machine types or predefined machine types that have a minimum of 1 vCPU, you can attach up to 128 persistent disks.
  • Each persistent disk can be up to 64 TB in size, so there is no need to manage arrays of disks to create large logical volumes. Each instance can attach only a limited amount of total persistent disk space and a limited number of individual persistent disks. Predefined machine types and custom machine types have the same persistent disk limits.
  • Most instances can have up to 128 persistent disks and up to 257 TB of total persistent disk space attached. Total persistent disk space for an instance includes the size of the boot disk.
  • Shared-core machine types are limited to 16 persistent disks and 3 TB of total persistent disk space.
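Setting up a disk this way is quick; a sketch assuming the new disk appears as /dev/sdb and is mounted at /mnt/data (both names are placeholders), run as root:

```shell
# Format the whole device with a single ext4 file system -- no partition table.
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# Mount it, and persist the mount across reboots using the file system UUID.
sudo mkdir -p /mnt/data
sudo mount -o discard,defaults /dev/sdb /mnt/data
echo "UUID=$(sudo blkid -s UUID -o value /dev/sdb) /mnt/data ext4 discard,defaults,nofail 0 2" | sudo tee -a /etc/fstab
```

With no partition table and no LVM layer, the snapshot of that one disk captures the whole file system.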

If I understand correctly, you attached 5 individual disks to your instance and proceeded to create PVs, VGs, LVs, and file systems. Finally, you took a snapshot of each disk at almost the same time.

It seems you did not make changes on your disks between the snapshots, but I do not recommend this architecture for sensitive applications such as a database, as I would be worried about data consistency. It would not surprise me if you repeated the experiment with a longer interval between the snapshots of your disks and experienced data consistency issues.
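If you do need snapshots of a running system, you can quiesce each file system for the moment the snapshot is triggered. A sketch, assuming a mount point of /mnt/data and a disk named data-disk in us-central1-a (all placeholders), run as root:

```shell
# Flush pending writes and block new ones while the snapshot starts.
sudo fsfreeze --freeze /mnt/data

# Trigger the snapshot; GCP snapshots are point-in-time once started.
gcloud compute disks snapshot data-disk \
    --zone=us-central1-a \
    --snapshot-names=data-disk-snap

# Unfreeze immediately so applications are blocked as briefly as possible.
sudo fsfreeze --unfreeze /mnt/data
```

For a database, prefer the application's own quiesce or backup mechanism over fsfreeze alone.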

I recommend you take a look at snapshot best practices and scheduled snapshots.
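Scheduled snapshots are configured per disk via a resource policy; a sketch with placeholder names, region, and retention:

```shell
# Create a daily snapshot schedule that keeps each snapshot for 14 days.
gcloud compute resource-policies create snapshot-schedule daily-backup \
    --region=us-central1 \
    --daily-schedule \
    --start-time=04:00 \
    --max-retention-days=14

# Attach the schedule to an existing disk.
gcloud compute disks add-resource-policies data-disk \
    --zone=us-central1-a \
    --resource-policies=daily-backup
```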

Ernesto U
  • Thanks John, Ernesto. I agree, I was lucky. I'm moving to a one filesystem per disk architecture as Ernesto suggested, and in fact LVM is a help as it means I don't have to inflict downtime on my users while moving all extents of individual file systems to their own dedicated non-partitioned disks. – Peter Evans Jan 17 '20 at 16:25
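For reference, the online migration Peter describes relies on LVM's ability to move extents between physical volumes while file systems stay mounted; a sketch with placeholder device and volume names:

```shell
# Add the new dedicated, unpartitioned disk to the volume group.
sudo pvcreate /dev/sdf
sudo vgextend vg_data /dev/sdf

# Move all extents of one logical volume onto the new disk, online.
sudo pvmove -n lv_app /dev/sdc /dev/sdf

# Once the old PV is empty, remove it from the group.
sudo vgreduce vg_data /dev/sdc
```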