19

How viable would periodic LVM snapshots of Xen domUs be as a backup strategy? Pros, cons, any gotchas?

To me it seems like the perfect solution for a fast, brainless restore. Any investigation could take place on the broken logical volume while the domU keeps running successfully without interruption.

EDIT:

Here's where I'm at now, when doing full system backups.

  • lvm snapshot of domU disk
  • a new logical volume whose size equals the snapshot's size.
  • dd if=/dev/snapshot of=/dev/new_lv
  • disposing of snapshot with lvremove
  • optional verification with kpartx/mount/ls

Now I need to automate this.
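
The steps above could be automated roughly as follows. This is a minimal sketch, not a finished tool: the volume group `vg0`, the LV name `domu-disk` and both sizes are hypothetical placeholders, and the `run` helper and `DRY_RUN` switch are my own additions so the script can be tried without touching any volumes. The target LV is created before the snapshot (the reverse of the list above) so the snapshot exists for as short a time as possible. Because the snapshot device exposes the full size of the origin, the target LV has to be at least as large as the origin LV.

```shell
#!/bin/sh
# Sketch only: volume group, LV names and sizes are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of running it.
set -e

VG="${VG:-vg0}"
LV="${LV:-domu-disk}"
LV_SIZE="${LV_SIZE:-20G}"      # must be at least the size of the origin LV
SNAP_SIZE="${SNAP_SIZE:-5G}"   # room for writes that land while dd is running
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_domu() {
    snap="${LV}-snap"
    backup="${LV}-backup"
    status=0

    # Create the target first, then the snapshot of the domU disk.
    run lvcreate -L "$LV_SIZE" -n "$backup" "$VG"
    run lvcreate -L "$SNAP_SIZE" -s -n "$snap" "/dev/$VG/$LV"

    # Copy the frozen image; remember a failure but still remove the snapshot.
    run dd if="/dev/$VG/$snap" of="/dev/$VG/$backup" bs=4M || status=1
    run lvremove -f "/dev/$VG/$snap"

    return "$status"
}

backup_domu
```

Run as-is, it only prints the four commands. Setting `DRY_RUN=0` (as root) would execute them. The optional kpartx/mount/ls verification is left out, and a real script would also need a naming scheme for repeated runs, since the backup LV name here is fixed.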

Karolis T.

9 Answers

36

LVM snapshots are meant to capture the filesystem in a frozen state. They are not meant to be a backup in and of themselves. They are, however, useful for obtaining backup images that are consistent because the frozen image cannot and will not change during the backup process. So while you won't use them directly to make long-term backups, they will be of great value in any backup process that you decide to use.

There are a few steps to implementing a snapshot. The first is that a new logical volume has to be allocated. The purpose of this volume is to provide an area where deltas (changes) to the filesystem are recorded. This allows the original volume to continue without disrupting any existing read/write access. The downside is that the snapshot area is of a finite size, which means that on a system with busy writes it can fill up rather quickly. For volumes with significant write activity, you will want to increase the size of your snapshot to allow enough space for all changes to be recorded. If your snapshot overflows (fills up), it is marked as unusable, and any backup still reading from it is lost. Should this happen, you will want to remove the snapshot and start over with a larger one. I/O on the original volume is not halted, and the filesystem on it stays available read/write throughout.
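
To catch a snapshot before it overflows, its fill level can be polled while the backup runs. The sketch below is only an illustration: the function name `snapshot_nearly_full`, the 80% threshold and the device path are hypothetical, and it assumes an LVM2 version whose `lvs` reports snapshot usage in the `data_percent` field (older releases call it `snap_percent`).

```shell
#!/bin/sh
# Returns 0 (true) when the usage percentage in $1 is at or above the
# integer threshold in $2. Accepts values such as "42.17" or "42,17".
snapshot_nearly_full() {
    used="${1%%[.,]*}"            # keep only the integer part
    [ "${used:-0}" -ge "$2" ]
}

# Hypothetical usage (needs root and an existing snapshot):
#   pct="$(lvs --noheadings -o data_percent /dev/vg0/domu-disk-snap | tr -d ' ')"
#   if snapshot_nearly_full "$pct" 80; then
#       lvextend -L +1G /dev/vg0/domu-disk-snap   # or abort and lvremove it
#   fi
```

Growing the snapshot with `lvextend` only helps while the volume group still has free extents. Otherwise the safer choice is to abort the copy and remove the snapshot.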

The second thing that happens is that LVM now "swaps" the true purposes of the volumes in question. You would think that the newly allocated snapshot would be the place to look for any changes to the filesystem, after all, it's where all the writes are going to, right? No, it's the other way around. Filesystems are mounted to LVM volume names, so swapping out the name from underneath the rest of the system would be a no-no (because the snapshot uses a different name). So the solution here is simple: When you access the original volume name, it will continue to refer to the live (read/write) version of the volume you did the snapshot of. The snapshot volume you create will refer to the frozen (read-only) version of the volume you intend to back up. A little confusing at first, but it will make sense.

All of this happens in less than 2 seconds. The rest of the system doesn't even notice. Unless, of course, you don't release the snapshot before it overflows...

At some point you will want to release your snapshot to reclaim the space it occupies. Once the release is complete, the space used by the snapshot volume is returned to the volume group, and the original volume remains.

I do not recommend pursuing this as a long-term backup strategy. You are still hosting data on the same physical drive that can fail, and recovery of your filesystem from a drive that has failed is no backup at all.

So, in a nutshell:

  • Snapshots are good for assisting backups
  • Snapshots are not, in and of themselves, a form of backup
  • Snapshots do not last forever
  • A full snapshot is not a good thing
  • Snapshots need to be released at some point
  • LVM is your friend, if you use it wisely.
Avery Payne
  • 6
    Also LVM snapshot performance degrades linearly - 8 snapshots 8 times the IO. – Steven Jun 13 '09 at 23:31
  • 10
    There's a few points in your description which I think are incorrect. In current versions of LVM, if a snapshot becomes full, it's simply marked as unusable and needs to be deleted. I/O on the device does not get halted. Secondly, when you delete a snapshot, no data is copied back to the original volume. Essentially, when you write to the live volume, the original blocks are first copied in to the snapshot, and then the live blocks are updated. Then when you drop the snapshot, it's just a matter of removing the entry from the device mapper. No copying required. – Kamil Kisiel Jun 22 '09 at 05:48
  • 2
    In the interest of completeness, Kamil Kisiel is correct. See: http://tldp.org/HOWTO/LVM-HOWTO/snapshotintro.html – ktower Jul 23 '10 at 16:40
  • 1
    After much grumbling at myself for being misinformed, the answer has been modified based on multiple sources of documentation and discussion. Sorry folks, my bad. – Avery Payne Aug 12 '10 at 04:58
12

LVM snapshots are great for backing up your server without taking it offline. As stated, LVM snapshots are almost instant copies. You create them with the lvcreate command just as you would create the LV itself, only you give it the --snapshot option and the original LV instead of the VG. For instance:

lvcreate -L <snapshot size> -s -n <snapshot name> /dev/<VG name>/<LV name>

This will create a snapshot of the given LV with the specified snapshot name. You can then mount this snapshot LV and perform your backup from it without worrying about files being actively used. This is particularly helpful if you are attempting to back up an active database server.

Once you are done backing up from the snapshot, you will want to remove it to avoid the additional I/O overhead and other performance issues that others have mentioned:

lvremove /dev/<VG name>/<snapshot name>

While LVM snapshots can be invaluable for producing a reliable backup of systems such as databases, which you would normally want to shut down before backing up to avoid file contention, they are not ideal for long-term operation as a quick restore.

Jeremy Bouse
10

Not a good idea, IMO.

Snapshots are implemented in a copy-on-write fashion, so the first write to any block after the snapshot is taken turns into a read and two writes: the block you are about to overwrite is first read from the main volume and stored in the snapshot volume, and only then is your new data written in its place. You will therefore see some performance degradation if the VMs do a lot of writing.

Also, IIRC, if the snapshot volume gets full it is simply dropped unceremoniously. This is not good for backup purposes! So if you do try this as a backup method, be sure to make the snapshot volume big enough to handle all the changes that will happen during the useful life of the snapshot. Of course if you are aware of and monitor the size issue and the performance issue is not a problem to you, then what you suggest might make a useful addition to other backup processes you have in place.

LVM snapshots are very useful as part of a backup process (take a snapshot, back the snapshot up to elsewhere so that the backup is consistent without having to disable updates to the "real" volume, then drop the snapshot afterwards), amongst other things, but they are not intended as a backup facility on their own.

David Spillett
  • Maybe I don't understand how snapshots work. The manual says that a snapshot is an *almost instant* copy of the logical volume, avoiding the need to take the system that uses it offline. From your description it would seem that a snapshot is more of a branch or replica than a frozen copy. Does the snapshot get updated with all the changes made in the original system after it is made? If so, do I need to take the data off it immediately and destroy the snapshot, because it's not intended as a storage mechanism for backups? Thanks! – Karolis T. Jun 11 '09 at 12:37
  • 2
    It is a frozen copy of the volume it is created from, but only contains blocks that have changed since the snapshot was taken (hence the snapshot volume can be far smaller than the volume it is a snapshot of). If blocks are updated in the live volume then the original blocks' content is added to the snapshot's storage, so when you look at the snapshot LVM can serve the original blocks instead of the updated ones. – David Spillett Jun 11 '09 at 13:51
  • But if it's changed (the snapshot), where does this "frozen" come from? Let's say I have this scenario, a working system somehow gets corrupted over time. I have a snapshot of it when it was working correctly. Will the snapshot be a representation of the system while it was still working correctly, or will it have the changes that made the original system corrupt in the first place? Hope I'm clear enough, just want to be sure I really understand it. – Karolis T. Jun 11 '09 at 14:20
  • To understand where frozen comes from, realize that you now have two separate volumes - the original, which contains the active filesystem, and the snapshot, which holds the frozen version of the filesystem. See my answer for more details. – Avery Payne Jun 11 '09 at 14:59
  • 1
    You people make it sound more complicated than it is. The snapshot stores the state of the source filesystem as it was when the snapshot was created. When the source fs changes, the snapshot does not change, allowing you to point your backup program to read from the snapshot instead of the source fs. Yes, a copy-on-write happens behind the scenes, but the user doesn't notice this except for extra IO usage. – Martijn Heemels Jan 23 '10 at 15:21
6

You will need to ensure that the data on disk is in a consistent state before the snapshot is made. For example, MySQL may have data cached in memory that needs to be forced to disk, either by dumping the database or by shutting it down. See your application's manuals for details.

pgs
5

Beneath the smart-looking stuff, LVM is actually 'just' a device mapper trick. Creating a snapshot with lvcreate is not much more than a wrapper around some dmsetup stuff. The wrapper creates a new device (the snapshot volume) from one old volume (the original LV) and a new one (the copy-on-write volume). Together with that, the original LV is renamed to -real (see below, which is dmsetup ls --tree output). This -real LV is mapped to both the snapshot volume and the original volume, so it can be used in both places. The copy-on-write volume functions as an overlay to the -real LV. The -snap LV shows you the combination of the copy-on-write volume and the -real volume. This does indeed create some performance overhead.

Volume00-snap (253:11)
 |-Volume00-snap-cow (253:13)
 |  `- (104:2)
 `-Volume00-LogVol01-real (253:12)
    `- (104:2)

Volume00-LogVol01 (253:5)
 `-Volume00-LogVol01-real (253:12)
    `- (104:2)

When removing the snapshot, again some renaming and mapping happens. Afterwards, the situation will again look something like

Volume00-LogVol01 (253:5)
 `- (104:2)

As for how far this is a good method of backing up stuff: it can be, if you take into account that (1) it will not help for the virtual machine's RAM, (2) it creates a performance penalty and (3) you will need to store images of the snapshot elsewhere.

VMware VCB works with snapshots as well, btw, albeit not LVM ones.

wzzrd
4

Even if snapshots had no performance impact, you have to understand: snapshots are no more of a backup than a copy to another folder on the same disk.

If the disk breaks, your data and your backup are both lost. Even if you assign the snapshot area to another PE in the VG, it only contains the data modified since the snapshot.

Backing up means, at a minimum, a copy to a completely separate drive.

Sven
  • 1
    Yes, I understand that. RAID 1 is in place to protect against storage device failures, and backing up to a remote location protects against software corruption. I'm considering LVM snapshots as a tool for a REALLY fast restore when you don't know what the f happened and you need the system online now. Any other options, faster than restoring a domU from an LVM backup? – Karolis T. Jun 11 '09 at 14:29
3

i use such a setup for snapshots of vmware server machines and mysql databases. works fine so far. there were a couple of restores - all without problems. one thing to consider - while a snapshot exists, lvm takes a significant performance hit for i/o operations. look here. ignore the fact they talk about mysql, i/o ops are i/o ops... no matter what kind of data sits on lvm.

pQd
  • 1
    aha. yeah - i assume snapshot will be taken and exported to remote storage server. not left on local host. – pQd Jun 11 '09 at 10:42
2

I use LVM snapshots only to copy the DomU LV to another one in a separate VG, where each domain has three backup "nodes" at its disposal.

After that, the snapshot is destroyed, and the backup LVs remain until the next round. If I have a restore to make, I just have to choose a source LV from the backup VG and copy it to the domain LV.

Once in a while, a backup Lv is dumped into an image file on a separate server.

All this is automated via script, with a backup every two days and a dump every week.

I even had a "panic" mode in mind, where the domain LV would be restored but run from a snapshot and reset every 2 hours, to keep the site online in case of serious hacks until a proper defence could be organized.

Berzemus
1

What became of the 'panic mode' line of defense idea?

NginUS