
My understanding of LVM snapshots is this: when a block on the origin is about to change, its current (pre-change) contents are first copied to the snapshot, and then the block is changed on the origin as usual.

Theoretically, then, a newly created snapshot should contain nothing; as changes happen on the origin, the copied blocks gradually fill it up.

However, for testing purposes I created a small 1G LV, put a couple of files on it, and then created a 200MB snapshot of it. As expected, lvs shows an LSize of 200M for the snapshot and 1G for the origin. But when I mount the snapshot volume, df shows it with a size of 1G, and when I examine its contents I find everything that is in the origin volume, even though the original files have not been changed.

Why is this? I'm assuming the "mount" command has some logic in it to compare the origin LV with the snapshot LV? Can someone explain how this works? If I look at the inode numbers for a file on each volume (origin and snap), they are different (as expected, because each is its own unique filesystem), but I'm assuming there are some kind of "pointers" that the snapshot uses to reference unchanged blocks on the origin.

Along the same lines, I assume that if I copy the snapshot logical volume itself to another location, I will get only a subset of the origin's files: those that have changed. But if I first mount the snapshot filesystem and then do a filesystem-level copy (cp), I'll get everything that was on the origin when the snapshot was created. Right?

Michael Martinez

2 Answers


An LVM snapshot works by keeping a list of changed blocks, and their contents, in the snapshot volume, while passing all requests to read unchanged data down to the underlying block device. All of this intelligence is built into LVM, in the kernel, and mount has no knowledge of it. As far as anything in userspace is concerned, there are two block devices available, both of which are the same size (1G, in your example). The fact that one's an LVM origin volume and the other is a snapshot is of no concern to anyone except the bits of the dm (device mapper) system that have to deal with that.
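To make the read path concrete, here is a toy in-memory model, assuming blocks are list entries and the snapshot's store of changed blocks is a plain dict. `ToySnapshotDevice` and its methods are hypothetical names; writes to the origin are routed through `write_origin` to stand in for the kernel's snapshot-origin target intercepting them. It is a sketch of the idea, not LVM's on-disk format.

```python
class ToySnapshotDevice:
    """Hypothetical model: changed blocks are served from the snapshot's own
    store, everything else is passed straight through to the origin."""

    def __init__(self, origin):
        self.origin = origin  # shared list standing in for the origin LV
        self.cow = {}         # block number -> contents before the first change

    def size(self):
        # The snapshot presents as many blocks as the origin has,
        # no matter how much space was reserved for changed blocks.
        return len(self.origin)

    def write_origin(self, block, data):
        # Preserve the old contents only on the first change to a block.
        if block not in self.cow:
            self.cow[block] = self.origin[block]
        self.origin[block] = data

    def read(self, block):
        if block in self.cow:
            return self.cow[block]   # changed since the snapshot: use the saved copy
        return self.origin[block]    # unchanged: pass the read down to the origin


origin = ["a", "b", "c", "d"]
snap = ToySnapshotDevice(origin)
snap.write_origin(1, "B")
print(snap.size())                                 # 4
print([snap.read(i) for i in range(snap.size())])  # ['a', 'b', 'c', 'd']
print(origin)                                      # ['a', 'B', 'c', 'd']
```

Nothing above the block layer (mount, df, the filesystem) sees the dict; it only sees a device of `size()` blocks that reads back the point-in-time contents.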

The 200MB size of the snapshot is the amount of changed data that the snapshot can store before it bursts its seams and spills blocks of data all over the floor (in practice, LVM marks a snapshot that runs out of space as invalid, and it can no longer be used).
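A minimal sketch of what that limit means, assuming the reserved space is counted in blocks (`capacity_blocks` is a hypothetical stand-in for the 200MB): the origin keeps accepting writes, but once there is no room to preserve old contents, the snapshot is flagged invalid and refuses reads. The class and its attributes are illustrative only.

```python
class ToyBoundedSnapshot:
    """Hypothetical model of a snapshot with a fixed amount of COW space."""

    def __init__(self, origin, capacity_blocks):
        self.origin = origin
        self.capacity = capacity_blocks  # stands in for the 200MB reservation
        self.cow = {}
        self.invalid = False

    def write_origin(self, block, data):
        if not self.invalid and block not in self.cow:
            if len(self.cow) >= self.capacity:
                # No room to save the old contents: the point-in-time
                # image can no longer be reconstructed.
                self.invalid = True
            else:
                self.cow[block] = self.origin[block]
        self.origin[block] = data  # the origin write itself always succeeds

    def usage_percent(self):
        return 100 * len(self.cow) / self.capacity

    def read(self, block):
        if self.invalid:
            raise OSError("snapshot is invalid")
        return self.cow.get(block, self.origin[block])


origin = ["x"] * 8
snap = ToyBoundedSnapshot(origin, capacity_blocks=2)
snap.write_origin(0, "y")
snap.write_origin(0, "z")   # same block again: no extra space used
snap.write_origin(1, "y")
print(snap.usage_percent())  # 100.0
snap.write_origin(2, "y")    # a third distinct block does not fit
print(snap.invalid)          # True
```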

If you copy the snapshot volume, either as a block device or by copying the files within it, you'll get the complete contents, either way.
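This is the key point for the question about copying the snapshot LV: a block-level copy reads every block of the snapshot device, so it gets the full point-in-time image, even though only the changed blocks physically live in the snapshot's space. A toy sketch, where `block_copy` is a hypothetical stand-in for something like a dd-style copy and the COW store is a plain dict:

```python
def block_copy(read_block, nblocks):
    """Hypothetical dd-style copy: read every block of a device in order."""
    return [read_block(i) for i in range(nblocks)]


origin = ["a", "b", "c", "d"]  # origin contents at snapshot time
cow = {}                       # block number -> pre-change contents


def write_origin(block, data):
    # Save the old contents only the first time a block changes.
    cow.setdefault(block, origin[block])
    origin[block] = data


def read_snapshot(block):
    # Changed blocks come from the COW store, the rest from the origin.
    return cow.get(block, origin[block])


write_origin(0, "A")
write_origin(3, "D")
image = block_copy(read_snapshot, len(origin))
print(image)     # ['a', 'b', 'c', 'd']  (full image as of snapshot time)
print(len(cow))  # 2  (only the changed blocks live in the snapshot's space)
```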

All of this comes with a caveat: you can get inside the metadata in the snapshot LV and do "funky" things, like a highly efficient device-level rsync. I mention this because I've written lvmsync to do exactly this, and from the tone of your question it sounds like you might be after this kind of functionality.
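The idea behind such a device-level sync can be sketched like this, under loud assumptions: this is not lvmsync's actual code, the snapshot's metadata is modelled as a dict keyed by chunk number, and `incremental_sync` is a hypothetical helper. The chunk numbers recorded in the snapshot are exactly the chunks written on the origin since the snapshot was taken, so a backup made from the snapshot can be brought up to date by copying only those chunks from the live origin.

```python
def changed_chunks(cow_store):
    """Chunks the snapshot had to preserve, i.e. chunks written on the
    origin since the snapshot was taken (hypothetical dict model)."""
    return sorted(cow_store)


def incremental_sync(origin, backup, cow_store):
    """Update `backup` (a copy taken from the snapshot) by transferring
    only the changed chunks from the live origin. Returns chunks sent."""
    sent = 0
    for chunk in changed_chunks(cow_store):
        backup[chunk] = origin[chunk]
        sent += 1
    return sent


backup = ["a", "b", "c", "d"]     # copied from the snapshot earlier
origin = ["a", "B", "c", "D"]     # live origin after some writes
cow_store = {1: "b", 3: "d"}      # what the snapshot preserved
print(incremental_sync(origin, backup, cow_store))  # 2
print(backup == origin)                             # True
```

Only two of the four chunks cross the wire, which is where the efficiency over a full copy comes from.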

womble

Snapshots are part of the LVM subsystem, which manages data blocks in an abstraction layer underneath the file system.

A snapshot presents itself as a full copy of the original volume, and will therefore also be the same size when you mount it: both will be 1 GB.

The difference is that a snapshot does some trickery so you don't actually have to copy that whole 1 GB. In fact, nothing is copied at the moment you take the snapshot, which is why taking one is very fast and why it initially takes up hardly any space.

Instead, a snapshot only starts copying data when the original data is being modified (copy on write), and even then only the original blocks that are about to be modified are copied, nothing else. The 200 MB of space for the snapshot is the amount of space you reserve for those copies. In other words, you can track 200 MB worth of modifications in your snapshot.
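"200 MB worth of modifications" means 200 MB of distinct changed chunks, not 200 MB of write traffic, because only the first write to a chunk triggers a copy. A small worked sketch, assuming 4 KiB chunks (lvcreate's default snapshot chunk size, adjustable with `--chunksize`) and ignoring the space the snapshot spends on its own metadata, so real capacity is somewhat lower. The function names are hypothetical.

```python
KiB = 1024
MiB = 1024 * KiB


def trackable_chunks(cow_bytes, chunk_bytes):
    # Ignores the snapshot's metadata overhead, so this is an upper bound.
    return cow_bytes // chunk_bytes


def cow_chunks_used(writes, chunk_bytes):
    """writes: iterable of (offset, length) byte ranges written to the origin,
    each with length > 0. Only the first write to a chunk triggers a copy,
    so the cost is the number of distinct chunks touched."""
    touched = set()
    for offset, length in writes:
        first = offset // chunk_bytes
        last = (offset + length - 1) // chunk_bytes
        touched.update(range(first, last + 1))
    return len(touched)


print(trackable_chunks(200 * MiB, 4 * KiB))              # 51200
# Rewriting the same 4 KiB block a thousand times costs one chunk.
print(cow_chunks_used([(0, 4 * KiB)] * 1000, 4 * KiB))   # 1
# A single 1 MiB sequential write touches 256 chunks.
print(cow_chunks_used([(0, 1 * MiB)], 4 * KiB))          # 256
```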

The trickery is that a snapshot starts out as a collection of pointers to the original data blocks. Every change in the original volume triggers a copy of that original data block to the COW table (slowly filling up the 200 MB you set aside for that) and updates the snapshot's pointer accordingly. When the snapshot is read, those pointers are followed: either to the unmodified data on the original volume, or to a copied block in the COW table.
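The pointer picture above can be sketched directly, as a conceptual model only: every snapshot block gets a pointer that starts out aimed at the origin and is redirected into a fixed-size COW table on the first change. `PointerSnapshot` and its fields are hypothetical; the kernel's device-mapper snapshot only records entries for chunks that actually changed rather than one pointer per block.

```python
ORIGIN, COW = "origin", "cow"


class PointerSnapshot:
    """Conceptual model: one pointer per block, initially aimed at the origin."""

    def __init__(self, origin, cow_slots):
        self.origin = origin
        self.cow_table = [None] * cow_slots  # the reserved COW space
        self.next_slot = 0
        self.pointers = [(ORIGIN, i) for i in range(len(origin))]

    def write_origin(self, block, data):
        kind, _ = self.pointers[block]
        if kind == ORIGIN:  # first change to this block since the snapshot
            if self.next_slot == len(self.cow_table):
                raise RuntimeError("COW space exhausted; snapshot invalid")
            slot = self.next_slot
            self.cow_table[slot] = self.origin[block]  # save pre-change data
            self.pointers[block] = (COW, slot)         # repoint the snapshot
            self.next_slot += 1
        self.origin[block] = data

    def read(self, block):
        kind, index = self.pointers[block]
        return self.origin[index] if kind == ORIGIN else self.cow_table[index]


origin = ["a", "b", "c"]
snap = PointerSnapshot(origin, cow_slots=2)
snap.write_origin(1, "B")
snap.write_origin(1, "BB")  # second write to block 1: nothing more is copied
print(snap.pointers)                     # [('origin', 0), ('cow', 0), ('origin', 2)]
print([snap.read(i) for i in range(3)])  # ['a', 'b', 'c']
print(origin)                            # ['a', 'BB', 'c']
```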

HBruijn
Also a good explanation. Thank you. Between yours and the other answer, this is what I was looking for. Interestingly, from my googling, a lot of people have an incorrect understanding of it; their explanations are in many cases flat out wrong, which is why I had trouble finding a good overview. – Michael Martinez Aug 04 '15 at 17:36