
I have many large, sparse Xen image files to back up, and I'm looking for a way to do this efficiently in terms of both disk space (using a differential tool such as duplicity, bup, rsync, or womble's lvmsync) and disk/network bandwidth. Unfortunately, the combined space and bandwidth requirement rules out the tools I just mentioned, since they scan each file's entire contents to find deltas between source and destination.

So, I want to avoid the following pitfalls:

  • Blind copying of entire files

  • Intensive scanning of entire files to generate checksums for comparison

  • Redundant copies of data on the same volume (due to COW or another feature); this must hold for both the source and destination volumes.

  • Significant performance degradation during normal system use

A search did lead me to one cool example that meets all of the above criteria: OS X Time Machine when using sparsebundles as the source volume. Oh well, that's not going to work on Linux. But it was interesting to see how the mtimes of the individual 'band' files in a sparsebundle simply tell you which bits have been modified since the last backup date, instantly and with almost zero effort. The space savings are not perfect since the bands are 4MB long, but still very good.
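To make the band-file idea concrete, here is a minimal Python sketch of the mtime check. It assumes the image is stored as a directory of fixed-size band files and that the time of the last successful backup is known as a Unix timestamp; the function name `changed_bands` and its parameters are hypothetical, not part of Time Machine or any existing tool.

```python
import os


def changed_bands(bands_dir, last_backup_epoch):
    """Return the names of band files modified after the last backup.

    Assumption: bands_dir holds one regular file per band, and
    last_backup_epoch is a Unix timestamp recorded by the backup script.
    Only directory metadata is read; file contents are never scanned.
    """
    changed = []
    with os.scandir(bands_dir) as entries:
        for entry in entries:
            # Skip subdirectories and anything that is not a regular file.
            if entry.is_file() and entry.stat().st_mtime > last_backup_epoch:
                changed.append(entry.name)
    return sorted(changed)
```

A backup script would copy only the returned files and then record a new timestamp. The cost is one stat call per band, which is why the approach feels instant compared with checksumming whole images.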

Eventually, while working with thin-provisioned logical volumes I came across the thin provisioning test suite which includes an example of using thinp allocation data for quick differential backups. I thought I had found my solution... just put the images on a thin LV and use snapshots!

But then I realized this would use too much space on the source volume as well as slow it down. LVM snapshots are meant for short-term use.

I still got to wondering whether some script-fu or crafty config options could make a snapshot logical volume act like a 'phantom' snapshot: it would record only the block allocation data associated with modified blocks, but not copy-on-write the data blocks themselves. A backup script could read this phantom snapshot and instantly pick out the modified blocks for a given LV. (I guess some would call this a journal.) When the backup finished successfully, the script could delete the existing phantom snapshot and create a new one to hold modification info for the next differential backup.
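As a purely illustrative model of the 'phantom' snapshot, the Python sketch below keeps only a set of dirty block numbers and lets a backup routine copy just those extents. Everything here is hypothetical: the class `PhantomSnapshot`, the `record_write` hook, and the 4MB default block size are assumptions for the sake of the example, and nothing in it corresponds to an existing LVM or device-mapper feature.

```python
class PhantomSnapshot:
    """Tracks which blocks were written, without storing any block data."""

    def __init__(self, block_size=4 * 1024 * 1024):
        self.block_size = block_size
        self.dirty = set()  # block numbers modified since the last reset

    def record_write(self, offset, length):
        """Mark every block touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.dirty.update(range(first, last + 1))

    def dirty_extents(self):
        """Return (byte_offset, byte_length) tuples, merging adjacent blocks."""
        runs = []
        for block in sorted(self.dirty):
            if runs and runs[-1][0] + runs[-1][1] == block:
                runs[-1][1] += 1  # extend the current run by one block
            else:
                runs.append([block, 1])  # start a new run
        return [(s * self.block_size, n * self.block_size) for s, n in runs]

    def reset(self):
        """Forget all recorded changes (the 'delete and recreate' step)."""
        self.dirty.clear()


def copy_changed(src, dst, snapshot):
    """Copy only the dirty extents from src to dst, then reset the snapshot.

    src and dst are assumed to be seekable binary file objects.
    """
    for offset, length in snapshot.dirty_extents():
        src.seek(offset)
        data = src.read(length)
        dst.seek(offset)
        dst.write(data)
    snapshot.reset()
```

The point of the model is that the tracking structure grows with the number of modified blocks rather than with the amount of modified data, so it avoids both the extra space and the copy-on-write slowdown. What it leaves open is where a `record_write` hook could live on a real Linux block device, which is the question being asked.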

The solution doesn't have to involve LVM, but thinking of a solution this way allows me to present the desired solution in a more solid form. There has to be some way to reach this level of backup efficiency on Linux.

cprise
  • That sounds like changed block tracking - I'm not up to speed on this in the Xen world but google throws up some references. – Paul Haldane Dec 13 '14 at 12:03
  • CBT appears to be a feature specific to VMware ESX. I need a solution that can work with regular Linux distributions. – cprise Dec 13 '14 at 16:29
