I have a number of Xen virtual machines running on several Linux servers. These VMs store their disk images in Linux LVM volumes, with device names along the lines of /dev/xenVG/SERVER001OS and so on. I'd like to take regular backups of those disk images so I can restore the VMs if we ever need to (the LVM devices are already mirrored with DRBD between two physical machines each; I'm just being extra paranoid here).

How do I go about this? Obviously the first step is to snapshot the LVM device, but how do I then transfer the data to a backup server in the most efficient manner possible? I could simply copy the whole device, something along the lines of:

dd if=/dev/xenVG/SERVER001OS | ssh administrator@backupserver "dd of=/mnt/largeDisk/SERVER001OS.img"

...but that would take a lot of bandwidth. Is there an rsync-like tool for syncing the contents of whole block devices between remote servers? Something like:

rsync /dev/xenVG/SERVER001OS backupServer:/mnt/largeDisk/SERVER001OS.img

If I understand rsync's man page correctly, the above command won't actually work (will it?), but it shows what I'm aiming for. I understand the --devices rsync option is to copy devices themselves, not the contents of those devices. Making a local copy of the VM image before syncing it with the remote server isn't an option as there isn't the disk space.
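
For what it's worth, the snapshot step I have in mind would be something along these lines (the snapshot name and copy-on-write size are just placeholders):

lvcreate -s -L 10G -n SERVER001OS-snap /dev/xenVG/SERVER001OS
# ...copy or sync /dev/xenVG/SERVER001OS-snap to the backup server here...
lvremove -f /dev/xenVG/SERVER001OS-snap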

Is there a handy utility that can synch between block devices and a backup file on a remote server? I can write one if I have to, but an existing solution would be better. Have I missed an rsync option that does this for me?

David Hicks

11 Answers

Although there are 'write-device' and 'copy-device' patches for rsync, they only work well on small images (1-2 GB). rsync spends ages searching for matching blocks in larger images and is almost useless on devices/files of 40 GB or more.

We use the following to perform a per-block MD5 checksum comparison (1 KiB blocks as written) and then simply copy the content of any block that doesn't match. We use this to back up servers on a virtual host in the USA to a backup system in the UK, over the public internet. There is very little CPU activity, and the performance hit from keeping the snapshot active applies only after hours:

Create snapshot:

lvcreate -s -i 2 -L 25G -n company-exchange-snap1 /dev/vg_kvm/company-exchange

export dev1='/dev/mapper/vg_kvm-company--exchange--snap1'   # local source: the snapshot taken above
export dev2='/dev/mapper/vg_kvm-company--exchange'          # target device, as named on the backup server
export remote='root@backup.company.co.za'

Initial seeding:

dd if=$dev1 bs=100M | gzip -c -9 | ssh -i /root/.ssh/rsync_rsa $remote "gzip -dc | dd of=$dev2"
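
If you want to watch progress during the initial seed, pv can be spliced into the pipe (optional; assumes pv is installed):

dd if=$dev1 bs=100M | pv | gzip -c -9 | ssh -i /root/.ssh/rsync_rsa $remote "gzip -dc | dd of=$dev2"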

Incremental nightly backup (only sends changed blocks):

ssh -i /root/.ssh/rsync_rsa $remote "
  perl -'MDigest::MD5 md5' -ne 'BEGIN{\$/=\1024};print md5(\$_)' $dev2 | lzop -c" |
  lzop -dc | perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\1024};$b=md5($_);
    read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 | lzop -c |
ssh -i /root/.ssh/rsync_rsa $remote "lzop -dc |
  perl -ne 'BEGIN{\$/=\1} if (\$_ eq\"s\") {\$s++} else {if (\$s) {
    seek STDOUT,\$s*1024,1; \$s=0}; read ARGV,\$buf,1024; print \$buf}' 1<> $dev2"
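
To sanity-check a sync afterwards (optional, and slow, since it reads both devices end to end), checksum both sides and compare:

md5sum $dev1
ssh -i /root/.ssh/rsync_rsa $remote "md5sum $dev2"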

Remove snapshot:

lvremove -f /dev/vg_kvm/company-exchange-snap1

sysadmin1138
  • I was scared at first but then tried it out and it really works. – Martin Aug 27 '13 at 16:04
  • Why `read ARGV,$buf,1024` instead of `read STDIN,$buf,1024`, @sysadmin1138? (I am trying to answer http://stackoverflow.com/q/22693823/2987828 and do not understand ARGV here.) I use the variant in that question every day and it works well. – user2987828 Apr 07 '14 at 14:36
  • See http://www.perlmonks.org/bare/?node_id=492858 which says that ARGV and STDIN are similar unless a filename is given as an argument. – user2987828 Apr 07 '14 at 14:52

Standard rsync is missing this feature, but there is a patch for it in the rsync-patches tarball (copy-devices.diff), which can be downloaded from http://rsync.samba.org/ftp/rsync/. After applying it and recompiling, you can rsync devices with the --copy-devices option.
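
A rough sketch of the build and usage, assuming the patch applies cleanly (the version number is a placeholder; check the site for the current release):

tar xzf rsync-3.1.3.tar.gz
tar xzf rsync-patches-3.1.3.tar.gz
cd rsync-3.1.3
patch -p1 < patches/copy-devices.diff
./configure && make
# the receiving side may also need the patched binary; if so, add --rsync-path=/path/to/patched/rsync
./rsync --copy-devices --inplace -v /dev/xenVG/SERVER001OS backupServer:/mnt/largeDisk/SERVER001OS.img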

Balázs Pozsár

People interested in doing this specifically with LVM snapshots might like my lvmsync tool, which reads the list of changed blocks in a snapshot and sends just those changes.
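
Roughly, the workflow looks like this (device names, host and snapshot size below are placeholders; see the README for the exact invocation):

# take a snapshot first, so that subsequent writes to the origin are tracked
lvcreate -s -L 10G -n SERVER001OS-snap /dev/xenVG/SERVER001OS
# one-time seed of the backup device from the snapshot
dd if=/dev/xenVG/SERVER001OS-snap bs=1M | ssh root@backupserver "dd of=/dev/backupVG/SERVER001OS"
# later: send only the blocks the snapshot records as changed
lvmsync /dev/xenVG/SERVER001OS-snap root@backupserver:/dev/backupVG/SERVER001OS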

womble

Take a look at the Zumastor Linux Storage Project; it implements "snapshot" backup using binary "rsync" via the ddsnap tool.

From the man-page:

ddsnap provides block device replication given a block level snapshot facility capable of holding multiple simultaneous snapshots efficiently. ddsnap can generate a list of snapshot chunks that differ between two snapshots, then send that difference over the wire. On a downstream server, write the updated data to a snapshotted block device.

STW
rkthkr

There's a python script called blocksync which is a simple way to synchronize two block devices over a network via ssh, only transferring the changes.

  • Copy blocksync.py to the home directory on the remote host
  • Make sure your remote user can either sudo or is root itself
  • Make sure your local user (root?) can read the source device & ssh to the remote host
  • Invoke: python blocksync.py /dev/source user@remotehost /dev/dest

I've recently hacked on it to clean it up and change it to use the same fast-checksum algorithm as rsync (Adler-32).
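
Combined with an LVM snapshot of the volume from the question, a run might look roughly like this (the snapshot size is a placeholder, and it assumes the target image file already exists on the backup server):

lvcreate -s -L 10G -n SERVER001OS-snap /dev/xenVG/SERVER001OS
python blocksync.py /dev/xenVG/SERVER001OS-snap administrator@backupserver /mnt/largeDisk/SERVER001OS.img
lvremove -f /dev/xenVG/SERVER001OS-snap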

rcoup
  • I'm using it, works fine. Note there is a [modified version](https://gist.github.com/geraldh/3296554) that fixes a possible source of corruption and uses a more reliable hash. – cmc Mar 08 '13 at 17:07

Just beware that the performance of a system with LVM snapshots degrades in proportion to the number of snapshots.

For example: MySQL performance with LVM snapshots.

Andrew
James
  • Indeed - my initial solution involved simply setting a daily snapshot then doing a diff with the previous day's snapshot and dd-ing it over to the backup server. I was most peeved to find out it wouldn't be that simple. – David Hicks Jun 25 '09 at 19:11
  • That may not be true with LVM thin snapshots which are implemented much differently – Alex F Feb 13 '18 at 13:58

If you're trying to minimize the amount of empty space you'd send across the wire with a plain dd, could you not just pipe it through gzip before piping it to ssh?

e.g. dd if=/dev/xenVG/SERVER001OS | gzip | ssh administrator@backupserver "dd of=/mnt/largeDisk/SERVER001OS.img.gz"

Ophidian
  • It'd cut down the bandwidth needed a bit, but we've got some 60 and 100 GB disk images and even with gzip it'd take too long. – David Hicks Jun 25 '09 at 19:12
  • @Ophidian, you should know that SSH handles compression internally, there's an option. – poige Oct 29 '11 at 04:32

This is an old question, but nobody mentioned two very useful tools to efficiently synchronize two block devices:

I strongly suggest playing with both tools and selecting whichever better adapts to your intended usage.

shodanshok

After searching for several years, I recently created a tool for synchronising LVM snapshots between servers. It is designed to use minimal I/O and to allow the systems to keep running while the synchronisation is happening.

It is similar to ZFS send/receive in that it synchronises the differences between LVM snapshots, and it uses thin provisioning so that the performance impact is minimal.

I would like feedback, so please have a look.

David B

In addition to David Herselman's answer, the following script will sync to a local device:

perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\1024};print md5($_)' $dev2 |
  perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\1024};$b=md5($_);
    read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
   perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) {
    seek STDOUT,$s*1024,1; $s=0}; read ARGV,$buf,1024; print $buf}' 1<> $dev2
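
For example (the device names here are illustrative, following the naming in the answer above):

export dev1=/dev/vg_kvm/company-exchange-snap1      # source: the snapshot
export dev2=/dev/backupvg/company-exchange-backup   # local target device, updated in place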

As far as I know both scripts were first posted at lists.samba.org.

Martin

There are a few efficiencies to be made to the script above:

  1. On my system at least, perl's buffered reads are 8 KiB, so use a block size of 8192.
  2. Enable autoflush so the local end doesn't block until the remote output buffer is full; since we are feeding lzop, the buffering seems pointless.

ssh -i /root/.ssh/rsync_rsa $remote "
  perl -'MDigest::MD5 md5' -ne 'BEGIN{\$|=1;\$/=\8192};print md5(\$_)' $dev2 | lzop -c" |
  lzop -dc | perl -'MDigest::MD5 md5' -ne 'BEGIN{$|=1;$/=\8192};$b=md5($_);
    read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 | lzop -c |
ssh -i /root/.ssh/rsync_rsa $remote "lzop -dc |
  perl -ne 'BEGIN{\$/=\1} if (\$_ eq\"s\") {\$s++} else {if (\$s) {
    seek STDOUT,\$s*8192,1; \$s=0}; read ARGV,\$buf,8192; print \$buf}' 1<> $dev2"

Mike Mestnik