libvirt: guest performance during backup

Question

Here is a simplified version of my backup script, which runs on the host:

# shut down the guest to ensure its filesystem is in a stable state
virsh shutdown web --mode=acpi
sleep 20s # the real script uses a smarter method to wait for the guest shutdown to complete

# make a snapshot copy of the offline guest
lvcreate -n web-bsnap -L50GB -s /dev/vg0/web

# start the guest to minimize the offline time
virsh start web

# create the backup volume
lvcreate -n web-0 -L 193273528320B /dev/vg0

# make the backup by copying the offline snapshot
nice -n 19 dd if=/dev/vg0/web-bsnap of=/dev/vg0/web-0 bs=4K

# remove the snapshot
lvremove -f /dev/vg0/web-bsnap

The backup takes more than 1 hour, but the problem is that, during that time, the guest becomes very slow (at times it is unreachable too). I have no need for the backup to end in 1 hour or 2, it can take 10 hours if needed, but I want it to run at lowest priority so that it doesn't disturb the normal guest operations. The nice command is there for that reason, but it doesn't seem to make any difference.

The host system is a Debian GNU/Linux 8 amd64 with the Linux kernel from sid (4.7). The same goes for the guest. The problem was just the same with the jessie kernel (3.16) on both host and guest.

The host hardware is way oversized for the usual guest workload: 256GB of RAM, an Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz with 6 cores, and 2TB of RAID1 storage on enterprise SATA disks, all for a single guest with a website that serves 1 webpage/second on average. The usual server load is below 1.

What can I do to make the backup less intrusive?

Do you really need to shut down the guest? [Freezing it](http://serverfault.com/a/726036/126632) is usually sufficient. — Michael Hampton, Oct 29 '16 at 20:35
@MichaelHampton Hibernating would be enough, but, er... I had never considered that option... I'll look at it, thanks — Lucio Crusca, Oct 30 '16 at 08:13
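
The freeze approach suggested in the comment above might look roughly like the sketch below. It assumes the QEMU guest agent is installed and running inside the guest, which `virsh domfsfreeze` and `virsh domfsthaw` require. The function name `snapshot_frozen` and its arguments are made up for illustration, and whether the linked answer uses these exact commands has not been checked here.

```shell
# Sketch: take an LVM snapshot while the guest's filesystems are frozen.
# Assumption: the QEMU guest agent is running in the guest.
snapshot_frozen() {   # args: domain origin_lv snapshot_name snapshot_size
    domain=$1
    origin=$2
    snap_name=$3
    snap_size=$4

    # Flush and freeze the guest filesystems; give up if that fails
    virsh domfsfreeze "$domain" || return 1

    # Create the snapshot, remembering whether it worked
    rc=0
    lvcreate -s -n "$snap_name" -L "$snap_size" "$origin" || rc=$?

    # Always thaw, even if the snapshot failed
    virsh domfsthaw "$domain" || return 1
    return "$rc"
}

# Hypothetical usage with the names from the question:
# snapshot_frozen web /dev/vg0/web web-bsnap 50G
```

The guest stays up the whole time, so the freeze only needs to last as long as `lvcreate` takes.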

score 1 · Answer 1 · answered Oct 29 '16 at 19:03

The thing is, nice only adjusts a process's CPU scheduling priority. dd is mostly I/O-intensive, not CPU-intensive, which is why nice makes no difference for you.
Here is how I would do it:

mkfs /dev/mapper/vg0-web-0
mkdir /mnt/websnap
mkdir /mnt/level0
mount /dev/mapper/vg0-web-0 /mnt/level0
mount /dev/mapper/vg0-web-bsnap /mnt/websnap
rsync -av --bwlimit=10000 /mnt/websnap/ /mnt/level0/

This way you do a file-based copy instead of a block copy (which, by the way, can be inefficient when the volume is far from full), and you get to control the bandwidth with --bwlimit (10000 here means roughly 10 MB/s).
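
To illustrate the CPU-versus-I/O point from the start of this answer: `ionice` is the I/O counterpart of `nice`. The sketch below wraps a command in both. The helper name `low_prio` is made up, and this is an assumption-laden variant rather than something tested on the asker's system: the idle class is only honoured by I/O schedulers that implement priorities (such as CFQ or BFQ), and it may not help much with buffered writes.

```shell
# Run a command at idle I/O priority and lowest CPU priority.
# Falls back to plain nice when ionice is missing or not permitted.
low_prio() {
    if command -v ionice >/dev/null 2>&1 && ionice -c 3 true 2>/dev/null; then
        ionice -c 3 nice -n 19 "$@"
    else
        nice -n 19 "$@"
    fi
}

# Hypothetical usage with the volumes from the question:
# low_prio dd if=/dev/vg0/web-bsnap of=/dev/vg0/web-0 bs=4K
```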

That doesn't take into account guest disk partitioning, so it actually can't work as is, but that's a good start, thanks! — Lucio Crusca, Oct 30 '16 at 08:28
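
One possible way around the partitioning issue raised in this comment is to map the partitions inside the snapshot with `kpartx` and mount one of them. This is only a sketch: the helper name `map_partitions` is made up, and it assumes the guest image holds plain partitions with directly mountable filesystems (not, for example, LVM inside the guest).

```shell
# Map the partitions inside a block device and print one
# /dev/mapper path per mapped partition.
map_partitions() {   # args: block_device
    # kpartx -av prints lines such as:
    #   add map NAME (253:5): 0 204800 linear 253:3 2048
    kpartx -av "$1" | awk '$1 == "add" && $2 == "map" { print "/dev/mapper/" $3 }'
}

# Hypothetical usage (mount point taken from the answer above):
# part=$(map_partitions /dev/vg0/web-bsnap | head -n 1)
# mount -o ro "$part" /mnt/websnap
# ... run rsync here ...
# umount /mnt/websnap && kpartx -d /dev/vg0/web-bsnap
```

The mappings have to be removed with `kpartx -d` before the snapshot itself can be removed.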

score 0 · Accepted Answer · answered Nov 13 '16 at 20:12

The solution Dmitry Zayats proposed is quite intriguing, but I ended up with a different one that keeps the script agnostic of the guest's partitioning:

# start the copy in the background and remember its PID
dd if=/dev/vg0/web-bsnap of=/dev/vg0/web-0 bs=4K &
DDPID=$!

# let dd run for 0.125s out of every 4s until it exits
while kill -0 "$DDPID" 2>/dev/null ; do
  kill -STOP "$DDPID"
  sleep 3.875s
  kill -CONT "$DDPID"
  sleep 0.125s
done

# collect dd's exit status
wait "$DDPID"

That takes about 20 hours for a 180GB guest image, but it does not impact system performance in my case.
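
As a rough check of those figures, the sketch below works out the duty cycle of the loop and the average throughput it implies. It assumes 180GB means decimal gigabytes and that dd makes no progress while stopped; the function names are made up for illustration.

```shell
# Percentage of wall-clock time during which dd is allowed to run.
duty_cycle() {   # args: run_seconds pause_seconds
    awk -v run="$1" -v pause="$2" 'BEGIN { printf "%.3f\n", 100 * run / (run + pause) }'
}

# Average throughput in MB/s (decimal units) for a given size and duration.
avg_rate() {     # args: size_in_GB hours
    awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.2f\n", gb * 1000 / (h * 3600) }'
}

duty_cycle 0.125 3.875   # prints 3.125
avg_rate 180 20          # prints 2.50
```

An average of about 2.5 MB/s at a 3.125% duty cycle corresponds to roughly 80 MB/s while dd is actually running, so the pause length is the knob to turn if the backup needs to finish sooner.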
