
I'm trying to optimize the daily backup of an LVM snapshot of a large MySQL database. It works reasonably well when I just cp the files (local RAID to another local RAID), with an average speed of ~100 MB/s. But since the database files (600 GB, most of it in two files of 350 GB and 250 GB) do not change very much over the course of one day, I thought it would be more efficient to copy only the changed blocks.

I'm using

rsync --safe-links --inplace -crptogx -B 8388608 /source/ /destination/

It did work, but it was slower than the simple copy, and I did not see any read activity on the target disk. My thought was that rsync would read (8 MB) blocks from the source and the destination, compare their checksums, and only copy the source block into the target file if it had changed. Am I mistaken here? Why am I not seeing rsync read from the target in order to determine whether the blocks have changed?

Here are some graphs:

Disk usage: you see that rsync --inplace (only done for the bigger file on the last day) reduced the "dent" in the disk usage of /mnt/backup, meaning that it did indeed update the existing file in place.

IO stats: the backup is made from sda to sdb. Somehow there is a huge peak in reads from the source, followed by the "normal" read(source)+write(target) activity. I was expecting simultaneous reads from both devices with little write activity on the target.


Stefan Seidel

2 Answers


What you are probably seeing is due to the way your files are changed and how rsync calculates checksums. The rsync man page entry for --inplace has a basic explanation:

          o      The efficiency of rsync's delta-transfer algorithm may be
                 reduced if some data in the destination file is overwritten
                 before it can be copied to a position later in the file.
                 This does not apply if you use `--backup`, since rsync is
                 smart enough to use the backup file as the basis file for
                 the transfer.

So you should probably either not use --inplace or use --backup to preserve the old copy of the file. That said, rsync seems to handle large files rather inefficiently, so it may not be the best tool for the job.

If you are using LVM and really want to transfer snapshot data, you might not want to run rsync, which is quite computation- and I/O-intensive on both sides. Instead, you could copy the snapshot's CoW data over to the destination machine using lvmsync. This would spare you the I/O and the CPU cycles at the price of a presumably larger transfer size.

Another approach would be to compute "dumb" block device checksums (e.g. with MD5) and transfer only the differing blocks, as in this answer here on ServerFault or in the blocksync.py script (I've linked the most recently active fork of it). This would not depend on snapshots at all, but obviously you would want to create one for the duration of the copy to ensure that your data stays consistent.
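The block-compare approach described above can be sketched roughly as follows. This is a minimal illustration for two local files, not the blocksync.py implementation: the function name `sync_changed_blocks` and its parameters are hypothetical, and because both files are local here it compares the bytes directly rather than hashing them (checksums such as MD5 mainly pay off when the two sides are on different machines).

```python
import os


def sync_changed_blocks(src_path, dst_path, block_size=8 * 1024 * 1024):
    """Update dst_path in place so it matches src_path.

    Reads both files block by block and rewrites only the blocks that
    differ. Returns the number of blocks written to the destination.
    (Hypothetical helper for illustration only.)
    """
    # Make sure the destination exists so it can be opened for update.
    if not os.path.exists(dst_path):
        open(dst_path, "wb").close()

    written = 0
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            src_block = src.read(block_size)
            if not src_block:
                break
            # Read the corresponding block from the destination.
            dst.seek(offset)
            dst_block = dst.read(len(src_block))
            if src_block != dst_block:
                # Only changed (or missing) blocks cause a write.
                dst.seek(offset)
                dst.write(src_block)
                written += 1
            offset += len(src_block)
        # Drop any leftover tail if the destination was longer.
        dst.truncate(offset)
    return written
```

With this access pattern the target disk sees mostly reads and only occasional writes, which is the I/O profile the question was expecting from rsync. It should still be run against a snapshot so the source stays consistent during the copy.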

If you are concerned about your database's write performance with active snapshots, you could also take a look at ddsnap, which contains several optimizations for snapshotting and volume replication that effectively work around those concerns.

the-wabbit
  • Using `--backup` is not an option, since I only have space for one copy. Not using `--inplace` means I can just use `cp` as before. And it's an InnoDB data file, so it should only have block-based changes. Also, I'm using the "momentary snapshot" method, which means that I take a snapshot, copy the files and then remove the snapshot. Leaving the snapshot on during day-to-day operations would mean too big of a performance penalty. – Stefan Seidel Jan 22 '13 at 08:21
  • @StefanSeidel I meant to add the link to [lvmsync](https://github.com/mpalmer/lvmsync) to my post, but forgot it. I believe this would be a much better approach to your problem than using rsync. As for the rsync problem, you might want to take some additional metrics. Especially rsync itself reports how much data has been transferred over the line at the end of the transfer - can you confirm that the entire database file is going over the network? – the-wabbit Jan 22 '13 at 08:55
  • But lvmsync means that I have to keep the LVM snapshot active all the time, right? That's not possible because of performance. I have added the IO graph which shows exactly that rsync doesn't read any significant amount of data from the destination device. – Stefan Seidel Jan 22 '13 at 09:34
  • @StefanSeidel I have added some other options you might want to look at. But if you are using `cp` and not having issues with it, why would you want to replace it by something which needs more time and incurs heavy load? – the-wabbit Jan 22 '13 at 10:42
  • Thanks, I'll have a look. My thought was that the rsync inplace copy would be faster, because the CPU load from checksum calculation should be relatively low, and because reading is faster than writing, so reading from the target (hardware RAID1) should give more than double the speed of just writing to it. – Stefan Seidel Jan 22 '13 at 12:44

I believe you want --inplace --no-whole-file. Note that when both source and destination are local paths, --whole-file is the default (see the rsync man page). See also a nice little test on unix.SE, and note the comments there.
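As a rough sketch of how the suggested flags combine with the command from the question, the snippet below only assembles and prints the command line; it does not run rsync. The helper `build_rsync_command` is hypothetical, and the `-B` block size is left optional here because whether a given value is accepted is an assumption that should be checked against your rsync version.

```python
import shlex


def build_rsync_command(source, destination, block_size=None):
    """Assemble the rsync invocation from the question plus --no-whole-file.

    Hypothetical helper: it only builds the argument list and never
    executes rsync.
    """
    cmd = [
        "rsync",
        "--safe-links",
        "--inplace",
        "--no-whole-file",  # force the delta-transfer algorithm for local copies
        "-crptogx",
    ]
    if block_size is not None:
        # Optional -B value, as used in the question (not verified here).
        cmd += ["-B", str(block_size)]
    cmd += [source, destination]
    return cmd


print(shlex.join(build_rsync_command("/source/", "/destination/")))
```

Without --no-whole-file, a local-to-local rsync copies whole files, which is consistent with the missing read activity on the target described in the question.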

Diagon