
I have ~700GB of storage holding ~15M files, so the average file size is ~50KB. To back it up, I run a simple rsync script overnight with the following set of flags:

--archive --update --compress --numeric-ids --human-readable --stats

It takes 8+ hours for rsync to complete its job, yet on average only ~1-4GB of data is moved daily. That seems incredibly inefficient to me.
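
For reference, the whole script essentially boils down to a single command along these lines (the paths here are just placeholders):

```
#!/bin/sh
# Nightly mirror of the file store to the backup host;
# source and destination paths are placeholders.
rsync --archive --update --compress --numeric-ids --human-readable --stats \
    /srv/files/ backup-host:/srv/files-backup/
```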

Can I tune my rsync script in any way? I suppose my best bet would be migrating the data to MongoDB or something similar, but there is a problem with that: the current infrastructure relies on the files being accessible through a POSIX file system, and migrating them to something totally different may require extra work, potentially too much work... What other strategy would be best?

NarūnasK
  • You may consider migrating to the ZFS filesystem, and using snapshots and `zfs send` to move your backup snapshots around. That would be *much* faster than rsync. – EEAA Apr 24 '16 at 19:17
  • There are a number of [Q&A's](http://serverfault.com/questions/590230/) such as [this](http://serverfault.com/questions/365103), [this](http://serverfault.com/q/18125/37681) and [this](http://serverfault.com/questions/746551) regarding rsync optimisation, but you might be better off looking at a completely different alternative such as [zfs replication](http://arstechnica.com/information-technology/2015/12/rsync-net-zfs-replication-to-the-cloud-is-finally-here-and-its-fast/) – HBruijn Apr 24 '16 at 19:21
  • I was thinking about `lvm` snapshots (I'm more familiar with `lvm` concepts than `zfs`). Also, beyond theory, does anyone use such strategies in production? One other requirement is that files become immediately available after backup, so I suppose I'd have to move the snapshot over the wire, then immediately merge it into the base? – NarūnasK Apr 24 '16 at 19:29
  • I just want to point out you're doing a mirror, not a backup. If your primary file system corrupts and you don't notice immediately, your second file store will be corrupted. Incremental backups would be preferred. On Unix I use Attic Backup, which does incremental, deduplicated backups, which would also probably be faster than rsync. – Tim Apr 24 '16 at 22:45
  • Instead of Attic for backups, consider Borg instead (which is a more recent fork of Attic). Borg (and Attic) are both extremely efficient at figuring out what to backup over the wire (10x faster than rsync-based solutions), plus they do compression and deduplication (a minimal Borg sketch follows these comments). – tgharold Apr 25 '16 at 13:48
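
For the Borg suggestion above, a minimal sketch of the workflow (the repository location, archive name and source directory are hypothetical):

```
# One-time: create a deduplicating repository on the backup host
borg init --encryption=repokey backup-host:/srv/borg-repo

# Nightly: create a new archive; only chunks not already in the
# repository are sent over the wire
borg create --stats --compression lz4 \
    backup-host:/srv/borg-repo::files-{now} /srv/files

# Keep a bounded history instead of a single mirror
borg prune --keep-daily 7 --keep-weekly 4 backup-host:/srv/borg-repo
```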

1 Answer


It takes that long because rsync has to examine every one of those files, even though the actual transfer is efficient: that is in excess of 15 million I/O operations per run, give or take caching. You could throw very fast storage at it, but that can be costly.

The ZFS suggestion is to use block-level replication (`zfs send`), which turns the whole delta into one sequential stream to transfer instead of millions of per-file comparisons.
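
A minimal sketch of that approach, assuming the data lives in a ZFS dataset called `tank/files` and that last night's snapshot still exists on both sides (names are illustrative):

```
# Take tonight's snapshot
zfs snapshot tank/files@2016-04-25

# Send only the blocks that changed since last night's snapshot,
# as one sequential stream, and receive it on the backup host
zfs send -i tank/files@2016-04-24 tank/files@2016-04-25 | \
    ssh backup-host zfs receive -F backup/files
```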

The same concept also applies to LVM, although it requires more scripting because remote snapshot replication isn't built in. See something like lvmsync for ideas.
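
A rough sketch of the LVM variant, with hypothetical volume names: the snapshot's copy-on-write table records which blocks change after it is taken, and lvmsync uses that to send only those blocks to a pre-existing copy of the volume on the backup host:

```
# Take a snapshot *before* the day's writes; its CoW table will record
# every block that subsequently changes on the origin volume
lvcreate --snapshot --name data_snap --size 20G /dev/vg0/data

# ...a day later, ship only the changed blocks to the remote copy
# (invocation roughly per the lvmsync README)
lvmsync /dev/vg0/data_snap backup-host:/dev/vg0/data

# Drop the snapshot and take a fresh one for the next cycle
lvremove -f /dev/vg0/data_snap
```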

John Mahowald