ZFS + dedup: copy many small files fast


I'm running a ZFS backup server (with deduplication enabled) to back up websites and store them with retention. During the first backup I transfer all files to the backup server, then make a local copy of them: one set to keep, and one set to rsync the next day's backup into.

However, creating that local copy takes a long time: over 3 hours for only 15 GB of data, while transferring the same data from another server takes only half an hour. This is probably because cp reads one (small) file at a time and writes it out, which takes hours on mechanical disks because of their seek times (RAID-Z with 3 disks).

This would probably be fixed if the copy first read a batch of data into memory and then wrote it out, instead of working file by file, but how do I do that?
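
One way to get that batching is to stream the tree through tar instead of cp, optionally with mbuffer as a RAM buffer between the reader and the writer. This is only a sketch under assumptions: the paths are hypothetical, and mbuffer is a separate tool you may need to install. Since the files are unpacked again on the destination, they land as individual files, not as a tar archive:

    # pack the source tree into one sequential stream, buffer it in RAM,
    # and unpack it on the destination side (paths are examples)
    tar -C /pool/backups/site1 -cf - . \
      | mbuffer -m 512M \
      | tar -C /pool/keep/site1 -xf -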

Evianon

Posted 2014-05-16T15:02:12.440

Reputation: 63

I too believe snapshots could save you from needing to do the copy for backup purposes (and they should be much faster). That said, the copy speed does seem slow. ZFS takes a lot of CPU because of all the checksums, and dedupe takes a lot of RAM. I can't overstate how much having gobs of RAM and adding L2ARC cache drives to the pool improves dedupe performance. The more RAM and cache you have, the more file hashes the computer can keep in memory, and the faster the file system finds the duplicate files to dedupe. (Also, if compression is set high, that requires more CPU, but that is the trade-off for backups.) – Scott McClenning – 2014-05-18T05:26:38.790
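
For what it's worth, stock ZFS commands can show whether the dedup table (DDT) fits in RAM, and an SSD can be added as L2ARC; the pool name tank and the device path below are placeholders:

    # histogram of the dedup table and its in-core size
    zpool status -D tank
    # more detailed DDT statistics
    zdb -DD tank
    # add an SSD as L2ARC cache so more of the DDT stays fast to reach
    zpool add tank cache /dev/disk/by-id/ata-ssd-example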

Something else that may help: if the two servers you are transferring between both run ZFS and you can use snapshots, you may be able to use ZFS send and ZFS receive to transport the snapshot(s) from one zpool to another, even between machines. In that case, rsync wouldn't be required for the transfer. – Scott McClenning – 2014-05-18T07:42:46.530
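
A minimal sketch of that snapshot-shipping approach, assuming a dataset tank/websites on the source and a pool backuppool on the backup host (all names are examples):

    # full initial transfer
    zfs snapshot tank/websites@2014-05-16
    zfs send tank/websites@2014-05-16 | ssh backuphost zfs receive backuppool/websites
    # next day: send only the blocks that changed between the two snapshots
    zfs snapshot tank/websites@2014-05-17
    zfs send -i @2014-05-16 tank/websites@2014-05-17 \
      | ssh backuphost zfs receive backuppool/websites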

Answers


You are right, your trouble is seek times. You would be better off using one of these two solutions:

  • use tar to create an archive of your dataset; I guess it will be faster

or

  • use the snapshot directory function of ZFS (see the sketch below)
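
A minimal sketch of the snapshot option, with hypothetical pool and dataset names. A snapshot is created near-instantly and shares its blocks with the live data, so there is no file-by-file copy to seek through:

    # expose snapshots as read-only directories under .zfs/snapshot
    zfs set snapdir=visible pool/backups
    # freeze today's state before tomorrow's rsync changes it
    zfs snapshot pool/backups@2014-05-16
    ls /pool/backups/.zfs/snapshot/2014-05-16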

kranteg

Posted 2014-05-16T15:02:12.440

Reputation: 225

Tar is not an option, as deduplication doesn't work on tar archives due to block alignment, etc. Using snapshots is actually quite a good solution; it will require creating a dataset and so on per backed-up server, but I'll look into that soon. – Evianon – 2014-05-16T16:40:12.363
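
A sketch of what that per-server layout could look like, with all names hypothetical: one dataset per backed-up site, a daily rsync into the live tree, a snapshot after each run, and retention handled by destroying snapshots that age out:

    zfs create pool/backups/site1
    # daily cycle: sync, then freeze the result
    rsync -a server1:/var/www/ /pool/backups/site1/
    zfs snapshot pool/backups/site1@2014-05-17
    # retention: drop snapshots older than the window you keep
    zfs destroy pool/backups/site1@2014-04-17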