I'm running a ZFS backup server (with deduplication) to back up websites and store them with retention. During the first backup I transfer all files to the backup server, then make a local copy of them so that I have one set to keep and one set to rsync the next day's backup against.
However, creating that local copy takes a long time: over 3 hours for only 15 GB of data, while transferring the same data from another server takes only half an hour. This is probably because the "cp" command reads one (small) file at a time and writes it back out, so on mechanical disks (raidZ with 3 disks) the copy is dominated by seek times.
This would probably be much faster if the copy first read a batch of data into memory and then wrote it out, instead of working file by file. But how do I do that?
I too believe snapshots could save you from needing to do the copy for backup purposes (and should be much faster). That said, the copy speed does seem slow. ZFS uses a lot of CPU because of all the checksumming, and dedupe uses a lot of RAM. I can't overstate the value of having gobs of RAM and adding L2ARC cache drives to the pool to improve dedupe performance: the more RAM and cache you have, the more block hashes the system can keep in memory, and the faster it can find duplicates to dedupe. (Also, if compression is set high, that requires more CPU, but that is the tradeoff for backups.) – Scott McClenning – 2014-05-18T05:26:38.790
Something else that may help: if the two servers you are transferring between both run ZFS and you can use snapshots, you may be able to use ZFS send and ZFS receive to transport the snapshot(s) from one zpool to another, even between machines. In that case, rsync wouldn't be needed for the transfer at all. – Scott McClenning – 2014-05-18T07:42:46.530
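A sketch of that snapshot-based approach, assuming a source dataset named `tank/backup`, a destination pool named `backuppool`, and a reachable host `backuphost` (all three names are hypothetical; requires ZFS on both ends and SSH access):

```shell
# send_daily_snapshot: take a dated snapshot and stream it to the backup
# host. The snapshot itself is nearly instant and shares blocks with the
# live data, so no local copy of the files is ever made.
# Dataset, pool, and host names below are placeholders.
send_daily_snapshot() {
  day="$(date +%F)"
  zfs snapshot "tank/backup@${day}"
  zfs send "tank/backup@${day}" | ssh backuphost zfs receive "backuppool/backup"
}
```

After the first full send, later runs can use `zfs send -i <previous-snapshot> <new-snapshot>` to transfer only the blocks that changed since the last snapshot, which replaces the daily rsync as well.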