Here is what I've learned doing the exact same thing you are doing. I suggest using mbuffer. When testing in my environment it only helped on the receiving end; without it, the send would slow down whenever the receive fell behind.
Some examples:
http://everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/
Homepage with options and syntax:
http://www.maier-komor.de/mbuffer.html
The send command from my replication script:
zfs send -i tank/pool@oldsnap tank/pool@newsnap | ssh -c arcfour remotehostip "mbuffer -s 128k -m 1G | zfs receive -F tank/pool"
This runs mbuffer on the remote host as a receive buffer, so the sending side runs as fast as possible. I run a 20 Mbit line and found that having mbuffer on the sending side as well didn't help. Also, my main ZFS box uses all of its RAM as cache, so giving even 1 GB to mbuffer would require me to reduce some cache sizes.
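If you do need to carve RAM out of the cache for mbuffer, a minimal sketch on OpenZFS under Linux is to cap the ARC (the 7 GiB figure here is just an illustration; on Solaris/illumos the equivalent is set zfs:zfs_arc_max=... in /etc/system):

# cap the ARC at 7 GiB at runtime (as root), leaving room for mbuffer:
echo $((7 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max
# persist the cap across reboots:
echo "options zfs zfs_arc_max=$((7 * 1024 * 1024 * 1024))" >> /etc/modprobe.d/zfs.conf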
Also, and this isn't really my area of expertise, I think it's best to just let SSH do the compression. In your example you are compressing with bzip2 and then piping through SSH; if SSH compression is also enabled (with -C or Compression yes in your config), SSH ends up trying to compress an already-compressed stream. I ended up using arcfour as the cipher as it's the least CPU intensive, and that was important for me. You may have better results with another cipher, but I'd definitely suggest letting SSH do the compression (or turning off SSH compression if you really want to use something it doesn't support).
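To make that choice explicit rather than relying on whatever your ssh config defaults to, you can force it on the command line. A sketch of both variants (gzip -1 here just stands in for any cheap compressor you prefer):

# let SSH handle compression, enabled explicitly with -C:
zfs send -i tank/pool@oldsnap tank/pool@newsnap | ssh -C -c arcfour remotehostip "mbuffer -s 128k -m 1G | zfs receive -F tank/pool"
# or compress the stream yourself and turn SSH compression off:
zfs send -i tank/pool@oldsnap tank/pool@newsnap | gzip -1 | ssh -o Compression=no -c arcfour remotehostip "mbuffer -s 128k -m 1G | gzip -d | zfs receive -F tank/pool"

Note the second variant buffers the compressed bytes on the remote end before decompressing, which stretches the buffer further.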
What's really interesting is that using mbuffer when sending and receiving on localhost speeds things up as well:
zfs send tank/pool@snapshot | mbuffer -s 128k -m 4G -o - | zfs receive -F tank2/pool
I found that 4 GB seems to be the sweet spot for localhost transfers. It just goes to show that zfs send/receive works best when there is no latency or other pauses in the stream.
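Related to that: on a trusted network where you don't need encryption at all, mbuffer can also carry the stream over TCP itself with its -I/-O options (documented on the homepage linked above), removing the SSH cipher overhead entirely. A sketch assuming port 9090 is reachable between the hosts; start the receiver first:

# on the receiving host:
mbuffer -s 128k -m 1G -I 9090 | zfs receive -F tank/pool
# on the sending host:
zfs send -i tank/pool@oldsnap tank/pool@newsnap | mbuffer -s 128k -m 1G -O remotehostip:9090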
Just my experience, hope this helps. It took me a while to figure all this out.