
I maintain some massive Cassandra clusters on EC2. Offsite backups take a long time because the snapshots have to be tarred, gzipped, and pushed over the network from the EC2 instance to EBS.

My question is whether we can decrease backup times by using cp and rsync. Let's say EBS already holds a previous backup. Could we copy that backup within EBS, avoiding the network, and then rsync only the differences from the current snapshots to create the new backup?

Thoughts? One issue would be that all of our column families are snappy compressed. Is snappy rsyncable? And would tar-ing all those SSTables and gzip-ing with --rsyncable lead to an archive that is ultimately rsync friendly?

jennykwan

1 Answer


Yes, you can use rsync. In fact, we're using this backup strategy successfully with our 10-node cluster.

Let me just first state that I do not recommend running Cassandra on EBS. It's a nightmare. Backups on EBS are fine though.

We have an EBS volume attached to each instance. When we want to run a backup, we simply take a Cassandra snapshot and rsync it to the EBS volume. Don't bother using tar or trying to compress the files; they're already compressed. When the rsync is finished, take an EBS snapshot of the volume. It's very fast and lets you copy your backups to another location at your leisure.
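For illustration, here is a rough sketch of that sequence as a small Python wrapper around the command-line tools. Everything specific in it is an assumption rather than our actual script: the helper names `build_backup_commands` and `run_backup`, the reuse of one fixed snapshot tag (cleared before each run so the paths on the backup volume stay stable and rsync can skip SSTables it already copied), the rsync filter that selects only the `snapshots/<tag>/` directories, and the use of the AWS CLI for the EBS snapshot. Check the flags against your Cassandra and rsync versions before relying on it.

```python
import subprocess
from typing import List


def build_backup_commands(tag: str, data_dir: str, backup_dir: str,
                          volume_id: str) -> List[List[str]]:
    """Return the commands for one backup pass, in the order they should run.

    All arguments are hypothetical placeholders: `tag` is a snapshot name
    reused on every run, `data_dir` is the Cassandra data directory,
    `backup_dir` is where the backup EBS volume is mounted, and `volume_id`
    is that volume's EBS ID.
    """
    src = data_dir.rstrip("/") + "/"   # trailing slash: copy the directory's contents
    dst = backup_dir.rstrip("/") + "/"
    return [
        # Drop the previous snapshot with this tag so the name can be reused.
        ["nodetool", "clearsnapshot", "-t", tag],
        # Take a fresh snapshot (hard links to the current SSTables).
        ["nodetool", "snapshot", "-t", tag],
        # Copy only files under .../snapshots/<tag>/; no tar, no gzip.
        # --delete removes SSTables from the backup that are no longer
        # part of the snapshot.
        ["rsync", "-a", "-m", "--delete",
         "--include=*/",
         f"--include=**/snapshots/{tag}/**",
         "--exclude=*",
         src, dst],
        # Snapshot the backup volume once rsync has finished.
        ["aws", "ec2", "create-snapshot",
         "--volume-id", volume_id,
         "--description", f"cassandra backup {tag}"],
    ]


def run_backup(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_backup(build_backup_commands("backup", "/var/lib/cassandra/data", "/mnt/backup", "vol-..."))` on each node would perform one pass; the paths and volume ID shown here are placeholders. Building the command list separately from running it makes it easy to print and review the commands first.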

Jon Haddad