I maintain some very large Cassandra clusters on EC2. Offsite backups take a long time because the snapshots have to be tarred, gzipped, and pushed over the network from the EC2 instance to EBS.
My question is whether we can decrease backup times by using cp and rsync. Say EBS already holds a previous backup. Could we copy that backup within EBS, avoiding the network, and then rsync only the differences between that copy and the current snapshots to create the new backup?
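To make the idea concrete, here is a minimal Python sketch of the seed-then-sync flow. It is only an illustration under assumptions: the function name and the three directory arguments are hypothetical, and it compares whole files by content rather than using rsync's block-level delta algorithm, so it shows the shape of the workflow and not what rsync does internally.

```python
import filecmp
import shutil
from pathlib import Path


def seed_and_sync(previous_backup: Path, snapshot: Path, new_backup: Path) -> list[str]:
    """Seed new_backup from previous_backup, then make its files match snapshot.

    Returns the relative paths that had to be copied from the snapshot.
    Hypothetical helper: a file-level stand-in for "cp, then rsync".
    """
    # Step 1: local copy of the old backup (the "cp within EBS" step).
    shutil.copytree(previous_backup, new_backup)

    # Step 2: drop files that are no longer part of the current snapshot.
    for path in sorted(new_backup.rglob("*")):
        if path.is_file() and not (snapshot / path.relative_to(new_backup)).exists():
            path.unlink()

    # Step 3: copy only files that are new or whose contents differ.
    copied = []
    for src in sorted(snapshot.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(snapshot)
        dst = new_backup / rel
        if not dst.exists() or not filecmp.cmp(src, dst, shallow=False):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel.as_posix())
    return copied
```

With this shape, only the entries returned in `copied` would need to cross the network. Everything else is reused from the earlier backup.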
Thoughts? One issue is that all of our column families are Snappy-compressed. Is Snappy output rsyncable? And would tarring all those SSTables and gzipping with --rsyncable produce an archive that is ultimately rsync-friendly?
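One way to probe the rsyncability question empirically is to measure how many blocks of an old file still appear somewhere in the new one, before and after compression. The sketch below is a rough approximation with several assumptions: it uses zlib from the standard library as a stand-in (Snappy is not in the standard library and may behave differently), the sample data is synthetic, and `reusable_fraction` is a hypothetical helper that searches for exact block matches instead of implementing rsync's rolling checksum.

```python
import random
import zlib


def reusable_fraction(old: bytes, new: bytes, block_size: int = 256) -> float:
    """Fraction of old's fixed-size blocks that occur somewhere in new."""
    blocks = [
        old[i:i + block_size]
        for i in range(0, len(old) - block_size + 1, block_size)
    ]
    if not blocks:
        return 0.0
    found = sum(1 for block in blocks if block in new)
    return found / len(blocks)


# Synthetic, text-like rows standing in for real data (an assumption).
rng = random.Random(0)
rows = [
    f"key{rng.randrange(10**6)}:value{rng.randrange(10**6)}\n"
    for _ in range(20000)
]
old_raw = "".join(rows).encode()

# Change a single row near the start, which also shifts all later bytes.
rows[100] = "key0:changed\n"
new_raw = "".join(rows).encode()

raw = reusable_fraction(old_raw, new_raw)
packed = reusable_fraction(zlib.compress(old_raw), zlib.compress(new_raw))
print(f"raw: {raw:.2%} reusable, zlib-compressed: {packed:.2%} reusable")
```

Running the same comparison on two real snapshots, and on archives built with and without --rsyncable, would show whether the delta transfer has anything to reuse.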