Linux server sync to an Amazon S3 bucket

Question

I am looking for a stable solution to replace a classic server-to-server backup done with rsync. I have to sync a whole filesystem (more than 1 TB) to Amazon S3.

Where am I?

Solution 1: I mapped the S3 bucket to a mount point on the system using s3fs. The system becomes unstable and traffic is really slow. This is in no way a solution.

Solution 2: Using the s3cmd sync command. Everything goes smoothly and at good speeds (at least for folders under 2 GB). The problem comes when I try to sync the whole filesystem on the server (with some exclusions). The process just hangs.

Any hints?

It is a cheap and good storage place, as long as you don't need all your files restored (that's the moment it gets expensive). It seems like a good idea, but technically I have this problem. The backup is daily and must be incremental (this is why I am looking for rsync-like behavior). — deadtired, Aug 08 '13 at 14:24
Look at the duplicity backup app, which has great S3 support. — ceejayoz, Aug 08 '13 at 14:35
Thanks for the hint. So far, so good. It works alright on folders under 1 GB. I will test the heavy ones tomorrow. — deadtired, Aug 08 '13 at 17:08
Duplicity worked fine, but because of all the packing and archiving that make it a good tool, it becomes really slow, and I cannot use it to keep 2 TB synced to S3. Still, I recommend it for smaller amounts of data. — deadtired, Oct 15 '13 at 08:00

score 2 · Answer 1 · answered Aug 08 '13 at 14:22

This is a bad way to do backups. You should be separating your OS configuration from your valuable data. None of your permissions will be transferred, and in the Linux world permissions are a necessity if you're planning on restoring backups (which you should be: backups without verified restorations are pointless).

Firstly, you can synchronise your valuable instance data (e.g. /var/www) to S3 using s3cmd sync as you've stated.
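Since the question reports that s3cmd sync behaves well on small folders but hangs on the whole tree, one thing worth trying is to run it once per top-level subdirectory. The following is only a minimal sketch under that assumption, not a tested fix: the function name `sync_each_subdir`, the `S3CMD` override variable, and the bucket path are hypothetical, while `sync` and `--delete-removed` are the same s3cmd options used in the question.

```shell
#!/bin/sh
# Sketch: run one "s3cmd sync" per top-level subdirectory so that each
# invocation only has to walk a smaller tree.
# Limitations: files lying directly in the source directory are not synced,
# and subdirectories deleted locally are not removed from the bucket.

sync_each_subdir() {
    src="${1%/}"    # local source directory, trailing slash stripped
    dest="${2%/}"   # S3 destination prefix, trailing slash stripped

    for dir in "$src"/*/; do
        # Skip the unexpanded pattern when there are no subdirectories.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # S3CMD is a hypothetical override (e.g. S3CMD=echo previews the commands).
        "${S3CMD:-s3cmd}" sync --delete-removed "$dir" "$dest/$name/" \
            || echo "sync failed for $dir" >&2
    done
}
```

It would be called as, for example, `sync_each_subdir /var/www s3://bucket/var/www`. Running it first with `S3CMD=echo` prints the commands without touching the bucket, and a failure in one subdirectory is reported without stopping the rest of the run.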

Secondly, using a configuration management utility such as Puppet or Chef, you can spin up a new instance of your OS with minimal effort, ensuring a fresh and reliable set of configurations.

There are no details of your underlying architecture in your question (EC2? VMware? KVM? Xen? Physical hardware?), so I can't recommend any specific tools (e.g. architecture-specific snapshotting). If you're running on a virtual platform (e.g. EC2, VMware, KVM), you should be using that platform's snapshotting architecture.

/var/www would be 99.9% of all the data we keep. And because there is one major project, file permissions are not really the problem. The system is 64-bit CentOS on a dedicated server. — deadtired, Aug 08 '13 at 14:30
`s3cmd sync --delete-removed /var/ s3://bucket/var/` is what I am trying to fix. Stays like this for hours. — deadtired, Aug 08 '13 at 14:37
