42

I am interested in using Amazon S3 to back up our ~100 GB server images (created via Acronis backup tools).

Obviously, uploading these to S3 every night would be expensive in terms of both bandwidth and cost. I'm considering using rsync with S3 and came across s3rsync. Has anybody had any experience using this, or any other utility?

Dennis Williamson
alex
  • 1
    One thing I noticed about s3rsync is that you are currently limited to 10GB bucket sizes (check the FAQ). You can have multiple buckets, but you have to split your data into 10GB chunks. – dana Jun 30 '11 at 01:37

7 Answers

36

I recently stumbled across this thread on Google and it looks like the landscape has changed a bit since the question was asked. Most of the solutions suggested here are either no longer maintained or have turned commercial.

After some frustrations working with FUSE and some of the other solutions out there, I decided to write my own command-line rsync "clone" for S3 and Google Storage using Python.

You can check out the project on GitHub: http://github.com/seedifferently/boto_rsync

Another project which I was recently made aware of is "duplicity." It looks a little more elaborate and it can be found here: http://duplicity.nongnu.org/

Hope this helps.

UPDATE

The Python team at AWS has been working hard on a boto-based CLI project for their cloud services. Among the tools included is an interface for S3 which duplicates (and in many ways supersedes) most of the functionality provided by boto-rsync:

https://github.com/aws/aws-cli

In particular, the sync command can be configured to function almost exactly like rsync:

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
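
For anyone wondering what "configured to function like rsync" might look like in practice, here is a minimal sketch, not a definitive recipe. It wraps the AWS CLI from Python, in keeping with the boto-based tools above. `--delete` and `--dryrun` are real `aws s3 sync` options; the local path, the bucket name, and the helper names `build_sync_command` and `run_sync` are hypothetical. One caveat: `sync` decides per file (by size and modification time) whether to upload, and it always uploads the whole file, so it does not give you rsync's within-file deltas.

```python
import subprocess


def build_sync_command(local_dir, s3_uri, delete=False, dry_run=False):
    """Build an argv list for `aws s3 sync` (hypothetical helper)."""
    cmd = ["aws", "s3", "sync", local_dir, s3_uri]
    if delete:
        # Remove remote objects that no longer exist locally (like rsync --delete).
        cmd.append("--delete")
    if dry_run:
        # Show what would be transferred without doing it (like rsync --dry-run).
        cmd.append("--dryrun")
    return cmd


def run_sync(local_dir, s3_uri, **options):
    """Run the sync; requires the AWS CLI to be installed and configured."""
    subprocess.run(build_sync_command(local_dir, s3_uri, **options), check=True)


if __name__ == "__main__":
    # Placeholder path and bucket: preview the command first.
    preview = build_sync_command(
        "/backups/acronis", "s3://example-bucket/acronis",
        delete=True, dry_run=True,
    )
    print(" ".join(preview))
```

Running with `dry_run=True` first is a cheap way to check what would be uploaded or deleted before committing to a large transfer.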

Seth
  • great contribution! thanks and I will give your code a shot soon. Do you have any must-reads for learning python/django? Cheers – iainlbc Jan 22 '12 at 03:13
  • What advantages / differences does your program have compared to S3cmd and S3sync? – James McMahon May 03 '12 at 03:00
  • @JamesMcMahon s3cmd/s3sync are more full-featured CLIs for S3 (manage buckets, list contents, etc), while boto-rsync is strictly an attempt at mimicking rsync. – Seth Jun 19 '12 at 17:10
  • There is also s3fs-fuse (https://github.com/s3fs-fuse/s3fs-fuse), which works pretty well and can be combined with rsync, though I am not sure how efficiently. – Stanislav Apr 08 '17 at 16:38
  • It would be awesome if you can explain how "the sync command can be configured to function almost exactly like rsync". – trusktr Oct 11 '18 at 02:23
  • I would like to use this command to copy an entire filesystem. Which flags would I need for that purpose? Is it possible to pipe the result to a `tar.gz` destination? Thanks in advance :) – Oleg Belousov Nov 17 '19 at 11:36
11

I've also had good luck with S3cmd and S3sync, both of which are free.

Terrell
  • S3cmd has an issue with large numbers of files (> 300k). It uses about 1 GB of working memory per 100k files, so keep that limitation in mind. – Tuxie Oct 22 '18 at 11:49
7

Depending on how your Acronis images are created, I'm not sure any kind of rsync would save you bandwidth. An Acronis image is a single file (or a few files), so rsync wouldn't be able to read inside it and back up only what changed. I'm also not sure what kind of server images you're creating, but since you said 100 GB, I'm going to assume they are full images. An incremental image would greatly cut down the nightly image size, thus saving bandwidth. You could also consider saving the images to a location other than S3, such as tape media, and storing that off-site.
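
To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. The 10 Mbit/s uplink and the 5 GB incremental size are assumptions for illustration only, not figures from the question, and the calculation ignores protocol overhead. Under those assumptions a full 100 GB image takes about 22 hours to upload, while a 5 GB incremental takes about 1.1 hours.

```python
def upload_hours(size_gb, uplink_mbit_per_s):
    """Hours needed to upload size_gb (decimal GB) over the given uplink."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (uplink_mbit_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    uplink = 10  # Mbit/s, an assumed upstream speed
    # 100 GB is from the question; 5 GB is a hypothetical incremental size.
    for label, size_gb in [("full image", 100), ("incremental", 5)]:
        hours = upload_hours(size_gb, uplink)
        print(f"{label}: {size_gb} GB takes about {hours:.1f} h")
```

Plugging in your own uplink speed and typical incremental size shows quickly whether a nightly upload fits in the backup window at all.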

churnd
  • 5
    No, rsync doesn't work like that. It works with any file type and doesn't need any knowledge of the internals of the file it's syncing. Instead, it compares hashes of chunks of the file and transfers only those chunks that differ. http://en.wikipedia.org/wiki/Rsync – Alan Donnelly Dec 12 '10 at 02:12
  • 2
    And none of the chunks will match, because any small change in the files inside the image will cause the entire file to change due to compression. Even with compression turned off, I'm not sure it would rsync well, because the files inside the image can change order and it matches on a rolling basis rather than just finding any identical chunk. – JamesRyan Oct 13 '11 at 11:34
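
The two comments above can be illustrated with a small experiment. This is a simplified sketch, not rsync's actual algorithm: it compares hashes of fixed-offset blocks, whereas rsync also uses a rolling weak checksum so that matches can be found at any offset. The block size, the synthetic "image" made of text records, and all function names are made up for illustration. A same-length edit to uncompressed data touches exactly one block here; after zlib compression the two versions typically differ in many more blocks, though the exact count depends on the data.

```python
import hashlib
import zlib

BLOCK = 4096  # block size in bytes, chosen arbitrarily for this sketch


def block_hashes(data):
    """Hash each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def changed_blocks(old, new):
    """Count blocks whose hash differs or that exist in only one version."""
    a, b = block_hashes(old), block_hashes(new)
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def make_image(edited=False):
    """Build a compressible stand-in for a disk image: many short records."""
    records = [f"record {i}: status=ok\n".encode() for i in range(50000)]
    if edited:
        # Same-length, in-place edit of one early record.
        records[10] = b"record 10: status=NO\n"
    return b"".join(records)


if __name__ == "__main__":
    old, new = make_image(), make_image(edited=True)
    print("raw blocks changed:", changed_blocks(old, new))
    print("compressed blocks changed:",
          changed_blocks(zlib.compress(old), zlib.compress(new)))
```

If your imaging tool lets you disable compression (and encryption), running a comparison like this on two consecutive real images would show how much a block-based tool could actually skip.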
4

I never tried s3rsync.

I'm using duplicity for our off-site backups. It supports incremental backups to S3, though it cannot save bandwidth on individual files, since under the Amazon S3 storage protocol any modification to a file forces you to upload the whole file again. Duplicity itself, however, only uploads the differences since the last incremental backup.

With duplicity you won't need to go through another server as S3sync does. Nonetheless, if you encrypt your data, S3sync should be worth a try.

Lessfoe
1

You can try the minio client, a.k.a. "mc". mc provides minimal tools to work with Amazon S3-compatible cloud storage and filesystems.

mc implements the following commands:

  ls        List files and folders.
  mb        Make a bucket or folder.
  cat       Display contents of a file.
  pipe      Write contents of stdin to one or more targets. When no target is specified, it writes to stdout.
  share     Generate URL for sharing.
  cp        Copy one or more objects to a target.
  mirror    Mirror folders recursively from a single source to many destinations.
  diff      Compute differences between two folders.
  rm        Remove file or bucket [WARNING: Use with care].
  access    Manage bucket access permissions.
  session   Manage saved sessions of cp and mirror operations.
  config    Manage configuration file.
  update    Check for a new software update.
  version   Print version.

You can use the mirror command for your operation. Here "localdir" is the local directory, "S3" is an alias for Amazon S3, and "remoteDir" is the name of your bucket on S3.

$ mc mirror localdir/ S3/remoteDir

You can also write a cron job for the same. In case of a network outage, you can use "mc session" to restart the upload from the point where it was interrupted.

PS: I contribute to the minio project and would love to get your feedback and contributions. Hope it helps.

koolhead17
1

S3 also has an add-on service called AWS Import/Export that allows you to send them a USB drive with your initial 100 GB data set, and they'll load it onto the S3 cloud using some backend tools at their data centers. Once your 100 GB is up there, you can just do differential backups each night to back up everything that's changed.

The site is http://aws.amazon.com/importexport/

If the majority of your data is fairly static, this would be a good option; if the whole 100 GB of data is changing daily, this is not going to help you much.

  • 3
    How do you suppose they "load" a 128 GB flash drive? I picture the world's largest USB hub, a floor-to-ceiling patch panel of USB connectors, 3/4 full of customer-supplied flash drives, all going into the back of a single blade server. – Paul Feb 16 '10 at 19:39
  • What an image!! In reality, it's probably some poor guy in a dark corner of a data center with your world's largest USB hub connected to his PC :) – monkeymagic Feb 25 '10 at 18:32
-1

The new Jungle Disk Server Edition (beta) might be useful to you. It has block-level de-duplication, so if your Acronis images have anything in common, this will greatly reduce the amount you need to back up. The features are perfect for server backups. Check out the release notes.

I've been testing the beta for two weeks and, aside from some small issues with the GUI that I'm sure will be fixed in the final release, I am excited about the product.

Martijn Heemels