50

Is there any way to recover from accidental deletions of an Amazon S3 Bucket?

We've got critical info in our buckets and I need to mitigate the risk of accidental or malicious deletions of the bucket itself.

I know I can sync the entire bucket locally, but this isn't too practical if my bucket size is 100GB.

Any ideas on backup strategies?

  • Here is an S3 backup strategy guide I wrote: http://eladnava.com/backing-up-your-amazon-s3-buckets-to-ec2/ – Elad Nava Oct 03 '15 at 20:41

6 Answers

25

Another approach is to enable S3 versioning on your bucket. You can then restore deleted or overwritten objects. See the S3 documentation for how to enable it.

Using third party tools like BucketExplorer makes working with versioning pretty trivial (vs calling the API yourself directly).

You can also enable multi-factor authentication delete for your S3 buckets - which makes "accidental deletion" that little bit harder ;)

More on Multi Factor Authentication Delete
More on Deleting Objects
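
With the AWS CLI, turning both of these on should look roughly like the following (the bucket name, account ID and MFA serial/code are placeholders, and MFA Delete can only be enabled with the bucket owner's root credentials):

# enable versioning on the bucket
aws s3api put-bucket-versioning --bucket mybucket \
    --versioning-configuration Status=Enabled

# additionally require an MFA code to delete object versions or suspend versioning
aws s3api put-bucket-versioning --bucket mybucket \
    --versioning-configuration Status=Enabled,MFADelete=Enabled \
    --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"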

Dario Seidl
snarkyboojum
15

You could use s3cmd: http://s3tools.org/s3cmd

So to back up a bucket called mybucket:

s3cmd mb s3://mybucket_backup
s3cmd --recursive cp s3://mybucket s3://mybucket_backup
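
And to restore after something happens to mybucket, presumably just the same copy run in the other direction:

s3cmd --recursive cp s3://mybucket_backup s3://mybucket
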
Simpleton
Ian Purton
  • Is there a faster way to do this? If there are n keys in the bucket, there are at least n requests for copying plus some for listing (and probably checking the results). This may take quite a while for large buckets. – Kariem Jan 18 '12 at 00:37
  • Could you detail the backup operation when mybucket is corrupted and one needs to restore mybucket_backup? – Augustin Riedinger Sep 13 '13 at 09:35
8

One possible solution could be to just create a "backup bucket" and duplicate your sensitive info there. In theory your data is safer in S3 than on your hard drive.

Also, I'm not sure accidental deletion is a real problem, because you would need to delete every key in the bucket before you could delete the bucket itself.

JAG
  • +1 since it'd be pretty hard to "accidentally" delete everything in a bucket and then subsequently delete the bucket too. –  Jun 28 '09 at 04:13
  • if you're using a tool like s3cmd, it's no harder than it is to delete an entire directory tree with `rm -rf` – jberryman Feb 24 '10 at 03:13
  • What about Amazon Glacier? Is it an option? – Tony Sep 09 '13 at 02:03
7

Another possible solution is to replicate your bucket to a second bucket in another region, e.g. Europe. The copy may survive an accidental deletion of the original long enough for you to recover.
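
S3 now has cross-region replication built in; a rough sketch with the AWS CLI (the IAM role ARN and bucket names are placeholders, and versioning must already be enabled on both buckets) looks something like:

# replicate everything in mybucket to a bucket in another region
aws s3api put-bucket-replication --bucket mybucket \
    --replication-configuration '{
      "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
      "Rules": [{ "Status": "Enabled", "Prefix": "",
                  "Destination": { "Bucket": "arn:aws:s3:::mybucket-replica-eu" } }]
    }'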

  • Bucket replication is a great option. For an extra layer of protection use cross account replication to ensure any breach of the source account doesn't result in data loss. – Gareth Oakley Apr 06 '18 at 12:34
7

This isn't a cheap solution, but if your buckets really are critical, here's how you do it: boot an Amazon EC2 instance and sync the content there periodically.

Amazon EC2 is Amazon's virtual server hosting service. You can spin up Linux, Windows, etc. instances and run anything you want. You pay by the hour, and you get a fairly large amount of local storage with each server. For example, I use the "large" instance size, which comes with 850GB of local disk space.

The cool part is that it's on the same network as S3, and data transfer between S3 and EC2 in the same region is free. I use the $20 Jungle Disk software on a Windows EC2 instance, which lets me access my S3 buckets as if they were local disk folders. Then I can run scheduled batch files to copy data out of S3 onto the local EC2 disk. You can automate it to keep hourly backups if you want, or, if you want to gamble, set up Jungle Disk (or its Linux equivalents) to sync once an hour or so. If someone deletes a file, you've got at least a few minutes to get it back from EC2. I'd recommend the regular scripted backups though - it's easy to keep a few days of backups if you're compressing them onto an 850GB volume.

This is really useful for SQL Server log shipping, but I can see how it'd accomplish your objective too.
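
These days you can get the same effect without Jungle Disk by running the AWS CLI on a Linux instance; a minimal sketch (the bucket name, local path and user are placeholders) is an hourly cron entry like:

# /etc/cron.d/s3-backup - pull the bucket down to local disk every hour
0 * * * *  ec2-user  aws s3 sync s3://mybucket /mnt/backups/mybucket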

Brent Ozar
  • I guess you could use a micro instance and add as much EBS (Elastic Block Storage) as you needed. May be a cheaper option. – Shawn Vader Apr 29 '15 at 07:56
  • Actually you shouldn't, because the dedicated bandwidth to and from S3 depends on the size of the EC2 instance. If you want big throughput, you need a big (= $$$$) instance. My former employer found this out the hard way. – John Cowan Jan 08 '19 at 22:05
6

To modify Brent's (excellent) answer a bit: you shouldn't need to keep the instance running. Create an EC2 AMI that pulls your data down, syncs it to an EBS volume, snapshots that volume, and shuts itself down.

You could also keep the volume around on its own, but snapshotting it should be sufficient for a backup. If your custom AMI does all of this (including shutting itself down when it's done) with no interaction, then your 'backup' script just needs to run 'ec2run -n 1 -t m1.small ami-' and fire and forget.
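
A rough sketch of what that AMI could run at boot, using the modern AWS CLI rather than the old ec2run-era tools (the bucket name, mount point and volume ID are placeholders, and it assumes credentials are available and the EBS volume is already attached and mounted):

#!/bin/bash
# pull the bucket down onto the attached EBS volume
aws s3 sync s3://mybucket /mnt/backup/mybucket

# flush writes, then snapshot the backup volume
sync
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "mybucket backup $(date +%F)"

# the work is done - shut the instance down
shutdown -h now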