
I've been desperately trying to find a way to back up my AWS EFS file system to S3, but can't seem to find one.

There are several EC2 instances running, all of which have access to the EFS in question. To reduce traffic, I already tried a Lambda function that SSHes into the instances mounting the EFS and runs "aws s3 sync ...". Unfortunately, SSHing from Lambda doesn't seem like a production-ready solution.

I've also tried adapting Data Pipeline, but launching additional instances just for backups seems like a hassle too.

Isn't there some easy way of backing up EFS to S3?
Any suggestions appreciated.

wahtye

3 Answers


Actually, I think S3 sync is what you want. Maybe set up cron on the EC2 instances and invoke S3 sync that way? Are you using ECS as well? I have a cron container that does the job pretty well. For those reading who are not familiar with the AWS CLI (https://aws.amazon.com/cli/), the syntax for S3 sync looks like this:

aws s3 sync /path/to/source/ s3://bucket/destination/
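For example, a crontab entry on one of the EC2 instances could run the sync nightly. This is just a sketch; the mount point, bucket name and log path are placeholders, and it assumes the AWS CLI is on cron's PATH and the instance role can write to the bucket:

# Sync the EFS mount to S3 every night at 02:00, logging output for troubleshooting
0 2 * * * aws s3 sync /mnt/efs/ s3://my-backup-bucket/efs/ >> /var/log/efs-backup.log 2>&1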
HBruijn
  • What about bursting credits? The floppy-like I/O on a large EFS volume will make this sync unbearable, right? – ren.rocks Jun 26 '18 at 19:38
  1. Back up EFS using a tool such as Attic to create a compressed, incremental, de-duplicated backup on one EC2 instance.
  2. Use S3FS or the S3 API to upload those files to S3. Personally I use a Dropbox upload script, which works fine as well.

Note that Attic runs at whatever interval you schedule it, but keeps only the checkpoints you specify. For example, you might take daily backups, but keep only monthly ones after the first month and yearly ones after the first year. As it prunes, Attic deletes files from its repository. If you don't propagate those deletions to S3 it won't hurt, but you will use more storage than required. That's why a sync of the Attic backup files is probably better than a copy.
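A rough sketch of how this could look on the backup instance (the repository path, archive name, retention policy and bucket are placeholders, and it assumes Attic's init/create/prune commands):

# One-time repository setup
attic init /backup/efs.attic
# Each run: create a compressed, de-duplicated archive of the EFS mount
attic create /backup/efs.attic::efs-$(date +%F) /mnt/efs
# Apply the retention policy; this deletes pruned files from the repository
attic prune /backup/efs.attic --keep-daily=30 --keep-monthly=12 --keep-yearly=5
# Sync (rather than copy) the repository to S3 so pruned files are removed there too
aws s3 sync --delete /backup/efs.attic/ s3://my-backup-bucket/attic/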

Tim
  • I've read that S3FS isn't stable enough for a production environment. We'll have to think about saving to S3 directly through the S3 API. – wahtye Aug 12 '16 at 07:49
  • There's also http://s3tools.org/s3cmd-sync and a host of other options. The actual AWS commands are probably going to be more reliable though. – Tim Aug 12 '16 at 17:39
  • 1
    @wahtye s3fs is indeed a little bit delicate. I use an older version of it in production to enable me to use S3 as the backing store for my ProFTPd server, but would never trust it for making backups. For that, I use my own code which is extremely pedantic and takes advantage of all the features S3 offers for ensuring data integrity, such as the `Content-MD5` upload header -- if S3 receives an upload with a payload not matching this, it outright refuses to even store the content. A sad number of libraries and utilities seem to just not bother with this, since it is technically "optional." – Michael - sqlbot Aug 20 '16 at 00:13
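For readers who want that integrity check from the command line, here is a minimal sketch using the AWS CLI (bucket, key and file path are placeholders; it assumes OpenSSL is available for the MD5 digest):

# Compute a base64-encoded MD5 of the payload and send it as Content-MD5;
# S3 refuses to store the object if the uploaded bytes don't match it.
md5=$(openssl md5 -binary /backup/efs-backup.tar.gz | base64)
aws s3api put-object \
  --bucket my-backup-bucket \
  --key backups/efs-backup.tar.gz \
  --body /backup/efs-backup.tar.gz \
  --content-md5 "$md5"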

I recommend using AWS lifecycle hooks. This option offers reliable controls (e.g. timeouts). https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html
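For instance, a termination lifecycle hook registered via the CLI might look like this (the hook and Auto Scaling group names are placeholders; the timeout gives an instance a bounded window to finish a backup before termination proceeds):

# Hold terminating instances for up to 15 minutes so a final backup can run
aws autoscaling put-lifecycle-hook \
  --lifecycle-hook-name efs-backup-hook \
  --auto-scaling-group-name my-asg \
  --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
  --heartbeat-timeout 900 \
  --default-result CONTINUE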

vimal
  • Some detail as to how to configure the exact situation the question describes would, no doubt, be appreciated. – womble Jan 09 '20 at 12:59