
tl;dr: I need to set up a fast automatic sync from EFS to multiple EC2s

I've set up an EC2 Auto Scaling Group in AWS and I'm looking for the best way to manage code deployments to my instances, with as little service interruption as possible (preferably none) and as little scope for human error as possible (preferably none, hahaha)...

This is for a Magento website. I initially looked at storing all web content in EFS (Elastic File System), and having my EC2s mount it at boot, so there was simply one centralised codebase that they would each have access to. I quickly discovered that this was a Very Bad Idea - serving web content of a site the size of Magento over a network share is basically unworkable, and with the latency on EFS it's even worse than your average NFS share.

What I'm now trying to achieve is to have a centralised codebase in EFS, with close-to-real-time sync from there to a "local" (EBS) directory on each instance.

I tried rsync, using a "pull" approach, having each instance rsync files from EFS to itself. It looked good at first, but it seems to get gradually slower with each scan (over an hour at last check).
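Something along these lines, run from cron on each instance - illustrative rather than my exact command, and the paths are placeholders:

```python
# Illustrative pull sync - not my exact command; paths are placeholders.
import subprocess

subprocess.run(
    [
        "rsync", "-a", "--delete",
        "/mnt/efs/webroot/",    # source: the EFS mount
        "/var/www/magento/",    # destination: local EBS directory
    ],
    check=True,
)
```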

I tried find; similar result.

I've experimented with fileconveyor's symlink_or_copy transporter, but that still seems slow - perhaps because for one reason or another it's failing to use inotify to discover changes and is falling back to polling.
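For what it's worth, I gather inotify events are raised by the local kernel, so a watcher on an NFS/EFS mount never sees writes made by other clients - which would explain the polling fallback. And a polling pass has to walk and stat the whole tree over NFS on every scan, roughly like this (paths are placeholders):

```python
# Minimal sketch of what a polling sync has to do on every pass.
# Paths are placeholders; every os.stat here is a round trip over NFS.
import os
import shutil

SRC = "/mnt/efs/webroot"    # EFS mount
DST = "/var/www/magento"    # local EBS copy

def sync_once():
    for root, _dirs, files in os.walk(SRC):
        rel = os.path.relpath(root, SRC)
        dst_dir = os.path.join(DST, rel)
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dst_file = os.path.join(dst_dir, name)
            src_stat = os.stat(src_file)
            try:
                dst_stat = os.stat(dst_file)
                unchanged = (src_stat.st_mtime <= dst_stat.st_mtime
                             and src_stat.st_size == dst_stat.st_size)
            except FileNotFoundError:
                unchanged = False
            if not unchanged:
                shutil.copy2(src_file, dst_file)  # copy2 preserves mtimes

sync_once()
```

With a codebase the size of Magento, that's a lot of round trips per scan.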

Ultimately the goal is to allow a developer to deploy new and changed files to a single location and for those files to replicate quickly and automatically across all running instances. The developer shouldn't have to know or care how many instances are running - it's likely to vary on an hourly basis.

This answer to a similar question is pretty good, and is the approach I am currently using - one protected EC2 instance gets updated, new AMI gets created, remaining instances are killed and replacements booted up based on the new image. EFS basically becomes redundant.

But the manual intervention required is really a lot more hassle, and more prone to human error, than I can stick with long-term. I don't want to have to create a new AMI and Launch Configuration, and update the Auto Scaling Group to use that new LC, every time I do a deployment.
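I realise the AMI/LC/ASG steps could themselves be scripted - roughly like this with boto3, where every name is a placeholder - but that still means baking and waiting on a new image for every deploy:

```python
# Rough sketch of automating the AMI cycle; all names are placeholders.
import time
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# 1. Image the protected instance that was just updated.
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",
    Name=f"magento-deploy-{int(time.time())}",
    NoReboot=True,
)
ami_id = image["ImageId"]
ec2.get_waiter("image_available").wait(ImageIds=[ami_id])  # the slow part

# 2. Create a new Launch Configuration pointing at the new AMI.
lc_name = f"magento-lc-{int(time.time())}"
autoscaling.create_launch_configuration(
    LaunchConfigurationName=lc_name,
    ImageId=ami_id,
    InstanceType="m5.large",
)

# 3. Point the Auto Scaling Group at it; instances launched from
#    here on use the new image (old ones still need cycling out).
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="magento-asg",
    LaunchConfigurationName=lc_name,
)
```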

So... how do I sync quickly and automatically from EFS to multiple EC2s?

If I get fileconveyor working in tandem with inotify will that solve it? Or is that a wild goose chase, does anyone know?

Doug McLean
  • Is there a reason you can't mount the EFS location via NFS? – TheFiddlerWins Jun 06 '18 at 13:55
  • I have mounted it via NFS - specifically NFS v4, as Amazon advise. But file access is painfully slow, slower than other NFS mounts. I gather it's because of the distributed architecture of EFS. As such, I can't use the mounted location as my webroot, and I want to copy from the mounted location to a local directory instead. But even this takes forever. That's the crux of my problem. – Doug McLean Jun 06 '18 at 14:31
  • 1
    *"I gather it's because of the distributed architecture of EFS"* Not really. EFS is blazing fast until you exhaust your burst credit balance. Be sure you're familiar with that metric. – Michael - sqlbot Jun 06 '18 at 16:53

2 Answers


Here's what I'd do:

  • Create a "golden image" AMI that has everything place as of now. Ideally that would be set up using a combination of CloudFormation and Opsworks.
  • Set up AWS CodeCommit to store your source code
  • Set up AWS CodeDeploy to deploy updated source code to your instances. This means you don't have to rebuild the AMI for every source code change; it's a simple deployment. By using the golden image rather than building from scratch, you get the benefit of new instances coming up quickly, with only a small delay to update the code. That update is fairly trivial, so it could probably be done with EC2 user data if you want it done quickly (see the sketch after this list).
  • If you want to automate building test / pre-prod environments, testing, and production deployments (with optional manual approval), you could look at AWS CodePipeline.
  • You can do blue/green (gradual) deployments using Route 53 / Nginx / HAProxy, or red/black (cut-over) deployments using a variety of methods.
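To make the deployment step concrete, here's a rough boto3 sketch of triggering a CodeDeploy deployment from a bundle already sitting in S3 - the application, group, bucket, and key names are all placeholders:

```python
# Rough sketch: kick off a CodeDeploy deployment of an S3 bundle.
# All names below are placeholders.
import boto3

codedeploy = boto3.client("codedeploy")

response = codedeploy.create_deployment(
    applicationName="magento-app",
    deploymentGroupName="magento-asg-group",
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "my-deploy-bucket",
            "key": "releases/magento-latest.zip",
            "bundleType": "zip",
        },
    },
)
print("Started deployment:", response["deploymentId"])
```

If the deployment group is attached to the Auto Scaling Group, CodeDeploy will also deploy the last successful revision to new instances as they launch, which covers the scale-out case.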

This stuff isn't rocket science to get working, but it may take a bit of time if you're not familiar with it. Once you do, the automation could save a fair amount of time on testing and deployments.

Tim
  • From what I've seen so far, CodeDeploy looks great - but in an auto-scaling environment it seems I still need to update the AMI after each change, as any scaled-out instances will be based on the AMI and be ignorant of any deployments made since that AMI was generated, right? – Doug McLean Jun 14 '18 at 09:26
  • Deploy your AMI then update your code on the instance. I'm not 100% sure how to do this, but in general EC2 user data is one way to have an instance do things like _git pull_ after it starts. – Tim Jun 14 '18 at 17:48

AWS CodeDeploy might be an option for this. You could build an artifact on one EC2 instance, push it to S3, and roll it out using CodeDeploy. Quote:

Finally, the AWS CodeDeploy agent on each instance pulls the target revision from the specified Amazon S3 bucket or GitHub repository and, using the instructions in the AppSpec file, deploys the contents to the instance.

CodeDeploy does support blue/green deployments and allows revisioning and rollbacks, which might be helpful in your case.
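The push side can be scripted as well. A rough sketch of bundling a build directory and uploading it with boto3 - the paths and bucket name are placeholders:

```python
# Rough sketch: zip a build directory and push it to S3 for CodeDeploy.
# Paths and bucket name are placeholders.
import shutil
import boto3

# Produces /tmp/magento-release.zip from the build directory.
archive = shutil.make_archive("/tmp/magento-release", "zip", "/srv/build/webroot")

s3 = boto3.client("s3")
s3.upload_file(archive, "my-deploy-bucket", "releases/magento-release.zip")
```

The bundle needs an appspec.yml at its root telling the agent where to put the files.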

M. Glatki