
We've got an app where customers upload data and documents to a number of EC2 instances. We store the uploads on EBS volumes.

Occasionally our app will fail. Sometimes it's something in our app server, and sometimes it's bad EC2 hardware.

How can I recover a particular instance automatically? In other words, when an instance becomes unavailable for more than X minutes, I'd like to terminate the instance automatically, start a new one (possibly on new hardware), and attach the old EBS volume to it so the customer's data is preserved.

Is there some way to set up CloudWatch or autoscaling to do this?

Shef



How can I recover a particular instance automatically? In other words, when an instance becomes unavailable for more than X minutes, I'd like to terminate the instance automatically, start a new one (possibly on new hardware), and attach the old EBS volume to it so the customer's data is preserved.

This can be accomplished using the Amazon EC2 API. Basically, have a cron job take snapshots of the EBS volume every 12 hours or so, then have Nagios check the host and, after 10 or so failed checks, execute a script that calls the API tools. That script could then:

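1) Find the instance ID of the host (either by SSHing into the host and querying the instance metadata at http://169.254.169.254/latest/meta-data/instance-id, or by grepping the output of ec2-describe-instances)
2) Terminate the instance (ec2-terminate-instances)
3) Create a volume from the latest snapshot (ec2-create-volume); it must be in the same Availability Zone as the instance launched in step 4, since an EBS volume can only be attached to an instance in its own zone
4) Launch a new instance from your AMI (ec2-run-instances)
5) Attach the new volume to the new instance (ec2-attach-volume)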
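The five steps above can be sketched in Python. This is a minimal sketch, not the answerer's script: it assumes the boto3 EC2 client (swapped in for the legacy ec2-* command-line tools), assumes Nagios knows the failed host by its private IP address, and uses hypothetical names and placeholders (`find_instance`, `latest_snapshot_id`, `recover_instance`, the instance type and the `/dev/sdf` device name). The client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
# Hypothetical recovery script for a Nagios event handler (sketch only).
# Uses boto3-style EC2 client calls in place of the legacy ec2-* tools.
# Instance type, device name, and all IDs are placeholders.


def find_instance(ec2, private_ip):
    """Step 1: look up the failed host by its private IP address."""
    resp = ec2.describe_instances(
        Filters=[{"Name": "private-ip-address", "Values": [private_ip]}]
    )
    instances = [i for r in resp["Reservations"] for i in r["Instances"]]
    if not instances:
        raise LookupError(f"no instance with private IP {private_ip}")
    return instances[0]


def latest_snapshot_id(ec2, volume_id):
    """Return the newest completed snapshot of the customer-data volume."""
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "volume-id", "Values": [volume_id]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )
    snaps = sorted(resp["Snapshots"], key=lambda s: s["StartTime"])
    if not snaps:
        raise LookupError(f"no completed snapshot of {volume_id}")
    return snaps[-1]["SnapshotId"]


def recover_instance(ec2, private_ip, data_volume_id, ami_id,
                     instance_type="t3.micro", device="/dev/sdf"):
    old = find_instance(ec2, private_ip)
    # An EBS volume can only attach to an instance in its own
    # Availability Zone, so reuse the failed instance's zone throughout.
    zone = old["Placement"]["AvailabilityZone"]

    # Step 2: terminate the failed instance.
    ec2.terminate_instances(InstanceIds=[old["InstanceId"]])

    # Step 3: restore the data volume from the latest snapshot.
    volume_id = ec2.create_volume(
        SnapshotId=latest_snapshot_id(ec2, data_volume_id),
        AvailabilityZone=zone,
    )["VolumeId"]

    # Step 4: launch a replacement instance from the AMI in the same zone.
    new_id = ec2.run_instances(
        ImageId=ami_id, InstanceType=instance_type, MinCount=1, MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )["Instances"][0]["InstanceId"]

    # Wait until both resources are ready before attaching.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

    # Step 5: attach the restored volume to the new instance.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=new_id, Device=device)
    return new_id, volume_id
```

With real credentials, the event handler might call something like `recover_instance(boto3.client("ec2"), "10.0.0.5", "vol-EXAMPLE", "ami-EXAMPLE")`, where the IP and IDs are placeholders. Note that, as in the answer, any data written after the last snapshot is not recovered.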

AWS command-line API tools: http://aws.amazon.com/developertools/351
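The cron snapshot job from the answer could look roughly like the following. This is a hedged sketch assuming the boto3 EC2 client; the function names, the description format and the `keep=4` retention count are hypothetical choices, and pagination of `describe_snapshots` is ignored for brevity. A crontab line such as `0 */12 * * * /usr/bin/python3 /opt/scripts/snapshot_volume.py` (hypothetical path) would run a wrapper around it every 12 hours.

```python
# Hypothetical snapshot job for cron (sketch only, boto3-style EC2 client).
import datetime


def snapshot_volume(ec2, volume_id, now=None):
    """Start an EBS snapshot of volume_id and return the new snapshot ID."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    resp = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"auto-backup {volume_id} {now:%Y-%m-%dT%H:%MZ}",
    )
    return resp["SnapshotId"]


def prune_snapshots(ec2, volume_id, keep=4):
    """Delete all but the `keep` newest snapshots of volume_id."""
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    )
    # Newest first, so everything past index `keep` is old enough to drop.
    snaps = sorted(resp["Snapshots"], key=lambda s: s["StartTime"], reverse=True)
    deleted = []
    for snap in snaps[keep:]:
        ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
        deleted.append(snap["SnapshotId"])
    return deleted
```

Pruning is optional; it only keeps the number of stored snapshots (and their cost) bounded when the job runs indefinitely.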

There are other issues, however, such as DNS, Elastic IPs, security groups, termination protection, and application-layer service configuration, that may need to be addressed. Run ec2-run-instances -h for more help, or visit the AWS API forums.
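Two of the issues listed above, termination protection and Elastic IPs, could be handled by small additions to the recovery script. This is a minimal sketch assuming the boto3 EC2 client and a VPC Elastic IP identified by its allocation ID; the helper names are hypothetical. Termination protection would need to be cleared before step 2, and the Elastic IP moved after step 5.

```python
# Hypothetical helpers for the recovery script (sketch only, boto3-style client).


def disable_termination_protection(ec2, instance_id):
    """Clear DisableApiTermination so terminate_instances can succeed."""
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": False},
    )


def move_elastic_ip(ec2, allocation_id, new_instance_id):
    """Point a VPC Elastic IP at the replacement instance."""
    resp = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        # Take the address over even if it is still associated elsewhere.
        AllowReassociation=True,
    )
    return resp["AssociationId"]
```

Reusing the same Elastic IP means clients that connect by that address keep working without a DNS change.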

nandoP