
I am trying to use AWS Auto Scaling lifecycle hooks in a CloudFormation template that encapsulates the following:

  1. An AWS::AutoScaling::AutoScalingGroup with associated scale-up/scale-down policies, launch configuration, IAM role, etc.
  2. Two AWS::AutoScaling::LifecycleHook resources for EC2 instance launching/terminating events.
  3. An AWS::SQS::Queue (in a simplified example) where lifecycle notifications get posted.
  4. An AWS::IAM::Role that lets the Auto Scaling group post notifications to the SQS queue.

When the ASG is launched, the queue ends up with two test notifications from the creation of the lifecycle hooks, but no notifications for instance launches.

And here is the race condition:

The AWS::AutoScaling::LifecycleHook resource references the AWS::AutoScaling::AutoScalingGroup (and hence depends on it). This dictates the order in which CloudFormation creates the resources: the group is created first.

The problem is that the group starts launching instances before the hook creation is complete (instance launches are not part of the template, so they proceed in parallel). By the time the hook is created, there are no events left to post, because the instances have already been launched.
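To make the ordering concrete, here is a toy model (Python, standard library only) that derives a creation order from Ref, Fn::GetAtt and DependsOn, which is how CloudFormation infers dependencies between resources. The logical IDs are hypothetical, and the real CloudFormation scheduler does more than this (for example, it creates independent resources in parallel). The point is only that the hook comes after the group, and nothing in the graph makes instance launches wait for the hook.

```python
import graphlib  # standard library since Python 3.9

# Hypothetical, heavily simplified template: only what matters for ordering.
template = {
    "Resources": {
        "WebGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {"MinSize": "1", "MaxSize": "3", "DesiredCapacity": "2"},
        },
        "NotificationQueue": {"Type": "AWS::SQS::Queue"},
        "LaunchHook": {
            "Type": "AWS::AutoScaling::LifecycleHook",
            "Properties": {
                "AutoScalingGroupName": {"Ref": "WebGroup"},
                "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
                "NotificationTargetARN": {"Fn::GetAtt": ["NotificationQueue", "Arn"]},
            },
        },
    }
}


def find_refs(node):
    """Yield logical IDs referenced via Ref or Fn::GetAtt anywhere in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref":
                yield value
            elif key == "Fn::GetAtt":
                # List form ["LogicalId", "Attr"] or string form "LogicalId.Attr".
                yield value[0] if isinstance(value, list) else value.split(".")[0]
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def creation_order(tmpl):
    """Return logical IDs so that every resource comes after its dependencies."""
    resources = tmpl["Resources"]
    graph = {}
    for name, body in resources.items():
        deps = set(find_refs(body.get("Properties", {})))
        depends_on = body.get("DependsOn", [])
        if isinstance(depends_on, str):
            depends_on = [depends_on]
        deps.update(depends_on)
        # Keep only resources; Ref can also point at parameters or pseudo parameters.
        graph[name] = {dep for dep in deps if dep in resources}
    return list(graphlib.TopologicalSorter(graph).static_order())


order = creation_order(template)
print(order)
```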

Is there any way to work around it and catch launch events at stack launch time?

Alex B
  • Sounds like a lack of foresight in the design of this. A better implementation would allow lifecycle hooks to be part of the ASG definition instead of the ASG being a property of the hook. – Brett Jul 19 '16 at 11:24

4 Answers


It's not an ideal solution, but would a two-pass stack creation be a valid workaround?

  1. Set DesiredCapacity property in the AutoScalingGroup resource to 0 on initial stack creation. This allows the LaunchConfiguration, AutoScalingGroup, and LifecycleHook resources to be created without actually launching any instances.
  2. Set DesiredCapacity to your desired count (> 0) on a subsequent stack update. This should launch your desired instances after the LifecycleHook has been created.
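Here is a minimal sketch of the two-pass idea. It assumes the desired count is driven by a stack parameter; the parameter name GroupDesiredCapacity, the size bounds and the logical IDs are hypothetical. The template is written as a Python dict. stack_parameters builds the ParameterKey/ParameterValue list that CreateStack and UpdateStack accept, using 0 for the first pass and the real count for the second.

```python
import json

MAX_SIZE = 4  # hypothetical upper bound for the group

TEMPLATE = {
    "Parameters": {
        # Default 0: creating the stack launches no instances yet.
        "GroupDesiredCapacity": {"Type": "Number", "Default": "0"},
    },
    "Resources": {
        "WebGroup": {
            "Type": "AWS::AutoScaling::AutoScalingGroup",
            "Properties": {
                # LaunchConfigurationName, VPCZoneIdentifier, etc. omitted for brevity.
                "MinSize": "0",
                "MaxSize": str(MAX_SIZE),
                "DesiredCapacity": {"Ref": "GroupDesiredCapacity"},
            },
        },
        "LaunchHook": {
            "Type": "AWS::AutoScaling::LifecycleHook",
            "Properties": {
                "AutoScalingGroupName": {"Ref": "WebGroup"},
                "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
                # NotificationTargetARN and RoleARN omitted for brevity.
            },
        },
    },
}


def stack_parameters(pass_number: int, desired: int) -> list:
    """Parameter list for pass 1 (stack creation) or pass 2 (stack update)."""
    if not 0 < desired <= MAX_SIZE:
        raise ValueError(f"desired must be between 1 and {MAX_SIZE}")
    value = 0 if pass_number == 1 else desired
    return [{"ParameterKey": "GroupDesiredCapacity", "ParameterValue": str(value)}]


print(json.dumps(stack_parameters(1, 2)))  # pass 1: create the stack with 0 instances
print(json.dumps(stack_parameters(2, 2)))  # pass 2: update the stack to 2 instances
```

Keeping the template identical between passes means that only the parameter value changes on the update, so the hook already exists when the first instance launches.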
Will Jordan

At some point since this question was asked, AWS added a LifecycleHookSpecificationList property to the AWS::AutoScaling::AutoScalingGroup CloudFormation resource. It lets you specify a list of lifecycle hooks as part of the ASG definition itself, rather than associating them afterwards. This solves the race condition, more or less exactly the way @Brett suggested in his comment.
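A sketch of an ASG resource with both hooks declared inline, written as a Python dict. It assumes an SQS queue with the logical ID NotificationQueue and an IAM role with the logical ID HookRole are defined elsewhere in the template (both names are hypothetical), and the timeout and default result are illustrative values.

```python
import json

# Hypothetical logical IDs: NotificationQueue (AWS::SQS::Queue) and
# HookRole (AWS::IAM::Role) are assumed to be defined elsewhere in the template.


def hook_spec(name: str, transition: str) -> dict:
    """One entry of LifecycleHookSpecificationList (illustrative values)."""
    return {
        "LifecycleHookName": name,
        "LifecycleTransition": transition,
        "DefaultResult": "CONTINUE",
        "HeartbeatTimeout": 300,
        "NotificationTargetARN": {"Fn::GetAtt": ["NotificationQueue", "Arn"]},
        "RoleARN": {"Fn::GetAtt": ["HookRole", "Arn"]},
    }


web_group = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "Properties": {
        # LaunchConfigurationName, VPCZoneIdentifier, etc. omitted for brevity.
        "MinSize": "1",
        "MaxSize": "4",
        "DesiredCapacity": "2",
        # The hooks are declared on the group itself instead of in separate
        # AWS::AutoScaling::LifecycleHook resources that reference the group.
        "LifecycleHookSpecificationList": [
            hook_spec("launch-hook", "autoscaling:EC2_INSTANCE_LAUNCHING"),
            hook_spec("terminate-hook", "autoscaling:EC2_INSTANCE_TERMINATING"),
        ],
    },
}

print(json.dumps(web_group, indent=2))
```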

I ran into exactly the same issue when following the AWS blog post on draining ECS containers from terminating EC2 instances (which suffers from the same race condition as this question). In my case, I was trying to define a termination hook to give ECS time to launch new containers on new instances before the old instances in the ASG were terminated. It didn't work when I updated the ASG itself through CloudFormation, because CloudFormation deleted the hook seconds before it would have been needed. Moving the hook inside the Auto Scaling group definition solved my issue.

mbbush

Another workaround is similar to what Will Jordan proposes, but fits in a single CloudFormation stack operation:

  1. Set DesiredCapacity property in the AutoScalingGroup resource to 0 on initial stack creation. This allows the LaunchConfiguration, AutoScalingGroup, and LifecycleHook resources to be created without actually launching any instances.
  2. Create an AWS::AutoScaling::ScheduledAction that depends on (DependsOn) the LifecycleHook, with the desired sizes and Recurrence set to "* * * * *" (every minute).
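Step 2 might look roughly like this, written as a Python dict. The logical IDs WebGroup (the ASG created with DesiredCapacity 0) and LaunchHook, as well as the sizes, are hypothetical.

```python
import json

# Hypothetical logical IDs: WebGroup (the ASG, created with DesiredCapacity 0)
# and LaunchHook (its AWS::AutoScaling::LifecycleHook resource).
scale_out_action = {
    "Type": "AWS::AutoScaling::ScheduledAction",
    # DependsOn makes CloudFormation create this action only after the hook exists.
    "DependsOn": "LaunchHook",
    "Properties": {
        "AutoScalingGroupName": {"Ref": "WebGroup"},
        "MinSize": 1,
        "MaxSize": 4,
        "DesiredCapacity": 2,
        # Cron expression matching every minute (evaluated in UTC unless TimeZone is set).
        "Recurrence": "* * * * *",
    },
}

print(json.dumps(scale_out_action, indent=2))
```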

These resources can all be part of a single CloudFormation template, so there is no need to perform multiple updates on the stack.

[Edit]: Unfortunately, with this approach the scheduled action re-applies its sizes to the group whenever the recurrence cron expression matches. The Auto Scaling group also needs IgnoreUnmodifiedGroupSizeProperties set to 'true' in the AutoScalingScheduledAction section of its UpdatePolicy (https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html#cfn-attributes-updatepolicy-scheduledactions), so that stack updates keep the sizes set by the scheduled action instead of resetting them to the template values.
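The UpdatePolicy setting from the edit, sketched on a hypothetical WebGroup resource (other properties trimmed). Python's True serializes to the JSON literal true.

```python
import json

# Hypothetical ASG resource (logical ID WebGroup); most properties omitted.
web_group = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "UpdatePolicy": {
        "AutoScalingScheduledAction": {
            # On stack updates, leave the sizes set by scheduled actions alone
            # unless MinSize/MaxSize/DesiredCapacity change in the template.
            "IgnoreUnmodifiedGroupSizeProperties": True,
        }
    },
    "Properties": {"MinSize": "0", "MaxSize": "4", "DesiredCapacity": "0"},
}

print(json.dumps(web_group, indent=2))
```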

aitorpazos

I am not sure this will work, but it looks like you might be able to get a similar outcome by using NotificationConfiguration within the ASG resource.

The NotificationConfiguration could send a notice to an SNS topic to which the SQS queue is subscribed. Obviously, with this approach the lifecycle will progress past Pending without waiting for a complete-lifecycle-action, but at least all instance launches will show up in the queue.
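Here is a sketch of that wiring as Python dicts. It assumes the list-valued NotificationConfigurations property of AWS::AutoScaling::AutoScalingGroup and uses hypothetical logical IDs (WebGroup, LaunchTopic, NotificationQueue). The queue would also need an AWS::SQS::QueuePolicy that allows the topic to send messages to it, and that resource is not shown.

```python
import json

# Hypothetical logical IDs: WebGroup, LaunchTopic, NotificationQueue.
# The AWS::SQS::QueuePolicy that lets the topic write to the queue is not shown.
resources = {
    "LaunchTopic": {
        "Type": "AWS::SNS::Topic",
        "Properties": {
            # Subscribe the existing queue to the topic.
            "Subscription": [
                {"Protocol": "sqs", "Endpoint": {"Fn::GetAtt": ["NotificationQueue", "Arn"]}}
            ]
        },
    },
    "WebGroup": {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            # LaunchConfigurationName, VPCZoneIdentifier, etc. omitted for brevity.
            "MinSize": "1",
            "MaxSize": "4",
            "NotificationConfigurations": [
                {
                    # Ref on an SNS topic returns its ARN.
                    "TopicARN": {"Ref": "LaunchTopic"},
                    "NotificationTypes": [
                        "autoscaling:EC2_INSTANCE_LAUNCH",
                        "autoscaling:EC2_INSTANCE_TERMINATE",
                    ],
                }
            ],
        },
    },
}

print(json.dumps(resources, indent=2))
```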

HTH

Edit:

Another option may be to use a WaitCondition or a CreationPolicy, though I'm not sure whether these would be applied before lifecycle hooks are processed.

Brett