
We have an Auto Scaling Group (ASG) in AWS that we scale up and down manually for the time being. Our ASG is currently attached to one Target Group, which has all of our prod servers in it.

Before using the ASG, we would add servers to and remove them from the Target Group manually in the Target Group console, and the nodes would deregister after draining properly. We have a very stateful application, so the deregistration delay is really important for us.

Yesterday, when scaling down through the ASG interface (specifically by asking for a lower number of desired instances), hundreds of connections were dropped instantly instead of the five-minute draining policy of the Target Group being honored.

How do I make my ASG honor my draining policy?

I have tried this: AWS ASG With Application LB and Connection Draining. However, it does not set the connections to "draining", only to "terminating: waiting", and my health checks are still healthy, so I don't think it is actually stopping new connections or draining.

    Hi, if the response below answered your question please upvote and accept it. That's the ServerFault way of saying *Thanks* for the time someone took to help you :) – MLu Feb 19 '19 at 23:06

2 Answers


As described in the linked answer, you will need an ASG lifecycle hook to start with.
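For illustration, here is a minimal sketch of the parameters such a terminating hook might take. The hook name, the 60-second buffer, and the helper name `terminating_hook_params` are my own assumptions, not something from the question; the resulting dict is meant to be passed to boto3's `put_lifecycle_hook`.

```python
def terminating_hook_params(asg_name, drain_seconds,
                            buffer_seconds=60,
                            hook_name="drain-before-terminate"):
    """Build keyword arguments for autoscaling.put_lifecycle_hook.

    hook_name and buffer_seconds are hypothetical defaults chosen for
    this sketch. The heartbeat timeout is kept a little longer than the
    drain time so the hook does not expire mid-drain, and is clamped to
    the 30..7200 second range that the API documents for it.
    """
    heartbeat = min(max(drain_seconds + buffer_seconds, 30), 7200)
    return {
        "LifecycleHookName": hook_name,
        "AutoScalingGroupName": asg_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": heartbeat,
        # If nothing completes the hook in time, let termination proceed.
        "DefaultResult": "CONTINUE",
    }
```

With boto3 installed and credentials configured, this could be applied with something like `boto3.client("autoscaling").put_lifecycle_hook(**terminating_hook_params("my-asg", 300))`, where `my-asg` is a placeholder name.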

Whenever the Terminating event occurs, fire up a Lambda function that deregisters your instance from the Target Group using deregister-targets. That should move the target to the draining state. Then wait however long you need to, and once the instance has drained, continue with the termination.
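A rough sketch of what that Lambda logic could look like is below. It assumes the function is triggered by the EventBridge lifecycle event (so the instance ID and hook details are under `event["detail"]`), that the Target Group ARN is supplied by you, and that the drain fits within Lambda's 15-minute limit. The function name, polling interval, and maximum wait are hypothetical; the clients are passed in so the logic can be exercised without AWS.

```python
import time


def drain_and_continue(event, elbv2, autoscaling, target_group_arn,
                       poll_seconds=15, max_wait_seconds=600,
                       sleep=time.sleep):
    """Deregister the terminating instance, wait for draining to finish,
    then tell the ASG to continue. Returns the seconds spent waiting."""
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]
    target = {"Id": instance_id}

    # Moves the target into the "draining" state on the target group.
    elbv2.deregister_targets(TargetGroupArn=target_group_arn,
                             Targets=[target])

    waited = 0
    while waited < max_wait_seconds:
        resp = elbv2.describe_target_health(
            TargetGroupArn=target_group_arn, Targets=[target])
        states = [d["TargetHealth"]["State"]
                  for d in resp["TargetHealthDescriptions"]]
        # A fully deregistered target is reported as "unused".
        if all(state == "unused" for state in states):
            break
        sleep(poll_seconds)
        waited += poll_seconds

    # Release the lifecycle hook so the instance can be terminated.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return waited
```

In a real handler you would create the clients with `boto3.client("elbv2")` and `boto3.client("autoscaling")` and call this from `lambda_handler(event, context)`. boto3 also ships a `target_deregistered` waiter for elbv2 that could replace the manual polling loop.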

Hope that helps :)

MLu

The above shouldn't be required unless you need to do additional tasks beyond waiting for the deregistration delay set on the target group, or the instance needs to stay up for some time after the deregistration delay ends. Make sure to check what the deregistration delay is on the target group.
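If you'd rather check that value from code than from the console, something along these lines should work. This is a sketch that assumes a boto3 `elbv2` client is passed in; the helper name is made up for the example.

```python
def get_deregistration_delay(elbv2, target_group_arn):
    """Return the target group's deregistration delay in seconds,
    or None if the attribute is not present in the response."""
    resp = elbv2.describe_target_group_attributes(
        TargetGroupArn=target_group_arn)
    for attr in resp["Attributes"]:
        # Attribute values come back as strings, e.g. "300".
        if attr["Key"] == "deregistration_delay.timeout_seconds":
            return int(attr["Value"])
    return None
```

Usage would be roughly `get_deregistration_delay(boto3.client("elbv2"), "<your target group ARN>")`. The same attribute key can be changed with `modify_target_group_attributes` if the value turns out not to be what you expected.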

When the ASG scales in (for example, from lowering the desired capacity, as you're doing), it should make a deregister call to any Classic Load Balancers or Target Groups associated with it, and then wait for those deregister calls to finish before it terminates the instance.

On the Target Group, the target's status should be listed as 'draining', and the activity history of the ASG will temporarily list the state of that event as 'Waiting for ELB connection draining'.
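To confirm both of those during a scale-in without clicking through the console, a small check like the following may help. It is only a sketch: the helper name and return shape are my own, and it assumes boto3 `elbv2` and `autoscaling` clients are passed in.

```python
def draining_snapshot(elbv2, autoscaling, target_group_arn, asg_name):
    """Report which targets are draining and which ASG activities are
    currently waiting on load balancer draining."""
    health = elbv2.describe_target_health(TargetGroupArn=target_group_arn)
    draining = [d["Target"]["Id"]
                for d in health["TargetHealthDescriptions"]
                if d["TargetHealth"]["State"] == "draining"]

    activities = autoscaling.describe_scaling_activities(
        AutoScalingGroupName=asg_name)
    # This status code is what the console shows as
    # "Waiting for ELB connection draining".
    waiting = [a["Description"]
               for a in activities["Activities"]
               if a["StatusCode"] == "WaitingForELBConnectionDraining"]

    return {"draining_targets": draining, "waiting_activities": waiting}
```

If both lists stay empty while instances are being terminated, the ASG is probably not deregistering through this Target Group, which would be worth checking in the ASG's load balancing settings.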

If you have a terminating lifecycle hook, the instance will be deregistered from the target group before the lifecycle hook starts. Since you had the instance in the 'terminating:wait' state, it sounds like your instance was in the middle of a terminating lifecycle hook and should have already waited for the deregistration delay.

As a side note, Classic Load Balancers use 'connection draining', which is different from Application Load Balancers' 'deregistration delay'. Connection draining will end as soon as there are no more in-flight connections to that instance, or at the configured timeout, whichever comes first.

Shahad