
I'm noticing a very strange issue with an AWS auto-scaling group.

Instances are being reported (incorrectly) as being unhealthy. The instances are then being terminated and replaced unnecessarily. This is causing problems because it is leaving the ASG with insufficient capacity to cope with the load.

To try to identify the problem, I've temporarily suspended the "Terminate" process for the ASG.

Right at the moment I have a single instance in the group reported by the ASG as being unhealthy. Logging in to the instance and testing the health check directly proves that it is in fact healthy.

Additionally, the load balancers associated with the ASG also report all instances as healthy.

My question is: how can my ASG consider an instance to be "UNHEALTHY" if the health check type for the group is set to "ELB" and the load balancers report the instance as healthy?

Is there a way I can find out when and why the ASG flagged the instances as "Unhealthy"?

This ASG is currently associated with 2 classic ELBs and 2 ALB Target Groups. We're in the process of migrating from ELB to ALB.

As mentioned though, both ELBs and both Target Groups report all instances as healthy.

user1751825

1 Answer


I suspect the problem comes from the instance being registered with multiple ELBs / ALB target groups. If any one of them deems the instance unhealthy, for whatever reason, that may be enough for the ASG to mark it unhealthy and terminate it.
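One way to test this suspicion is to ask each attached load balancer and target group what it currently thinks of the instance, and to read the cause text in the ASG's activity history to see when and why instances were replaced. Below is a minimal sketch using boto3 calls. The function names are my own, the three arguments `asg`, `elb` and `elbv2` are assumed to be boto3 clients for "autoscaling", "elb" and "elbv2", and I'm assuming the activity `Cause` text mentions a health check when that is the reason for a replacement.

```python
def health_by_load_balancer(asg, elb, elbv2, asg_name, instance_id):
    """Return {classic ELB name or target group ARN: state} for one instance.

    asg, elb and elbv2 are assumed to be boto3 clients for
    "autoscaling", "elb" and "elbv2" respectively.
    """
    states = {}

    # Classic ELBs attached to the group.
    attached = asg.describe_load_balancers(AutoScalingGroupName=asg_name)
    for lb in attached["LoadBalancers"]:
        name = lb["LoadBalancerName"]
        resp = elb.describe_instance_health(LoadBalancerName=name)
        states[name] = "not registered"
        for item in resp["InstanceStates"]:
            if item["InstanceId"] == instance_id:
                states[name] = item["State"]  # e.g. "InService", "OutOfService"

    # ALB target groups attached to the group.
    attached = asg.describe_load_balancer_target_groups(
        AutoScalingGroupName=asg_name
    )
    for tg in attached["LoadBalancerTargetGroups"]:
        arn = tg["LoadBalancerTargetGroupARN"]
        resp = elbv2.describe_target_health(TargetGroupArn=arn)
        states[arn] = "not registered"
        for item in resp["TargetHealthDescriptions"]:
            if item["Target"]["Id"] == instance_id:
                states[arn] = item["TargetHealth"]["State"]  # e.g. "healthy"

    return states


def health_check_activities(asg, asg_name):
    """Return (start time, cause) for recent activities mentioning a health check."""
    resp = asg.describe_scaling_activities(AutoScalingGroupName=asg_name)
    return [
        (activity["StartTime"], activity["Cause"])
        for activity in resp["Activities"]
        if "health check" in activity.get("Cause", "").lower()
    ]
```

Called with `boto3.client("autoscaling")`, `boto3.client("elb")` and `boto3.client("elbv2")`, any entry in the first result that is not "InService" or "healthy" would point at the load balancer worth looking at. This only shows the state at the moment you run it, so a health check that fails intermittently may need several runs to catch.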

Change the health check type to EC2 until you're done with the migration.
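If you'd rather script that change than click through the console, something like the following should work. It is only a sketch: `set_health_check_type` is a name I made up, and `asg` is assumed to be a boto3 "autoscaling" client that you create yourself.

```python
def set_health_check_type(asg, asg_name, check_type):
    """Switch an Auto Scaling group between "EC2" and "ELB" health checks.

    asg is assumed to be a boto3 "autoscaling" client.
    """
    if check_type not in ("EC2", "ELB"):
        raise ValueError("check_type must be 'EC2' or 'ELB'")
    # Only the health check type is changed; other group settings are untouched.
    asg.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        HealthCheckType=check_type,
    )
```

For example, `set_health_check_type(boto3.client("autoscaling"), "my-asg", "EC2")` during the migration, and the same call with `"ELB"` once the legacy load balancers are detached ("my-asg" is a placeholder name).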

The best practice is to have an ASG bound to only a single load balancer.

Hope that helps :)

MLu
  • I also suspect this may have been the problem. I've detached the legacy ELBs and so far the issue hasn't happened again. – user1751825 Mar 05 '19 at 09:02