How are load balancers, health checks, and autoscalers supposed to handle surges?

Question

I have a service that looks like this: Internet -> Load Balancer -> Instances

What seems to be happening in my service is the following scenario:

1. A large surge of user traffic comes in, more than the current instances can handle.
2. The load balancer's health checks start failing because the instances are overloaded.
3. The load balancer marks the failing instances unhealthy, so ECS terminates them and replaces them with new ones. The autoscaler also detects the high CPU usage across the cluster and creates more instances than there were before.
4. Now there are zero instances that can handle any requests, and the service is completely down until the new instances are ready.
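
The cascade described above can be reproduced with a toy discrete-time model. This is a sketch under heavily simplified assumptions, not a model of ECS or ALB internals: every parameter (`capacity`, `startup`, `threshold`) is hypothetical, a health check is assumed to fail whenever an instance is asked for more than its capacity, and the scale-out part is left out because new instances face the same startup delay anyway.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    ready_in: int = 0       # steps until the instance can serve (0 = serving)
    failed_checks: int = 0  # consecutive failed health checks


def simulate(load: float, n: int, capacity: float,
             startup: int, threshold: int, steps: int) -> list[int]:
    """Return the number of serving instances at each step (toy model)."""
    fleet = [Instance() for _ in range(n)]
    history = []
    for _ in range(steps):
        serving = [i for i in fleet if i.ready_in == 0]
        history.append(len(serving))
        # Load is spread evenly over whatever is currently serving.
        per_instance = load / len(serving) if serving else float("inf")
        for inst in fleet:
            if inst.ready_in > 0:
                inst.ready_in -= 1          # still booting, not health checked yet
            elif per_instance > capacity:
                inst.failed_checks += 1     # overloaded: assume the probe fails
            else:
                inst.failed_checks = 0
        # Replace every instance that failed `threshold` checks in a row.
        fleet = [Instance(ready_in=startup) if i.failed_checks >= threshold else i
                 for i in fleet]
    return history


print(simulate(load=300, n=2, capacity=100, startup=3, threshold=2, steps=6))
# [2, 2, 0, 0, 0, 2]
```

The fleet drops from two serving instances to zero for three steps, and the replacements come up into the same overload, so the cycle starts again.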

Obviously this is a bad state of affairs. What should be done instead?

The tools in question are AWS ECS Fargate and AWS Application Load Balancer. The configuration options seem to consist mostly of numeric values to tune, with no way to change the overall behavior.

The health check is very simple: it loads a static file. It basically measures whether the web server in each instance is alive or dead.
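
Even a static-file check is answered by the same web server as user requests, so under overload the probe can wait behind the backlog and time out although the process is alive. A rough sketch, assuming a single FIFO request queue; the queue lengths, throughput, timeout, interval and threshold below are illustrative values, not taken from the question.

```python
def probe_latency(queued: int, throughput_rps: float, service_time_s: float) -> float:
    """Seconds until a health-check probe is answered, assuming FIFO order."""
    return queued / throughput_rps + service_time_s


def probe_passes(queued: int, throughput_rps: float,
                 service_time_s: float, timeout_s: float) -> bool:
    """A probe passes only if it is answered before the health-check timeout."""
    return probe_latency(queued, throughput_rps, service_time_s) <= timeout_s


def seconds_until_unhealthy(interval_s: float, unhealthy_threshold: int) -> float:
    """Approximate time for consecutive failed probes to mark a target unhealthy."""
    return interval_s * unhealthy_threshold


# Same static file, same server: only the backlog in front of the probe differs.
print(probe_passes(queued=200, throughput_rps=100, service_time_s=0.001, timeout_s=5))   # True
print(probe_passes(queued=1000, throughput_rps=100, service_time_s=0.001, timeout_s=5))  # False
print(seconds_until_unhealthy(interval_s=30, unhealthy_threshold=2))                     # 60
```

In other words, a check meant to ask "is the server alive?" in practice also asks "is the server keeping up?", which is what turns an overload into terminations.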

The autoscaler is set up to scale on CPU load and to scale up within 30 seconds.
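
A CPU metric can also saturate: once each task is pinned at 100% of its allotted CPU, the metric no longer shows how large the surge is, so a proportional scaling rule undershoots and needs several rounds. The sketch below approximates target tracking as `ceil(current * metric / target)`; that formula, the 50% target and the 100% cap are assumptions for illustration, not the documented AWS algorithm.

```python
import math


def proportional_desired(current: int, metric: float, target: float) -> int:
    """Approximate target tracking (assumption): scale by metric/target, round up."""
    return math.ceil(current * metric / target)


def observed_cpu(demand: float) -> float:
    """Hypothetical CPU reading: demand is a fraction of one task's CPU, capped at 100%."""
    return min(demand, 1.0) * 100


current_tasks = 2
demand = 1.5  # each task is being asked for 150% of the CPU it has

print(proportional_desired(current_tasks, observed_cpu(demand), target=50))  # 4 (what CPU shows)
print(proportional_desired(current_tasks, demand * 100, target=50))          # 6 (what is needed)
```

Each round also has to wait for new tasks to start, which is one reason the fleet can stay overloaded well past the 30-second scaling trigger.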

I don't see how to improve the architecture using this set of tools. Is there a better way?

On a conceptual level, I think what I am looking for is called "Quality of Service": a single user shouldn't be able to disrupt availability for other users. It is similar to defending against a DDoS attack, except that the traffic is legitimate.
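
One common way to get this kind of per-user isolation is a per-client token bucket: each user may send a burst up to a limit and is then held to a steady rate, while other users' buckets are unaffected. A minimal in-memory sketch; the rate and burst values and keying on a user ID are assumptions. Because each instance keeps its own buckets, a fleet-wide limit would need shared state or a rate limiter in front of the instances, and rejected requests would typically get HTTP 429 Too Many Requests.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float]):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerUserLimiter:
    """One bucket per user, so a heavy user only exhausts their own bucket."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, user_id: str) -> bool:
        bucket = self.buckets.get(user_id)
        if bucket is None:
            bucket = self.buckets[user_id] = TokenBucket(self.rate, self.burst, self.clock)
        return bucket.allow()


limiter = PerUserLimiter(rate=1.0, burst=2)
print([limiter.allow("heavy-user") for _ in range(5)])  # [True, True, False, False, False]
print(limiter.allow("other-user"))                       # True
```

The injectable `clock` makes the limiter easy to test; a long-running version would also need to evict idle buckets.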

Step 3 should include the auto scaling component mentioned in the title of your question and not only should failed nodes get replaced, **more** should be added — HBruijn, Dec 21 '18 at 06:16
@HBruijn Agree, that is right, I forgot to mention that it does scale up based on CPU usage — Kevin Baragona, Dec 21 '18 at 06:31
Can you please edit your question to show when / how it scales up. What is your health check checking? Is it loading a static page, loading a page that hits a database or does something intensive? Auto scaling and load balancers take time to react, you can't expect immediate scaling. It might be that you need to run more / larger instances to cope with the surge for 5 - 15 minutes until other instances can be started. If the spike is really huge it can take an hour to allocate more load balancer resources. — Tim, Dec 21 '18 at 07:44
@Tim updated with notes about the health check. I realize that adding more resources will allow handling bigger surges but there will always be surges too big to handle. I would like to handle such situations gracefully without degrading the service for other users. Maybe API Gateway is a possible solution? — Kevin Baragona, Dec 21 '18 at 07:52
I'm not sure how API Gateway will help here. I hope someone else has some ideas for you, but the only thing I can see is adding more resources. You could look at [EC2 predictive scaling](https://aws.amazon.com/blogs/aws/new-predictive-scaling-for-ec2-powered-by-machine-learning/), time based scaling if it's to a pattern, caching can help reduce load in some cases, maybe a CDN to shed load. — Tim, Dec 21 '18 at 08:24
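
Tim's suggestion of running enough capacity to ride out the first minutes of a surge can be sized with back-of-the-envelope arithmetic. A sketch; the baseline rate, surge factor, per-instance throughput and utilization ceiling are hypothetical inputs you would need to measure, for example with a load test.

```python
import math


def instances_to_absorb(baseline_rps: float, surge_factor: float,
                        per_instance_rps: float, max_utilization: float = 1.0) -> int:
    """Instances needed to serve the surge peak before any scale-out completes."""
    peak_rps = baseline_rps * surge_factor
    usable_rps = per_instance_rps * max_utilization  # leave headroom per instance
    return math.ceil(peak_rps / usable_rps)


print(instances_to_absorb(baseline_rps=400, surge_factor=3, per_instance_rps=100))  # 12
print(instances_to_absorb(400, 3, 100, max_utilization=0.75))                      # 16
```

As Kevin points out, any fixed number will eventually be exceeded, so this only raises the threshold at which the cascade starts.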

0 Answers