I'll start with a short description of how my application works.
When a user on my website adds a task, the task is broken down into multiple sub-tasks; the number can vary from 1 to 10. These sub-tasks are added to an SQS queue. I have an Ubuntu EC2 instance running Node.js and Docker.
Node.js is configured to listen to the queue, and once it receives a sub-task message it spawns a Docker container, which in turn performs the sub-task. Once the sub-task is complete, the container is destroyed.
I have a c4.2xlarge EC2 instance that performs the above process flawlessly for 1 task (10 sub-tasks). However, the issue arises when multiple tasks are added at the same time. Say I run a test of 10 tasks, broken down into 100 sub-tasks: the server experiences severe load while launching the containers.
How do I go about scaling such an environment?
I have been thinking of reserving a pool of stopped EC2 instances (yes, "stopped", because the delay in launching a brand-new instance is very high), so that I can consume the sub-tasks in the queue as soon as possible without bearing the cost of running servers 24/7.
Is writing a load balancer in Node.js, based on available resources and the number of messages in the queue, the best way to go?
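To make the idea concrete, here is a rough sketch of the scaling decision I have in mind (all names and numbers are mine, for illustration; the queue depth would come from SQS `GetQueueAttributes` / `ApproximateNumberOfMessages`):

```javascript
// Decide how many instances to start from the stopped pool.
// queueDepth:  approximate number of messages waiting in SQS
// perInstance: how many sub-task containers one instance handles comfortably
// running:     instances already running
// poolSize:    stopped instances available to start
function instancesToStart(queueDepth, perInstance, running, poolSize) {
  const desired = Math.ceil(queueDepth / perInstance);
  const deficit = Math.max(0, desired - running);
  // Can never start more instances than the stopped pool holds.
  return Math.min(deficit, poolSize);
}

module.exports = { instancesToStart };
```

For example, with 100 sub-tasks queued, 10 containers per instance, 1 instance running, and 5 stopped instances in the pool, this would start all 5.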