I'll start with a short description of how my application works.
When a user on my website adds a task, the task is broken down into multiple sub-tasks; the number can vary from 1 to 10. These sub-tasks are added to an SQS queue. I have an Ubuntu EC2 instance running Node.js and Docker.
Node.js is configured to listen to the queue, and once it receives a sub-task message it spawns a Docker container, which in turn performs the sub-task. Once the sub-task is complete, the container is destroyed.
I have a c4.2xlarge EC2 instance that performs the above process flawlessly for 1 task (10 sub-tasks). However, the issue arises when multiple tasks are added at the same time. Say I run a test of 10 tasks, broken down into 100 sub-tasks: the server experiences severe load while launching the containers.
How do I go about scaling such an environment?
I have been thinking of reserving a pool of stopped EC2 instances (yes, "stopped", because the delay in launching a brand-new instance is very high), so that I can consume the sub-tasks in the queue as soon as possible without bearing the cost of running servers 24/7.
Is writing a load balancer in Node.js, based on available resources and the number of messages in the queue, the best way to go?
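To make the idea concrete, here is a rough sketch of the scaling decision I have in mind (all names and numbers are mine, for illustration; the queue depth would come from SQS `GetQueueAttributes` / `ApproximateNumberOfMessages`):

```javascript
// Decide how many instances to start from the stopped pool.
// queueDepth:  approximate number of messages waiting in SQS
// perInstance: how many sub-task containers one instance handles comfortably
// running:     instances already running
// poolSize:    stopped instances available to start
function instancesToStart(queueDepth, perInstance, running, poolSize) {
  const desired = Math.ceil(queueDepth / perInstance);
  const deficit = Math.max(0, desired - running);
  // Can never start more instances than the stopped pool holds.
  return Math.min(deficit, poolSize);
}

module.exports = { instancesToStart };
```

For example, with 100 sub-tasks queued, 10 containers per instance, 1 instance running, and 5 stopped instances in the pool, this would start all 5.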