Figuring out why a Docker container becomes unresponsive

Question

I'm using AWS Elastic Container Service (ECS) to start and stop Docker containers as demand dictates. The problem is that occasionally, in the middle of the day, a subset of employees loses the ability to access this containerized website. Killing the Docker containers one by one, which forces new ones to be spun up, seems to resolve the issue.

What I don't understand is what's causing the Docker container to become unresponsive. If the container just died out of the blue, a new one would be created to accommodate the demand, but in this case the container isn't dying and I'm not seeing errors on AWS either. But maybe I'm just not looking in the right place?

Are your containers behind an ALB? Are sticky sessions defined on the ALB? What is your ALB health check to the containers? Is there any scaling before the problem happens, particularly scaling in? Is the container actually unresponsive or is the request not getting to the container? — Tim, Mar 09 '22 at 21:42
@Tim - the containers are indeed behind an ALB. Stickiness is disabled. I'm not sure if CloudWatch is an ALB health check but nothing looks suspicious in the CloudWatch graphs for the affected period. — neubert, Mar 09 '22 at 23:35
"_Is there any scaling before the problem happens, particularly scaling in?_" Not that I've noticed. Minimum tasks is 3 and maximum tasks is 10. The desired count right now is 3 as is the running count. idk what the point of the desired count is. I mean, I desire the minimum number of containers to minimize my monthly spend. That's the whole point of containers anyway, isn't it? — neubert, Mar 09 '22 at 23:57
"_Is the container actually unresponsive or is the request not getting to the container?_" It is not clear to me how I would make that determination. Each container has its own public and private IP address - maybe pinging each one on one of those IP addresses would be sufficient if I'm connected to an OpenVPN instance that's part of the same network? — neubert, Mar 09 '22 at 23:58
VPC flow logs might help you understand whether the request is getting to the container, or maybe CloudWatch Logs if it's integrated. I suggest you look into ALB health checks to make sure the ALB knows for sure whether your container is available to service requests. — Tim, Mar 10 '22 at 00:10
Another way to check container health is to create an EC2 instance in the same subnet and make a request direct to the container, and via the ALB. — Tim, Mar 10 '22 at 00:35
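
To make that last suggestion concrete, here is a minimal Python sketch (standard library only) of the direct-versus-ALB comparison. It is meant to be run from a machine inside the VPC, such as the test EC2 instance or a host connected through OpenVPN. The function names `probe` and `diagnose` and the addresses in the usage note are made up for illustration; they are not taken from the setup described in the question. A plain ICMP ping only shows that a host is reachable, not that the web server inside the container is answering, so an HTTP request is probably the more telling test.

```python
import http.client
import sys
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Make one GET request and report what happened, without raising."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, error = resp.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        status, error = exc.code, None
    except (OSError, http.client.HTTPException) as exc:
        # Timeouts, refused connections, DNS failures, malformed replies.
        status, error = None, repr(exc)
    elapsed = round(time.monotonic() - start, 3)
    return {"url": url, "status": status, "error": error, "seconds": elapsed}


def _ok(result):
    # Treat any answer below 500 as "the web server is alive".
    return result["status"] is not None and result["status"] < 500


def diagnose(direct, via_alb):
    """Turn the two probe results into a rough hint about where to look."""
    if _ok(direct) and _ok(via_alb):
        return "Both paths respond; try again while the outage is happening."
    if _ok(direct):
        return "Container responds directly; check the ALB and target health."
    if _ok(via_alb):
        return "This container fails, but the ALB reached some healthy target."
    return "Neither path responds; check the container and security groups."


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python probe.py <direct-container-url> <alb-url>
    direct_result = probe(sys.argv[1])
    alb_result = probe(sys.argv[2])
    print(direct_result)
    print(alb_result)
    print(diagnose(direct_result, alb_result))
```

For example, `python probe.py http://10.0.1.23:8080/ https://my-alb.example.internal/`, where both addresses are placeholders for one task's private IP and port and for the ALB's DNS name. Running it once per task during an outage should show whether a particular container has stopped answering or whether requests are being lost before they reach it.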
