
I have an application with about 200k users and am running an NGINX + Gunicorn (Python) server behind an AWS EC2 load balancer.

I don't understand why: my request rate is steady at around 4k/minute, but only sometimes does half the traffic end up as timeouts. Most of the time all requests are fine, but occasionally the server starts to lock up and then almost every request times out.

I've noticed that the number of current connections follows a wave pattern, fluctuating between the thousands and zero. Is NGINX bundling requests somehow? How can I break down request_time to figure out whether NGINX isn't configured properly or whether my Python server is just getting slow because some extra-slow endpoints are being called too often? A rough sketch of what I had in mind follows.
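What I'm considering is extending the access log_format so the last two fields of each line are $request_time (total time NGINX spent on the request) and $upstream_response_time (time Gunicorn took to answer), then post-processing the log. This is only a sketch: the log path and the thresholds are placeholders I'd adjust for my setup.

```python
#!/usr/bin/env python3
"""Split per-request latency into NGINX-side vs upstream (Gunicorn) time.

Assumes log_format has been extended so the last two fields of each line
are $request_time and $upstream_response_time (both standard NGINX
variables). The log path and thresholds below are placeholders.
"""
import sys

LOG_PATH = "/var/log/nginx/access.log"  # placeholder path, adjust as needed


def parse_line(line):
    """Return (request_time, upstream_time) in seconds, or None if unusable."""
    fields = line.split()
    if len(fields) < 2:
        return None
    req, upstream = fields[-2], fields[-1]
    if req == "-":
        return None
    # $upstream_response_time is '-' if the request never reached the upstream,
    # and can be a comma/colon separated list if several upstreams were tried.
    upstream_total = 0.0
    for part in upstream.replace(":", ",").split(","):
        part = part.strip()
        if part not in ("-", ""):
            upstream_total += float(part)
    return float(req), upstream_total


def main(path):
    slow_upstream = 0   # Gunicorn itself took a long time
    slow_nginx = 0      # time spent in NGINX: queuing, buffering, client I/O
    total = 0
    with open(path) as fh:
        for line in fh:
            parsed = parse_line(line)
            if parsed is None:
                continue
            req, upstream = parsed
            total += 1
            if upstream > 5.0:           # thresholds are arbitrary examples
                slow_upstream += 1
            elif req - upstream > 5.0:
                slow_nginx += 1
    print(f"{total} requests: {slow_upstream} slow in Gunicorn, "
          f"{slow_nginx} slow in NGINX/client transfer")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else LOG_PATH)
```

Would comparing those two numbers per request be the right way to tell the two cases apart?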

I've attached a screenshot of one of the servers in my NGINX Amplify dashboard.

Any ideas on which parts of the NGINX logs or Amplify I can investigate to determine whether this is an NGINX configuration issue, or whether the hosted Python process is getting locked up? Thank you!

1 Answer


Look at your CloudWatch metrics, especially anything about "dropped" or "failed" requests or connections. You can see all the details there for both your load balancer and your EC2 instances. I don't know which instance types you're using, but it could be that you're constantly over-using burstable T2/T3 instances and running out of CPU credits. My guess is that some part of your flow is being throttled for some reason, and it's not necessarily a problem on the instances themselves.
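For example, a quick boto3 sketch like the one below (the instance ID and time window are placeholders) will pull the CPUCreditBalance metric so you can see whether it bottoms out right when the timeouts start. If it drops to roughly zero, the instances are being throttled to their baseline CPU and the problem is capacity, not NGINX configuration.

```python
#!/usr/bin/env python3
"""Check a T2/T3 instance's CPU credit balance around an incident window.

A rough sketch using boto3; the instance ID and time range are placeholders.
"""
from datetime import datetime, timedelta, timezone

import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: your EC2 instance ID


def credit_balance(instance_id, hours=6):
    cw = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=300,                # 5-minute buckets
        Statistics=["Minimum"],
    )
    # Print the lowest credit balance in each 5-minute bucket, oldest first.
    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"].isoformat(), point["Minimum"])


if __name__ == "__main__":
    credit_balance(INSTANCE_ID)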

Chris