My website went down while serving at most 800 visitors. The server runs Ubuntu 18.06 on a t3.xlarge instance.

How could it have happened?

The error shown by sudo tail -n 20 /var/log/nginx/error.log was:

connect() to unix:/run/php/php7.2-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream

How can I investigate further? Which settings might I need to tune?

Thanks in advance

1 Answer

Check the instance's CPU credits.

The T3 instance type is burstable: instead of a fixed CPU allotment, it earns CPU credits over time and spends them whenever it runs above its baseline. When the credits are used up, the instance is throttled to its baseline CPU performance.
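To get a feel for how quickly credits can drain, here is a rough back-of-the-envelope sketch. It assumes the figures AWS publishes for a t3.xlarge (4 vCPUs, 96 credits earned per hour, one credit being one vCPU at 100% for one minute); the function names and the starting balance are made up for illustration, so check the numbers for your own instance size.

```python
# Assumed t3.xlarge figures; verify them against the AWS documentation.
VCPUS = 4
CREDITS_EARNED_PER_HOUR = 96


def credits_spent_per_hour(avg_utilization_pct, vcpus=VCPUS):
    """One CPU credit is one vCPU running at 100% for one minute."""
    return avg_utilization_pct * vcpus * 60 / 100


def hours_until_empty(balance, avg_utilization_pct):
    """Hours until the credit balance reaches zero, or None if it never shrinks."""
    net_drain = credits_spent_per_hour(avg_utilization_pct) - CREDITS_EARNED_PER_HOUR
    if net_drain <= 0:
        return None  # at or below baseline: credits are not being used up
    return balance / net_drain


# Hypothetical starting balance of 500 credits at various average CPU loads.
for pct in (40, 70, 100):
    print(pct, hours_until_empty(500, pct))
```

The actual balance is the CPUCreditBalance metric in CloudWatch; the sketch only shows why sustained load above the baseline eventually empties it.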

mschuett
  • OK, but why was everything working again after the reboot? – Ponzio Pilato May 02 '21 at 09:27
  • Worth considering. To add a bit more detail: when CPU credits run out the server doesn't stop, it just drops to its baseline speed, which is 40% of a core. With less CPU, PHP could time out. **Checking CloudWatch is the way to verify this**. To fix it, either get a larger instance or, if more CPU is only rarely needed, turn on T3 unlimited. PHP is quite CPU intensive, and caching can help a lot on some websites. A reboot might give the server enough time to accumulate more CPU credits. – Tim May 02 '21 at 09:29
  • There are many ways a webserver can fail; this was only my first thought (from experience) with T-instances. Consider it one item on the checklist, not the one and only solution. If it works after a reboot, then you should also look for other kinds of resource exhaustion on the server itself: too many PHP processes, memory leaks, things like that. – mschuett May 02 '21 at 09:44
  • @Tim: the website runs Laravel 7, which caches the pages. From what I can understand from CloudWatch, there is a peak in network traffic around 9:45, after the reboot. Nginx started giving the errors around 9:25. Could you help me go through the CloudWatch monitoring carefully, so that I can learn? Paid, of course. – Ponzio Pilato May 02 '21 at 10:11
  • @mschuett, thanks for your answer. Can I make the EC2 instance burstable now, or do I need to launch it again? For tuning I was thinking of doing this: https://serverfault.com/a/1029488/579415, but the second parameter (net.core.netdev_max_backlog) is missing from my sysctl net.core output, so I don't want to risk it. These are my current settings anyway: https://pasteimg.com/image/cHnzz – Ponzio Pilato May 02 '21 at 10:27
  • Turn on T3 unlimited as your first step: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-performance-instances-unlimited-mode-concepts.html#unlimited-mode-enabling . If you want to contact me, you can do so through my profile (click the link on my name); I will reply within 12 - 24 hours. – Tim May 02 '21 at 17:51
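
To line the nginx errors up against the CloudWatch graphs discussed in the comments, it can help to count the failures per minute. The following is only a sketch: it assumes the default nginx error-log format, where each line starts with a `YYYY/MM/DD HH:MM:SS` timestamp, and it matches the error text quoted in the question. The function names are hypothetical.

```python
import re
from collections import Counter

# Default nginx error-log lines begin with "YYYY/MM/DD HH:MM:SS".
# The capture group keeps everything up to the minute.
TIMESTAMP = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} ")

# Error text taken from the question.
NEEDLE = "(11: Resource temporarily unavailable)"


def errors_per_minute(lines):
    """Count matching upstream errors, keyed by 'YYYY/MM/DD HH:MM'."""
    counts = Counter()
    for line in lines:
        if NEEDLE not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def print_report(path):
    """Print one 'minute count' line per minute that had errors."""
    with open(path, errors="replace") as handle:
        for minute, count in sorted(errors_per_minute(handle).items()):
            print(minute, count)
```

Calling `print_report("/var/log/nginx/error.log")` (usually as root) prints the per-minute counts. Note that the log uses the server's local time zone, which may differ from the one shown in CloudWatch.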