I recently inherited a web server setup from another developer. It's basically the following:
- 2 web servers running Apache 2
- 2 load balancers running nginx
- 2 database servers running MySQL
Every week or so, the Apache web servers become unresponsive to requests and the load balancer ends up returning 504 Gateway Timeout. I logged in to one of the web servers and ran uptime, which returned:
18:40:49 up 5 days, 20:15, 1 user, load average: 122.37, 119.80, 107.57
which is extremely high compared to the number of processors available to the instance, which is 8.
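To put that number in perspective, here is a minimal Python sketch that divides the load averages from the uptime line above by 8. The helper name `load_per_cpu` is my own, and I am assuming the 8 refers to CPU cores; anything well above 1.0 per core means work is queuing up.

```python
import re


def load_per_cpu(uptime_line, cpu_count):
    """Parse the three load averages from an `uptime` line and divide each by cpu_count."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_line)
    return tuple(float(value) / cpu_count for value in match.groups())


# The line reported by uptime while the server was unresponsive.
line = "18:40:49 up 5 days, 20:15, 1 user, load average: 122.37, 119.80, 107.57"

# Assumption: the instance has 8 CPU cores.
one_min, five_min, fifteen_min = load_per_cpu(line, 8)
print("1-min load per CPU:  %.1f" % one_min)
print("5-min load per CPU:  %.1f" % five_min)
print("15-min load per CPU: %.1f" % fifteen_min)
```

So the 1-minute load works out to roughly 15 per core, and the 15-minute figure is not far behind, which suggests the backlog had been building for a while rather than spiking suddenly.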
To get things back online as fast as possible, I restarted the web servers and everything went back to normal:
18:54:19 up 5 min, 1 user, load average: 0.11, 0.22, 0.10
I am not asking for definitive answers, as I still need to look further into the source of the problem myself, but I would like some hints and suggestions regarding this issue:
- Why do you think this might be happening?
- How can I investigate this further to identify the source of the problem? I need some pointers on where and what to look for.
Thanks for the help.