2

I have a production server that's usually very stable, and has been for a very long time.

Last night, it suddenly started showing very high load averages (150+). Stopping Apache brings the load down immediately.

[image]

Here's the output of top shortly after restarting Apache; the load has already climbed back to 22.68.

Server logs and the output of mod_status don't show anything particularly interesting, other than that requests start backing up because of the high load.

There doesn't seem to be any unusual volume of requests that would explain the sudden problem.

Any ideas?

UPDATE

Here is a screenshot of the MySQL connections from `SHOW PROCESSLIST;`: [image]

fred2
  • 97
  • 9

2 Answers

1
  1. Decrease the number of children created by Apache. There is no benefit in having dozens of web pages stumbling over each other. Apache is quite happy to delay starting a new page.
  2. If the "high load average" is caused by MySQL, then it is very likely due to a missing index or a poorly formulated SELECT. Find that naughty query and open a question on stackoverflow.com to discuss how to improve it.

As for the SHOW PROCESSLIST you added, my comment is Zzzzzzzzzzz. MySQL is not doing 'anything' -- notice that essentially all are in 'Sleep' mode. 'Sleep' is the state of a connection (possibly from a connection pool) that is doing nothing except waiting for the next SQL statement to be sent to it.
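To make the "everything is asleep" reading less of an eyeball exercise, here is a minimal Python sketch that tallies `SHOW PROCESSLIST` rows by their `Command` column. It assumes the rows have already been fetched as dictionaries by whatever MySQL client you use; `summarize_processlist` and the sample rows are hypothetical and are not taken from the screenshot.

```python
from collections import Counter


def summarize_processlist(rows):
    """Count connections per Command value (e.g. 'Sleep', 'Query')."""
    return Counter(row["Command"] for row in rows)


# Hypothetical sample rows shaped like SHOW PROCESSLIST output.
sample_rows = [
    {"Id": 1, "Command": "Sleep", "Time": 12, "Info": None},
    {"Id": 2, "Command": "Sleep", "Time": 3, "Info": None},
    {"Id": 3, "Command": "Query", "Time": 0, "Info": "SHOW PROCESSLIST"},
]

summary = summarize_processlist(sample_rows)
for command, count in summary.most_common():
    print(f"{command}: {count}")
```

If nearly every connection lands in the `Sleep` bucket, the connections themselves are idle and the load is probably coming from somewhere else.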

Looking at the top output, about 2 cores' worth of CPU power is being used by mysqld and 6 cores' worth by Apache. What client are you using? If it's PHP, you have some busy PHP code that needs optimizing. Start by looking at each loop, especially nested loops.
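The "cores' worth" arithmetic can be sketched as follows. In top's default mode, 100% in the %CPU column corresponds to one fully busy core, so a multi-threaded process such as mysqld can show more than 100%. The function name `cores_in_use` and the sample figures are hypothetical; they only illustrate the calculation and are not copied from the screenshot.

```python
from collections import defaultdict


def cores_in_use(samples):
    """Sum %CPU per command and convert to cores (100% = one core)."""
    totals = defaultdict(float)
    for command, cpu_percent in samples:
        totals[command] += cpu_percent
    return {command: total / 100.0 for command, total in totals.items()}


# Hypothetical (command, %CPU) pairs as they might be read off top.
sample = [("mysqld", 200.0)] + [("apache2", 100.0)] * 6

usage = cores_in_use(sample)
for command, cores in sorted(usage.items()):
    print(f"{command}: about {cores:.1f} cores")
```

By the same reasoning, a single process shown at 160% is keeping roughly 1.6 cores busy across its threads.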

As for "sudden problem", ... Situations like this (in my experience) build up for a while, then "suddenly" 'go through the roof'. They will usually unwind eventually. Sure, you can kill Apache or mysqld or reboot or stop clients from coming in. Meanwhile, your 'users' are getting a horrible "user experience".

Rick James
  • 2,058
  • 5
  • 11
  • If serving a page requires disk I/O, having many more Apache children than CPU cores is, IMHO, useful. – peterh Jan 14 '19 at 19:34
  • @peterh - That is true in some cases. Still, beyond a few dozen clients, MySQL stumbles over itself, leading to terrible latency and perhaps even declining throughput. – Rick James Jan 14 '19 at 19:37
  • The problem I'm facing seems to be very resistant to any improvement by tweaking configuration. While there are probably SQL queries that could be improved, there is nothing new which is clogging MySQL - the slowest queries are on a website which is basically unchanged over the last 5 years, and it never caused problems before. I've played around with max connections, to no avail. – fred2 Jan 14 '19 at 19:42
  • 1
    @fred2 - How low did you try `max_connections`? What I want you to change is the number of connections to Apache. – Rick James Jan 14 '19 at 20:25
  • 1
    @fred2 - I added some more paragraphs. – Rick James Jan 14 '19 at 20:32
  • Thanks @RickJames. I've just tried reducing the max_connections down to 40 and will continue decreasing it until at least the CPU load is reasonable. – fred2 Jan 14 '19 at 21:12
  • @RickJames. MySQL max_connections needs to be at least as big as MaxRequestWorkers in Apache 2.4, otherwise you end up with a 'too many connections' error. I've reduced MaxRequestWorkers to 15, which is crazy low ... the default is 150. Right now my load average is down from an insane 160 to 8.5, but the web server response time is still terrible. I don't know if this is relevant, but even with only a few mysqld processes working, each one is pushing the CPU to 100% (actually as high as 160%, whatever that means). – fred2 Jan 14 '19 at 21:25
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/88263/discussion-between-fred2-and-rick-james). – fred2 Jan 14 '19 at 21:28
0

For future reference, the problem was a failing RAID controller. Two identical servers of identical age saw their RAID controllers fail within a week of each other.

Not only did the RAID controller fail, but the system notification for telling me that the RAID controller was failing, failed.
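On Linux, the load average also counts tasks stuck in uninterruptible sleep, which usually means waiting on disk I/O, so a storage fault can inflate the load without any extra requests. As a hedged sketch (this is not how the fault above was actually found), the share of CPU time spent in iowait can be estimated from two readings of the aggregate `cpu` line in `/proc/stat`. The function names and sample lines are hypothetical, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is assumed to match a reasonably recent kernel.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Use only the first 8 counters: guest time is already included in user.
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return deltas[4] / total  # index 4 is iowait


# Hypothetical samples taken a few seconds apart.
before = "cpu 1000 0 500 8000 500 0 0 0 0 0"
after = "cpu 1100 0 550 8100 1250 0 0 0 0 0"

fraction = iowait_fraction(before, after)
print(f"iowait: {fraction:.0%}")
```

On a live system the two lines would come from reading `/proc/stat` twice with a short pause in between. A persistently high fraction points toward the disks or controller and away from the web or database configuration.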

fred2
  • 97
  • 9