
I'm seeing intermittent episodes where MySQL causes high load on the machine, making the web app unavailable. I'm using monit to monitor it, but monit doesn't catch the issue, apparently because it can still connect to MySQL. Here's my monit script for mysql:

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed port 3306 protocol mysql then restart
    if failed unixsocket /var/run/mysqld/mysqld.sock protocol mysql then restart
    if 5 restarts within 5 cycles then timeout

When the issue occurs, load on the machine is high, with mysqld taking almost all of the CPU. You can still "log in" using the mysql command-line client, but any SELECTs or UPDATEs never return.

What should I be using to catch when this issue pops up?

Parand

2 Answers


Check the process list via the MySQL client (SHOW FULL PROCESSLIST;). From there you can see what each connection is running and decide whether the query needs optimization or should be stopped.

You can then issue KILL $NUMBER; to terminate the problem connection, as opposed to restarting MySQL.
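For example, a session to find and kill a runaway query might look like this (the thread id 12345 is illustrative):

    mysql> SHOW FULL PROCESSLIST;
    -- Inspect the Id, Time, State and Info columns to spot the
    -- long-running statement, then kill that connection by its Id:
    mysql> KILL 12345;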

Restarting something that is misbehaving but still running should be a last resort and is generally a bad idea, especially with databases, where you risk your data.

Certainly, specific situations dictate different actions. For example, if you know there is a runaway memory leak in particular software, there are no operations running, and the only way to restore the resources is to restart: then do so.

Additionally, if you are losing a lot of money every minute, a restart may be reasonable. If you see no path to a quick recovery and you believe a restart will restore service, it makes sense when the risk to the data or application costs less than what you are actively losing. The principle applies broadly, but the trade-off varies with your industry, service, and SLA.

Warner

It could be locked tables or an overloaded server. Try slow query logging, as well as Warner's suggestion of the process list (tip: mytop will show it in a handy interface). Also try regular top to see what's competing with the database for CPU.
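If the slow query log isn't enabled yet, a minimal my.cnf sketch might look like this (directive names are version-dependent: older MySQL releases use log_slow_queries, newer ones use slow_query_log and slow_query_log_file; the log path here is just an example):

    [mysqld]
    # Log statements that take longer than 1 second
    log_slow_queries = /var/log/mysql/mysql-slow.log
    long_query_time  = 1
    # Also log statements that don't use an index at all
    log-queries-not-using-indexes

Afterwards, mysqldumpslow -s t /var/log/mysql/mysql-slow.log will aggregate the log and sort statements by time, which is usually the quickest way to spot the worst offenders.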

If you see high "load" (a long run queue) without processes eating much CPU, it could be storage related (insufficient IOPS or throughput).
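A quick way to check that, assuming the sysstat package is installed, is iostat:

    # Extended device statistics every 5 seconds; watch the await column
    # (ms per I/O) and %util (device saturation) while the problem occurs
    iostat -x 5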

Joris
  • I've been looking through the slow query log without finding anything I can link back to the high load. I don't know if it's there and I'm not seeing it, or if the cause of the issue is not recorded in the slow query log. I'll try setting up an automated `show full processlist` dump in cases of high load and see what pops up (a rough sketch of such a script follows below). – Parand Jun 20 '10 at 07:12
  • Make sure your slow query threshold is not too high, and run one of the analyzer tools on the log to get aggregated statistics. How long do the unresponsive periods last? If it's minutes at a time, I'd recommend installing munin as well, for after-the-fact quick-glance stats of what's going on in your system. – Joris Jun 20 '10 at 16:51
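A rough sketch of the processlist-dump script mentioned in the comments above, suitable for running from cron every minute (the load threshold, credentials, and log path are all illustrative assumptions):

    #!/bin/sh
    # Dump the MySQL process list whenever the 1-minute load average
    # exceeds a threshold. Threshold, credentials and paths are examples.
    THRESHOLD=8
    LOAD=$(awk '{print int($1)}' /proc/loadavg)
    if [ "$LOAD" -ge "$THRESHOLD" ]; then
        date >> /var/log/mysql-processlist.log
        mysql -u root -pSECRET -e 'SHOW FULL PROCESSLIST\G' \
            >> /var/log/mysql-processlist.log
    fi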