
Description of problem

I have a fairly large MySQL installation: at least 3 separate servers running the InnoDB storage engine. Once every minute, at the same second each minute, for about 3-4 seconds, each of my InnoDB machines suddenly stops performing well.

When I do a SHOW PROCESSLIST on each server at normal times, I see about 10-15 connections, doing their thing as normal:

+--------+------------------+--------------------------+------+---------+------+-------+------------------+
| Id     | User             | Host                     | db   | Command | Time | State | Info             |
+--------+------------------+--------------------------+------+---------+------+-------+------------------+
|  23457 | root             | localhost                | NULL | Query   |    0 | NULL  | show processlist | 
| 180042 | **********       | web2.***.com:49867       | ***  | Sleep   |    1 |       | NULL             | 
| 180129 | **********       | web1.***.com:54302       | ***  | Sleep   |    0 |       | NULL             | 
| 180155 | **********       | web2.***.com:50225       | ***  | Sleep   |    0 |       | NULL             | 
| 180163 | **********       | web1.***.com:54425       | ***  | Sleep   |    0 |       | NULL             | 
| 180172 | **********       | web1.***.com:54507       | ***  | Sleep   |    0 |       | NULL             | 
| 180181 | **********       | web4.***.com:34893       | ***  | Sleep   |    0 |       | NULL             | 
+--------+------------------+--------------------------+------+---------+------+-------+------------------+

Then all of a sudden, almost exactly in sync across the machines, at the same second each minute (for me it's :47 seconds past the minute), processes pile up hanging in the "update" state:

| 192938 |  **********       | web3.***.com:44248              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (101670151,"{\"inbox\":{\"new\":12,\"spam_check\":1289 | 
| 192939 |  **********       | web4.***.com:50264              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (17103785,"{\"inbox\":{\"new\":1,\"spam_check\":0,\"di | 
| 192940 |  **********       | web3.***.com:44258              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (2245293,"{\"inbox\":{\"new\":14,\"spam_check\":128933 | 
| 192941 |  **********       | web3.***.com:44268              | ***  | Query   |    3 | update | INSERT INTO user_stats (***_uid, data) VALUES (105330063,"{\"inbox\":{\"new\":4,\"spam_check\":0,\"d | 
... 100-200 more just like this...

Upon closer inspection, it seems to be high CPU usage at that moment (although I suppose the high CPU could itself be caused by heavy disk I/O), because while it's happening, even something as simple as SELECT NOW() takes about 4 seconds to complete.
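To catch one of these windows in the act, a snapshot loop like the one below would be one way to collect evidence (a minimal sketch; it assumes mysqladmin can authenticate via ~/.my.cnf and that /tmp has room for a few hundred small files):

while true; do
    # grab the process list and server counters once a second
    mysqladmin processlist extended-status > /tmp/stall.$(date +%H%M%S) 2>&1
    sleep 1
done

Diffing the snapshots taken just before and during the :47 pile-up shows whether any status counters jump along with the "update" states.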

Here's what I know:

  1. It isn't a rogue, unoptimized query: it happens on machines that not only run different queries, but also hit different tables.
  2. It only happens on machines that write to InnoDB tables. It does not happen on machines that only read from InnoDB, or on machines that only read from or write to MyISAM tables.

Questions

Is there a process that runs every minute in InnoDB that takes up a lot of CPU or disk I/O? Is this normal? I know it could be a million different things, but I'm looking for known problems or solutions. Is there any more info I can provide to help solve this issue?

Additional info

OS:

uname -a
Linux db04.****.com 2.6.18-194.17.4.el5 #1 SMP Wed Oct 20 13:03:08 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

File system:

/dev/sda4     ext3   785711096  80539996 665259216  11% /data

Raid config:

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL


Adapter 0 -- Virtual Drive Information:
Virtual Disk: 0 (target id: 0)
Name:Virtual Disk 0
RAID Level: Primary-1, Secondary-3, RAID Level Qualifier-0
Size:856704MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:2
Span Depth:3
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
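Since the cache policy falls back to write-through when the battery is bad ("No Write Cache if Bad BBU"), the BBU state is worth confirming too. This uses the same MegaCli binary as above; exact flag spellings can vary between MegaCli versions:

/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL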

MySQL Version

mysql> select version();
+---------------------------+
| version()                 |
+---------------------------+
| 5.0.80-enterprise-gpl-log | 
+---------------------------+
1 row in set (0.01 sec)
1 Answer


Are you sure you don't have a cron job running every minute?
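One quick way to rule that out is to check every crontab on the database and web hosts, not just root's (a sketch; the paths are the usual Red Hat locations):

# list each user's crontab (needs root), then the system-wide cron entries
for u in $(cut -d: -f1 /etc/passwd); do crontab -l -u "$u" 2>/dev/null; done
cat /etc/crontab /etc/cron.d/* 2>/dev/null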

What is your value for innodb_flush_method?

Since you have a RAID controller with a BBU-backed write cache (and your data/logs aren't stored on a SAN), the recommended setting is O_DIRECT.
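For example, to check the current value and, if it is still the default, change it (a sketch; innodb_flush_method is not dynamic in 5.0, so the edit only takes effect after a server restart):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';

# in /etc/my.cnf, under [mysqld]:
innodb_flush_method = O_DIRECT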

You might also want to use a tool like innotop to get a better view of your load, especially pending I/O.
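Even without innotop, the FILE I/O section of the InnoDB status output shows the pending counts (the numbers below are placeholders, not from this server):

mysql> SHOW ENGINE INNODB STATUS\G
...
FILE I/O
--------
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
...

If the pending fsync counters spike at :47, the stall is in flushing rather than in query execution.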

HTH

Edit: What's your value for innodb_buffer_pool_size as well?
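A quick way to gather that and the rest of the InnoDB settings in one go:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb%';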

  • Jason, thanks for the response. Definitely not cron-job related. innodb_flush_method is the default, which is not O_DIRECT. I'd heard this helps with load, but I didn't know whether my specific problem was something this setting change could remedy. My innodb_buffer_pool_size is `| innodb_buffer_pool_size | 4294967296 |`, or 4 GB. That's plenty. I'm going to look into innotop. Thanks. – Mike Sherov Nov 10 '10 at 18:17
  • One other thing you might look at is innodb_log_file_size. If that is the default (5MB) it is most likely way too small. MySQL docs would say that you could set this to 1/2 the size of your buffer pool (or 2GB) but it also affects recovery time. You might want to try 256MB or 128MB. See here: http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_log_file_size – HTTP500 Nov 10 '10 at 20:05
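Note that on 5.0, resizing the log files takes more than a config edit, because the server refuses to start if the existing ib_logfile* files don't match the configured size. A sketch of the usual procedure (assuming the default datadir of /var/lib/mysql and a Red Hat style init script; take a backup first):

/etc/init.d/mysqld stop              # must be a clean shutdown (innodb_fast_shutdown not set to 2)
mv /var/lib/mysql/ib_logfile* /root/ # keep the old logs until the server comes back up cleanly
# then in /etc/my.cnf, under [mysqld]:
#   innodb_log_file_size = 128M
/etc/init.d/mysqld start             # InnoDB recreates the log files at the new size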