I've got a Red Hat server running as a VMware instance, logging GPS fixes 24x7. The transaction volume hasn't changed over the months (in fact I am pruning the transaction table more aggressively now), and the table holds only 600k to 800k records at any given time. There are no ad hoc queries being run, so I know the queries return a handful of records to clients and I know they use an index.
This system has been live for several years, and all of a sudden over the last few weeks, a few days a week, I/O wait spikes to 50% or more. "show processlist", which is normally empty, now shows tens to hundreds of insert requests backed up and marked as "Locked". Just as mysteriously, the problem heals itself and the server runs for a few days with I/O wait < 1%!
The transaction mix is fixed: the same number of devices hit the server every day, and the automated queries from the client servers make regular, repeated requests. The point is that everything from the client side is roughly the same on good days and bad. Also, this is all this server does: serve a handful of cgi-perl scripts as web services against a single MySQL database.
My thoughts are:
- MySQL table corruption. I've dropped and rebuilt the table a couple of times, even changing it from MyISAM to InnoDB and back. This has had no effect. In any case, I assume that if it were corruption it wouldn't heal itself.
- A failing hardware component. The IT staff is removed from me and I don't even control the VMware instance, so I can't really debug it. Again, I can't see this healing itself over and over.
- Other VMs squashing my disk performance. This seems to make the most sense as it just comes and goes.
Other than sar -u and iostat reporting high I/O wait, and "show processlist" showing piles of pending locked inserts, I can't spot anything. The machine doesn't seem to be swapping, running out of memory, or the like, but I am not a Linux admin expert, so who knows. Can someone give me a few ideas to help identify the problem?
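A minimal sketch of how the processlist could be summarized during a bad spell, to see which statement the "Locked" inserts are queued behind. It assumes the text comes from something like `mysql -B -e "SHOW FULL PROCESSLIST"` (tab-separated, with a header row). The function name `summarize_processlist` and the rows in `SAMPLE` are hypothetical, made up purely for illustration; they are not output from this server.

```python
import csv
import io
from collections import Counter


def summarize_processlist(tsv_text):
    """Tally thread states and pick the longest-running thread that is
    neither sleeping nor itself waiting in the "Locked" state."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    states = Counter()
    longest = None
    for row in reader:
        state = row["State"]
        if state in (None, "", "NULL"):
            state = "(none)"
        states[state] += 1
        # Skip idle connections and the waiters; we want the thread
        # that might be holding everyone else up.
        if row["Command"] == "Sleep" or state == "Locked":
            continue
        if not row["Time"].isdigit():
            continue
        seconds = int(row["Time"])
        if longest is None or seconds > longest[0]:
            longest = (seconds, row["Id"], row["Info"])
    return states, longest


# Hypothetical rows, for illustration only.
SAMPLE = "\n".join([
    "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo",
    "1\tapp\tlocalhost\tgps\tQuery\t42\tSending data\tSELECT (hypothetical)",
    "2\tapp\tlocalhost\tgps\tQuery\t40\tLocked\tINSERT (hypothetical)",
    "3\tapp\tlocalhost\tgps\tQuery\t39\tLocked\tINSERT (hypothetical)",
    "4\tapp\tlocalhost\tgps\tSleep\t5\t\tNULL",
])

states, longest = summarize_processlist(SAMPLE)
print(dict(states))
print("longest non-locked thread:", longest)
```

If the longest-running non-locked thread turns out to be a SELECT, that statement would be the first thing to examine, on the assumption that the table is MyISAM at the time and writers are waiting on its table-level lock.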
SAMPLE sar -b (these numbers are insane):
12:00:01 AM tps rtps wtps bread/s bwrtn/s
...
09:40:01 AM 75.12 1.70 73.42 13.76 1124.27
09:50:01 AM 67.78 1.09 66.69 10.75 955.04
10:00:01 AM 86.89 2.62 84.27 21.58 1334.50
10:10:01 AM 75.80 1.61 74.18 13.48 1097.77
10:20:01 AM 76.28 3.52 72.76 44.01 1055.92
10:30:02 AM 768.84 697.78 71.06 81332.71 1135.01
10:40:01 AM 72.28 2.94 69.34 61.10 1005.24
10:50:01 AM 74.80 1.34 73.46 11.36 1097.11
11:00:01 AM 67.03 1.37 65.67 11.00 924.50
11:10:01 AM 71.03 1.33 69.70 14.28 1009.19
11:20:01 AM 522.77 449.29 73.48 34524.92 1118.53
11:30:01 AM 72.06 1.61 70.45 13.01 1049.04
11:40:01 AM 73.14 1.56 71.57 12.99 1057.92
11:50:01 AM 63.44 1.07 62.37 8.68 863.00
12:00:01 PM 67.55 4.11 63.45 276.15 892.03
12:10:02 PM 856.48 792.62 63.85 101373.82 961.37 (holy cow!!!)
12:20:02 PM 1371.08 1299.65 71.42 162681.77 1160.73
12:30:02 PM 851.58 779.06 72.52 107906.82 1110.43
12:40:01 PM 849.75 778.53 71.22 103911.38 1115.13
12:50:01 PM 1793.71 1731.71 62.00 226925.63 1009.08
01:00:02 PM 1203.30 1145.78 57.52 142471.68 859.83
01:10:02 PM 1706.96 1647.98 58.98 213324.29 967.99
01:20:02 PM 1651.73 1596.54 55.19 208766.31 829.68
01:30:02 PM 1836.17 1775.53 60.63 232770.33 973.96
01:40:01 PM 1732.33 1681.82 50.51 219729.38 756.62
01:50:02 PM 1882.88 1829.40 53.48 233177.18 827.11
02:00:02 PM 2022.84 1966.71 56.13 253613.95 921.93
02:10:01 PM 1729.27 1677.31 51.97 204670.95 780.73
02:20:02 PM 1524.93 1464.76 60.17 180919.64 879.26
02:30:02 PM 1850.70 1801.40 49.29 226053.30 764.31
02:40:02 PM 1675.71 1620.18 55.53 197387.47 864.52
02:50:01 PM 1990.15 1934.81 55.33 254025.79 874.29
03:00:01 PM 1953.20 1895.80 57.40 241587.35 933.04
03:10:04 PM 907.08 877.11 29.97 86874.50 514.17
03:20:01 PM 2603.69 2555.15 48.54 273595.88 820.13
03:30:02 PM 2146.49 2101.18 45.31 282196.66 721.60
03:40:04 PM 1941.32 1895.61 45.71 222215.27 763.69
03:50:03 PM 2196.20 2152.56 43.64 250260.12 699.71
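A rough sketch for pulling the spike intervals out of the sar -b output above. It flags intervals where rtps exceeds a multiple of the median and converts bread/s to MB/s, assuming sar's blocks are 512-byte sectors. The function names and the `factor=10.0` threshold are my own arbitrary choices, and the sample rows are copied from the table above. Note that a median baseline only works if the sample contains more quiet intervals than busy ones.

```python
from statistics import median

BLOCK_BYTES = 512  # assumption: sar reports bread/s in 512-byte blocks


def parse_sar_b(text):
    """Parse `sar -b` rows, ignoring headers and trailing annotations."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 7 or parts[1] not in ("AM", "PM"):
            continue
        try:
            tps, rtps, wtps, bread, bwrtn = map(float, parts[2:7])
        except ValueError:
            continue  # the column header line lands here
        rows.append({
            "time": f"{parts[0]} {parts[1]}",
            "rtps": rtps,
            "wtps": wtps,
            "read_mb_s": bread * BLOCK_BYTES / (1024 * 1024),
        })
    return rows


def read_spikes(rows, factor=10.0):
    """Return rows whose read rate is more than `factor` times the median."""
    baseline = median(r["rtps"] for r in rows)
    return [r for r in rows if r["rtps"] > factor * baseline]


SAMPLE = """\
12:00:01 AM tps rtps wtps bread/s bwrtn/s
09:40:01 AM 75.12 1.70 73.42 13.76 1124.27
09:50:01 AM 67.78 1.09 66.69 10.75 955.04
10:00:01 AM 86.89 2.62 84.27 21.58 1334.50
10:10:01 AM 75.80 1.61 74.18 13.48 1097.77
10:20:01 AM 76.28 3.52 72.76 44.01 1055.92
10:30:02 AM 768.84 697.78 71.06 81332.71 1135.01
10:40:01 AM 72.28 2.94 69.34 61.10 1005.24
12:10:02 PM 856.48 792.62 63.85 101373.82 961.37 (holy cow!!!)
"""

for row in read_spikes(parse_sar_b(SAMPLE)):
    print(row["time"], "rtps", row["rtps"], "wtps", row["wtps"],
          "read MB/s", round(row["read_mb_s"], 2))
```

On these rows the flagged intervals show wtps staying in its usual range while rtps jumps, so the extra load appears to be reads rather than the inserts themselves.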
TYPICAL IOSTAT:
avg-cpu: %user %nice %system %iowait %steal %idle
41.50 0.00 4.50 41.50 0.00 12.50
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 312.00 0.00 508.00 2.00 46.83 0.01 188.09 4.36 8.49 1.96 100.20
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 312.00 0.00 508.00 2.00 46.83 0.01 188.09 4.36 8.49 1.96 100.20
dm-0 0.00 0.00 826.00 2.00 46.92 0.01 116.08 7.66 9.22 1.21 100.20
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
avg-cpu: %user %nice %system %iowait %steal %idle
16.00 0.00 4.00 42.50 0.00 37.50
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 277.00 0.00 534.00 0.00 50.07 0.00 192.01 4.07 7.70 1.87 100.00
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 277.00 0.00 534.00 0.00 50.07 0.00 192.01 4.07 7.70 1.87 100.00
dm-0 0.00 0.00 802.00 0.00 49.86 0.00 127.33 6.27 7.85 1.25 100.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
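As a sanity check on the iostat figures above, the throughput can be recomputed from the request rate and the average request size. This is only a sketch: it assumes avgrq-sz is reported in 512-byte sectors and that iostat's MB are 1024 * 1024 bytes. The helper names are hypothetical.

```python
SECTOR_BYTES = 512       # assumption: avgrq-sz is in 512-byte sectors
MIB = 1024 * 1024


def iostat_throughput_mib(r_per_s, w_per_s, avgrq_sz_sectors):
    """Combined read+write throughput implied by request rate and size."""
    return (r_per_s + w_per_s) * avgrq_sz_sectors * SECTOR_BYTES / MIB


def avg_request_kib(avgrq_sz_sectors):
    """Average request size in KiB."""
    return avgrq_sz_sectors * SECTOR_BYTES / 1024


# sda rows from the two iostat samples above: r/s, w/s, avgrq-sz
sda_first = iostat_throughput_mib(508.00, 2.00, 188.09)
sda_second = iostat_throughput_mib(534.00, 0.00, 192.01)

print("first sample :", round(sda_first, 2), "MB/s (reported 46.83 + 0.01)")
print("second sample:", round(sda_second, 2), "MB/s (reported 50.07 + 0.00)")
print("avg request  :", round(avg_request_kib(188.09), 1), "KiB")
```

The recomputed values line up with the reported rMB/s, and the average request works out to roughly 94 KiB. To me that looks more like large scan-style reads than small index lookups, though that is only my reading of the numbers.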
TYPICAL SAR -U:
01:01:39 PM CPU %user %nice %system %iowait %steal %idle
01:01:40 PM all 3.50 0.00 3.50 45.50 0.00 47.50
01:01:41 PM all 9.05 0.00 3.52 46.73 0.00 40.70
01:01:42 PM all 6.97 0.00 2.49 46.27 0.00 44.28
01:01:43 PM all 22.50 0.00 3.50 46.00 0.00 28.00
01:01:44 PM all 6.03 0.00 1.51 49.25 0.00 43.22
TYPICAL TOP:
top - 13:03:05 up 4 days, 20:52, 1 user, load average: 5.04, 3.46, 2.97
Tasks: 280 total, 1 running, 278 sleeping, 0 stopped, 1 zombie
Cpu(s): 11.2%us, 4.2%sy, 0.0%ni, 38.2%id, 46.2%wa, 0.2%hi, 0.2%si, 0.0%st
Mem: 4043732k total, 4019444k used, 24288k free, 4348k buffers
Swap: 6094840k total, 84k used, 6094756k free, 3048008k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3617 mysql 15 0 403m 119m 4184 S 6.7 3.0 290:25.53 mysqld
TYPICAL IOTOP (this is crazy, yes?):
Total DISK READ: 61.55 M/s | Total DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
14375 be/4 mysql 5.41 M/s 0.00 B/s 98.17 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14543 be/4 mysql 3.35 M/s 0.00 B/s 89.86 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14615 be/4 mysql 4.57 M/s 0.00 B/s 99.99 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14613 be/4 mysql 3.79 M/s 0.00 B/s 99.99 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14306 be/4 mysql 2.53 M/s 0.00 B/s 99.99 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14283 be/4 mysql 2.88 M/s 0.00 B/s 0.00 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14310 be/4 mysql 3.76 M/s 0.00 B/s 98.96 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14298 be/4 mysql 5.16 M/s 0.00 B/s 0.00 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14532 be/4 mysql 3.31 M/s 0.00 B/s 98.97 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14573 be/4 mysql 4.90 M/s 0.00 B/s 51.72 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14389 be/4 mysql 114.30 K/s 0.00 B/s 0.00 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14575 be/4 mysql 3.44 M/s 0.00 B/s 3.62 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14583 be/4 mysql 4.71 M/s 0.00 B/s 99.99 % 99.99 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14471 be/4 mysql 4.33 M/s 0.00 B/s 99.99 % 98.97 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14302 be/4 mysql 3.48 M/s 0.00 B/s 0.19 % 98.96 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14527 be/4 mysql 2.52 M/s 0.00 B/s 3.24 % 98.28 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14344 be/4 mysql 807.69 K/s 0.00 B/s 0.29 % 98.17 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14561 be/4 mysql 45.72 K/s 0.00 B/s 99.99 % 51.72 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14565 be/4 mysql 0.00 B/s 0.00 B/s 99.99 % 3.62 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14441 be/4 mysql 0.00 B/s 0.00 B/s 99.99 % 3.24 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock
14336 be/4 mysql 0.00 B/s 0.00 B/s 99.99 % 2.86 % mysqld --basedir=/usr --datadir=/va~g --socket=/var/lib/mysql/mysql.sock