
We are trying to figure out a strange situation with our MySQL installation on a VM. We are on MySQL 5.7 and RHEL 7 with XFS.

We are observing that when we run a few SELECT queries, IOWait climbs to 40-50%, while disk read speeds don't cross 25-30 MBps.

We checked at the OS level and easily get 500-600 MBps in file copies and other disk read/write speed tests, so we assume it's not a disk IO bottleneck.

IOPS peaks at over 20,000 and averages 9,000-10,000.

We are trying to figure out what might be causing disk IO wait when we have sufficiently fast disks and CPU to spare. A few of the key MySQL variables are listed below:

SHOW GLOBAL VARIABLES LIKE 'innodb_io%';
innodb_io_capacity 200
innodb_io_capacity_max 2000

SHOW GLOBAL VARIABLES LIKE '%thread%';
innodb_purge_threads 4
innodb_read_io_threads 4
innodb_thread_concurrency 0
innodb_thread_sleep_delay 10000
innodb_write_io_threads 4
max_delayed_threads 20
max_insert_delayed_threads 20
myisam_repair_threads 1
performance_schema_max_thread_classes 50
performance_schema_max_thread_instances -1
thread_cache_size 100
thread_handling one-thread-per-connection
thread_stack 262144

SHOW GLOBAL STATUS LIKE '%thread%';
Delayed_insert_threads 0
Performance_schema_thread_classes_lost 0
Performance_schema_thread_instances_lost 0
Slow_launch_threads 2
Threads_cached 61
Threads_connected 561
Threads_created 44399
Threads_running 2

SELECT @@max_connections returns 1200.

1. EXPLAIN for the first slow query:

EXPLAIN 
SELECT  COUNT(msgid)
    FROM  `2018-10-30`
    WHERE  priority=1
      AND  (message LIKE '%596f7572204f6e652054696d652050494e20%'
              OR  message LIKE '%4f545020666f7220%'
              OR  message LIKE '%4f545020616e642072656620%'
              OR  message LIKE '%4f545020746f20%'
           );

id 1
select_type SIMPLE
table 30-10-2018
partitions \N
type ALL
possible_keys \N
key \N
key_len \N
ref \N
rows 28431345
filtered 3.76
Extra Using where

2. EXPLAIN for the second slow query:

EXPLAIN 
SELECT  COUNT(msgid)
    FROM  `2018-10-30`
    WHERE  priority=1
      AND  ISDFlag=0
      AND  msgsubmitid IS NOT NULL
      AND  (message LIKE '%596f7572204f6e652054696d652050494e20%'
              OR  message LIKE '%4f545020666f7220%'
              OR  message LIKE '%4f545020616e642072656620%'
              OR  message LIKE '%4f545020746f20%'
           );

id 1
select_type SIMPLE
table 30-10-2018
partitions \N
type ALL
possible_keys index_msgsubmitid
key \N
key_len \N
ref \N
rows 28431345
filtered 0.19
Extra Using where

Create Table
"CREATE TABLE 2018-10-30 (
MsgId bigint(20) NOT NULL,
UserId int(11) NOT NULL,
Status bit(1) DEFAULT NULL,
Priority int(11) NOT NULL,
MsgStatus int(11) DEFAULT '1111',
DestinationNumber varchar(50) NOT NULL,
OrginatorName varchar(11) DEFAULT NULL,
OrginatorNumber varchar(20) DEFAULT NULL,
MsgSubmitID varchar(100) DEFAULT NULL,
MsgStatusMsg varchar(1000) DEFAULT NULL,
MsgDeliveryDateTime varchar(50) DEFAULT NULL,
Message varchar(500) NOT NULL,
IPaddress varchar(15) NOT NULL,
TransDate datetime NOT NULL,
SubmitDateTime datetime DEFAULT NULL,
SMSType int(11) DEFAULT NULL,
DateTimeToSend datetime DEFAULT NULL,
Subject varchar(100) DEFAULT NULL,
ISDFlag tinyint(4) DEFAULT NULL,
GatewayID int(11) DEFAULT NULL,
SmscSubmitDateTime datetime DEFAULT NULL,
ClientMsgId varchar(100) DEFAULT NULL,
Source int(10) DEFAULT '0',
CreatedDateTime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
UpdatedDateTime datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE
CURRENT_TIMESTAMP,
PRIMARY KEY (MsgId),
KEY index_msgsubmitid (MsgSubmitID),
KEY index_gatewayid (GatewayID),
KEY index_TransDate (TransDate),
KEY index_dstn_no (DestinationNumber),
KEY index_UserId (UserId),
KEY index_MsgStatus (MsgStatus)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Table Index (screenshot omitted)

iostat During Load

[ ~]$ iostat -xm 5 3

Linux 3.10.0-957.5.1.el7.x86_64 (...)         04/09/2019      _x86_64_        (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.13    0.00    1.54    1.56    0.00   92.77

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00   31.50   31.50    0.00  31.50   0.00
sde               0.00     0.03   45.51   54.98     0.73     2.30    61.65     0.17    1.64    0.95    2.22   0.49   4.89
sdi               0.00     0.00    0.46    0.18     0.23     0.09  1016.01     0.02   31.79    6.59   95.41   1.15   0.07
sdh               0.00     1.77  112.94   27.89     1.77     0.69    35.77     0.13    0.96    0.79    1.65   0.48   6.83
sdc               0.00     0.18  114.18  144.24     2.11     7.36    75.07     0.33    1.26    0.95    1.50   0.45  11.62
sda               0.00     0.01    0.01    0.04     0.00     0.00   245.47     0.00    9.96    4.75   11.82   0.84   0.00
sdj               0.00     0.01   65.86    4.17     1.04     0.10    33.41     0.06    0.87    0.80    1.92   0.54   3.77
sdd               0.57     0.91    0.12    0.18     0.00     0.00    64.37     0.00    4.88    1.43    7.12   1.90   0.06
sdb               0.00     0.05   12.34    7.21     0.31     0.37    71.69     0.03    1.30    0.88    2.03   0.57   1.11
sdf               0.00     0.00   33.24    9.79     0.52     0.33    40.69     0.04    1.01    0.82    1.67   0.53   2.27
sdg               0.00     0.00   71.83    6.64     1.12     0.26    35.98     0.07    0.84    0.72    2.13   0.51   3.97
dm-0              0.00     0.00    2.73    1.44     0.15     0.01    76.44     0.00    1.17    1.31    0.90   0.60   0.25
dm-1              0.00     0.00    0.68    1.09     0.00     0.00     8.01     0.02   10.23    1.22   15.91   0.31   0.06
dm-2              0.00     0.00  453.14  249.19     7.43    11.24    54.44     0.81    1.15    0.84    1.72   0.30  21.30
dm-3              0.00     0.00    0.00    0.04     0.00     0.00   105.64     0.00   10.38    2.45   10.40   0.56   0.00
dm-4              0.00     0.00    0.49    0.50     0.24     0.23   981.47     0.03   25.41    6.51   44.16   1.00   0.10
dm-5              0.00     0.00    0.01    6.02     0.00     0.03     9.42     0.01    1.07    4.12    1.07   0.46   0.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.40    0.00    1.44   15.94    0.00   78.22

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00  941.00    4.40    14.70     1.94    36.05     0.90    0.96    0.92    9.68   0.81  76.62
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     2.00    0.00    3.80     0.00     0.03    13.47     0.00    0.58    0.00    0.58   0.32   0.12
sdc               0.00     0.40  863.40  247.00    13.49    11.70    46.46     1.12    1.01    0.78    1.80   0.63  69.84
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.20     0.00     0.00    16.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   62.40     0.00     0.97    31.80     0.08    1.25    0.00    1.25   0.14   0.88
sdf               0.00     0.00 1818.40    0.00    28.41     0.00    32.00     2.10    1.15    1.15    0.00   0.54  98.94
sdg               0.00     0.00  131.40    0.20     2.05     0.00    32.00     0.18    1.34    1.34    1.00   1.33  17.50
dm-0              0.00     0.00    0.00    0.60     0.00     0.00    11.33     0.00    0.67    0.00    0.67   0.33   0.02
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00 3754.60  314.00    58.67    14.61    36.88     4.38    1.08    1.02    1.79   0.25  99.90
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    5.80     0.00     0.03     8.83     0.00    0.38    0.00    0.38   0.24   0.14

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.30    0.00    1.36   15.71    0.00   78.62

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
fd0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00 1004.60    3.60    15.70     1.70    35.34     0.89    0.89    0.86    9.11   0.78  78.80
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh               0.00     2.00    0.00    3.80     0.00     0.02    13.05     0.00    0.74    0.00    0.74   0.63   0.24
sdc               0.00     0.00  883.40  135.60    13.80     7.78    43.37     0.87    0.86    0.78    1.35   0.69  70.42
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00   28.00     0.00     0.44    32.00     0.03    1.15    0.00    1.15   0.19   0.54
sdf               0.00     0.00 1762.60    0.00    27.54     0.00    32.00     2.09    1.19    1.19    0.00   0.56  98.88
sdg               0.00     0.00  126.00    0.00     1.97     0.00    32.00     0.18    1.39    1.39    0.00   1.39  17.54
dm-0              0.00     0.00    0.00    0.20     0.00     0.00    32.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00 3776.20  167.00    59.00     9.91    35.79     4.07    1.03    1.01    1.49   0.25  99.96
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    5.80     0.00     0.02     8.55     0.00    0.83    0.00    0.83   0.41   0.24

Mount Output

$ mount

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=49397072k,nr_inodes=12349268,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/rhel-root on / type xfs (rw,relatime,attr2,inode64,noquota)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
/dev/mapper/rhel-var on /var type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/rhel-home on /home type xfs (rw,nosuid,nodev,noexec,relatime,attr2,inode64,noquota)
/dev/mapper/rhel-tmp on /tmp type xfs (rw,relatime,attr2,inode64,noquota)
/dev/mapper/rhel-lv_dam on /dam_agent type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/42 type tmpfs (rw,nosuid,nodev,relatime,size=9882784k,mode=700,uid=42,gid=42)
tmpfs on /run/user/987 type tmpfs (rw,nosuid,nodev,relatime,size=9882784k,mode=700,uid=987,gid=981)
tmpfs on /run/user/1012 type tmpfs (rw,nosuid,nodev,relatime,size=9882784k,mode=700,uid=1012,gid=1012)
tmpfs on /run/user/1005 type tmpfs (rw,nosuid,nodev,relatime,size=9882784k,mode=700,uid=1005,gid=1005)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=51,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=19059084)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=9882784k,mode=700)

Server Stats
CPUs: 24
Cores: 24
Sockets: 12
RAM: 96GB

ioping Stats
While investigating, we captured the following ioping stats, which show some requests taking several milliseconds; we are still investigating whether this could affect disk IO bandwidth:

# ioping /var/
4 KiB <<< /var/ (xfs /dev/dm-2): request=1 time=6.12 ms (warmup)
4 KiB <<< /var/ (xfs /dev/dm-2): request=2 time=569.9 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=3 time=618.6 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=4 time=505.7 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=5 time=534.8 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=6 time=744.6 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=7 time=1.10 ms (slow)
4 KiB <<< /var/ (xfs /dev/dm-2): request=8 time=447.6 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=9 time=578.0 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=10 time=1.11 ms (slow)
4 KiB <<< /var/ (xfs /dev/dm-2): request=11 time=586.4 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=12 time=449.4 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=13 time=402.0 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=14 time=650.4 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=15 time=497.9 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=16 time=4.78 ms (slow)
4 KiB <<< /var/ (xfs /dev/dm-2): request=17 time=534.5 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=18 time=8.27 ms (slow)
4 KiB <<< /var/ (xfs /dev/dm-2): request=19 time=876.8 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=20 time=3.99 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=21 time=1.04 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=22 time=1.20 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=23 time=980.6 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=24 time=2.26 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=25 time=794.6 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=26 time=963.0 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=27 time=1.91 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=28 time=1.04 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=29 time=643.9 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=30 time=1.40 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=31 time=837.2 us
4 KiB <<< /var/ (xfs /dev/dm-2): request=32 time=1.54 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=33 time=5.13 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=34 time=381.3 us (fast)
4 KiB <<< /var/ (xfs /dev/dm-2): request=35 time=1.03 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=36 time=1.27 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=37 time=1.99 ms
4 KiB <<< /var/ (xfs /dev/dm-2): request=38 time=827.9 us
^C
--- /var/ (xfs /dev/dm-2) ioping statistics ---
37 requests completed in 52.5 ms, 148 KiB read, 705 iops, 2.75 MiB/s
generated 38 requests in 37.5 s, 152 KiB, 1 iops, 4.06 KiB/s
min/avg/max/mdev = 381.3 us / 1.42 ms / 8.27 ms / 1.59 ms

MySQLTuner Output is on this link: https://pastebin.com/H4pxRttg
MySQL my.cnf: https://pastebin.com/CEcjvBRS
Show Global Status: https://pastebin.com/c54xPmtT
Show Global Variables: https://pastebin.com/9edrGmaL
SHOW PROCESSLIST: https://pastebin.com/gNwF0KpG

Screenshots of top, ulimit -a, iostat, and df output were attached (not reproduced here).

Any insight on where to look would be greatly appreciated.

  • You need to isolate the issue. Examine query plans with EXPLAIN. Run mysqlbench. Benchmark the storage subsystem with iozone. – Mark Wagner Apr 04 '19 at 16:05
  • Please post TEXT results of SHOW GLOBAL VARIABLES LIKE 'innodb_io%'; and SHOW GLOBAL VARIABLES LIKE '%thread%'; and SHOW GLOBAL STATUS LIKE '%thread%'; and EXPLAIN SELECT ...... (for one of your slow queries), when time permits. – Wilson Hauck Apr 06 '19 at 11:13
  • Thank you all for your comments. As requested by @WilsonHauck, I have updated the question above with the requested stats. Please let me know if you need any more info. – Kiran Thummalapalli Apr 08 '19 at 10:03
  • @MarkWagner We did benchmark the storage and MySQL, but speeds seem fine; I will try to gather stats and update my question shortly. When we copy a file on the Linux file system we see over 600 MBps of IO bandwidth, but when we run a few queries, MySQL goes into IOWait while IO bandwidth never exceeds 25-30 MBps. So we suspect some throttling at the MySQL end is keeping it from using the full IO capacity of the file system. – Kiran Thummalapalli Apr 08 '19 at 10:07
  • @KiranThummalapalli To go with your EXPLAIN results, we need A) the query submitted for your table 30-10-2018 data request, B) text results of SHOW CREATE TABLE `30-10-2018`; C) text results of SHOW INDEX FROM `30-10-2018`; to advise on adding an INDEX, because you are doing a FULL table scan of ~28 million rows. How much RAM is on this server, and how many CORES? iostat -xm 5 3 from your OS may reveal your CPU/CORE count. Please confirm you have either SSD or NVMe for data storage. Are you using PHP? – Wilson Hauck Apr 08 '19 at 12:12
  • @KiranThummalapalli What is the result of SELECT @@max_connections? Thanks – Wilson Hauck Apr 08 '19 at 12:38
  • Can you show the output of `mount` and `tuned-adm active` ? – ewwhite Apr 09 '19 at 02:29
  • @WilsonHauck max_connections is 1200; I added that above, along with a couple of queries and their EXPLAINs, and the table structure and index as requested. The CPU/RAM and other info you requested will follow shortly. We don't use PHP; this is exclusively a DB server running just MySQL. – Kiran Thummalapalli Apr 09 '19 at 03:32
  • @KiranThummalapalli In your query, when a WHERE value starts with % (wildcard), a scan of all rows is required. Any chance your query would deliver the same rows if you did NOT use a leading % for any of the WHERE values? If it does, you will then need to ADD INDEX on your message column for high performance and avoid the 28-million-row scan. Alternative: research Full Text Search capability. – Wilson Hauck Apr 09 '19 at 12:25
  • @WilsonHauck Unfortunately it won't deliver the same rows if we don't use the leading %. Also, this query is not run frequently, so the overhead of Full Text Search and its index outweighs the benefits in our case. More than reducing the time for this specific query, we are trying to identify the root cause of why MySQL does not use the full IO bandwidth available and goes into IO wait at just 25-30 MBps, while direct OS file copies give us more than 600 MBps. – Kiran Thummalapalli Apr 10 '19 at 01:01
  • @ewwhite I have updated my question with the output of `mount`, and we verified that `tuned-adm` is set to `virtual-guest`. We are investigating why this was set in the first place; I would also appreciate knowing whether it has any significant impact in the scenario we are presently in. – Kiran Thummalapalli Apr 10 '19 at 01:06
  • Please check: I have updated the ioping stats on the servers. They show some requests as high as 5-8 ms; we are investigating with the storage team why that is, and trying to understand whether it could cause a significant drop in IO bandwidth. Please share your thoughts on whether this is something to consider while investigating the issue. – Kiran Thummalapalli Apr 10 '19 at 11:27
  • @KiranThummalapalli - Sorry, but the leading % forces the full table scan, which is costly. A few tweaks may be possible, but it would be hard to speed up those queries by more than 2x. – Rick James Apr 22 '19 at 04:28

4 Answers


The issue is that your message field is a varchar() and you are performing a full-text-style search across a non-indexed column. With your query, the leading % forces a full scan of the entire column of data, because MySQL can only use the prefix of the data before the first wildcard with traditional BTREE indexes.

Doing full text searches with MySQL isn't a great use case, but if you are determined to use MySQL, take a look at a FULLTEXT index - this will let MySQL query the data efficiently without a full table scan.

https://dev.mysql.com/doc/refman/8.0/en/innodb-fulltext-index.html
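
As a rough sketch of what that could look like, with caveats: FULLTEXT matches whole tokens rather than arbitrary substrings, and whether these long hex strings tokenize usefully depends on the data and on settings such as innodb_ft_max_token_size, so this needs validating against real rows. The index name ft_message is arbitrary.

ALTER TABLE `2018-10-30` ADD FULLTEXT INDEX ft_message (Message);

SELECT  COUNT(*)
    FROM  `2018-10-30`
    WHERE  priority = 1
      AND  MATCH(Message) AGAINST(
             '596f7572204f6e652054696d652050494e20 4f545020666f7220 4f545020616e642072656620 4f545020746f20'
             IN BOOLEAN MODE);  -- terms without operators match any of them, like the ORs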

No amount of InnoDB tuning will fix a schema or query design issue.

Brennen Smith
  • Thank you for your inputs; let me briefly explain the problem we are presently experiencing. I fully understand that the query/table schema needs to be revisited to improve the overall efficiency of the query being executed. But the issue at hand is that we are seeing iowait in excess of 20-30% while the IO bandwidth used is hardly 25-30 MBps, and CPU utilisation and RAM usage are not a concern (please take a look at the top and other stats posted already). We are trying to understand what is causing MySQL to not use the full IO bandwidth and instead get throttled while there is abundant IO available. – Kiran Thummalapalli Apr 18 '19 at 05:10
  • I will try to find other queries that are fully optimised but still show similar behaviour, and share them if feasible. I hope that clarifies the problem. Do share your thoughts. – Kiran Thummalapalli Apr 18 '19 at 05:10
  • What RAID topology are you using? Is it parity-based RAID? Looking at iostat, you are hammering particular drives rather than spreading IO across the entire array - whenever dm-2 is at 99%, there is a corresponding drive that is also at ~99%, despite the rest of the array not being under heavy load. Overall, I'd definitely re-index the table and potentially re-RAID the server to a topology more performant for database workloads. – Brennen Smith Apr 18 '19 at 16:26
  • We are on a private cloud (VMs on ESXi) with the storage being a SAN. We are able to achieve over 500-600 MBps with simple copy commands on the file system, but for some reason MySQL is not able to spread out its IO. Do you have any suggestions on what might be throttling MySQL? – Kiran Thummalapalli Apr 18 '19 at 20:44

Suggestions to consider for your instance to improve MySQL IOPS:

SET GLOBAL innodb_io_capacity_max=18000;  # from 2000
SET GLOBAL innodb_io_capacity=9000;  # from 200

for testing, and to be applied to the my.cnf [mysqld] section before the next stop/start of services.
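
To confirm the new values actually took effect on the running instance (relevant to the comment thread below):

SELECT @@GLOBAL.innodb_io_capacity, @@GLOBAL.innodb_io_capacity_max;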

Disclaimer: I am the content author of web site mentioned in my profile, Network profile and could provide additional suggestions.

Update (Apr 18, 2019): because SET GLOBAL variable_name=value applies only to new connections, please wait a minimum of 1 hour to review the impact on IO wait time. If your processes typically take 3 hours, wait three hours to review the impact, please.

Wilson Hauck
  • Thank you for the suggestion. We tried this today but found no change in the iowait. Tomorrow we will be able to test this again with more stats and more use cases, as it's a production system. Will update back tomorrow with more findings. – Kiran Thummalapalli Apr 08 '19 at 17:49
  • Please post text results from OS iostat -xm 5 3 when under load on your production server. Thanks – Wilson Hauck Apr 08 '19 at 18:05
  • I have added the iostat output during load time. After applying the settings you suggested, we also found no difference in the iotop total read/write. innodb_io_capacity is a dynamic setting, so we did not restart but set it with SET GLOBAL, and it did get updated for testing. – Kiran Thummalapalli Apr 09 '19 at 11:19
  • Yes, we did check that the values were applied as expected, but they had no impact on the overall IO usage. Do you suspect any other parameter or configuration that could be causing this throttling of IO? – Kiran Thummalapalli Apr 10 '19 at 01:03
  • @KiranThummalapalli Post on pastebin.com and share the links: A) complete (not edited) my.cnf or my.ini; text results of B) SHOW GLOBAL STATUS; after a minimum of 24 hours UPTIME, C) SHOW GLOBAL VARIABLES;, D) SHOW FULL PROCESSLIST;, E) complete MySQLTuner report. Optional but very helpful, if available: htop OR top OR mytop for most active apps, ulimit -a for a linux/unix list of limits, iostat -xm 5 3 for IOPS by device and core/cpu count, df -h for a linux/unix free space list by device - for basic server tuning analysis. – Wilson Hauck Apr 10 '19 at 15:21
  • I have updated the info requested and updated the links as well. Let me know if you need any more info. – Kiran Thummalapalli Apr 16 '19 at 07:23
  • Thanks for pointing that out. I have updated the link with the SHOW GLOBAL VARIABLES results now; kindly recheck the same link. – Kiran Thummalapalli Apr 17 '19 at 05:55
  • We have taken care to use a fresh connection for tests after setting the innodb_io_capacity variables. We also retested yesterday to be sure: there was no change in IO wait; it was still the same as before we changed the above variables a couple of days back. – Kiran Thummalapalli Apr 20 '19 at 04:46
  • @KiranThummalapalli Awaiting your contact for a meaningful discussion. The SHOW GLOBAL VARIABLES; you provided indicated NOTHING had been changed related to innodb_io_capacity_max or innodb_io_capacity, so you were still throttled down to the limits of 10 years ago, prior to the widespread use of SSDs. The suggestions are for PRODUCTION, not TEST. A TEST instance will not likely have any of the operating characteristics of a production instance. – Wilson Hauck Apr 20 '19 at 08:23
  • Apologies again for the wrong global variables posted; I have updated the link with the correct ones: https://pastebin.com/9edrGmaL - it now shows innodb_io_capacity 9000 and innodb_io_capacity_max 18000. Also, we are making changes on production servers: we have two servers in a Master-Master configuration, and we are making changes on one master and rerunning the queries to see any impact on iowait. – Kiran Thummalapalli Apr 23 '19 at 09:35

Rate Per Second = RPS. Suggestions to consider for your my.cnf [mysqld] section. These are all dynamic variables and may be set with SET GLOBAL variable_name=value; (spelled out as statements after the list below).

read_rnd_buffer_size=262144  # from 8M to reduce handler_read_rnd_next RPS of 22,915
read_buffer_size=262144  # from 2M to reduce handler_read_next RPS of 52,015
innodb_lru_scan_depth=100  # from 1024 to conserve 90% of CPU cycles used for function
innodb_flushing_avg_loops=5  # from 30 to reduce innodb_buffer_pool_pages_dirty count of 17,452
innodb_change_buffer_max_size=15  # from 25 percent set aside from innodb_buffer_pool_size 
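
Spelled out as statements with the values above (they take effect without a restart; mirror them in my.cnf to persist across one):

SET GLOBAL read_rnd_buffer_size=262144;
SET GLOBAL read_buffer_size=262144;
SET GLOBAL innodb_lru_scan_depth=100;
SET GLOBAL innodb_flushing_avg_loops=5;
SET GLOBAL innodb_change_buffer_max_size=15;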

Disclaimer: I am the author of web content for site named in my profile, Network profile that includes contact information.

You may use SHOW GLOBAL STATUS LIKE '%dirty%'; hourly to watch the Innodb_buffer_pool_pages_dirty count fall, which will have a positive influence on reducing read RPS for InnoDB data tables.

Wilson Hauck

Are you only doing COUNTs? If not, please show us the real queries. There are shortcuts for making COUNT fast; we should not discuss them unless that is the real goal.

SELECT  COUNT(msgid)
    FROM  `2018-10-30`
    WHERE  priority=1
      AND  (message LIKE '%596f7572204f6e652054696d652050494e20%'
              OR  message LIKE '%4f545020666f7220%'
              OR  message LIKE '%4f545020616e642072656620%'
              OR  message LIKE '%4f545020746f20%'
           );

Don't say COUNT(msgid); that implies testing msgid for being NOT NULL. Simply say COUNT(*).

Both the leading wildcard and the OR are performance killers. However, changing to a REGEXP might speed it up some. Currently the message column is scanned up to 4 times; with the following, it is done in a single pass:

AND message REGEXP '596f7572204f6e652054696d652050494e20|4f545020666f7220|4f545020616e642072656620|4f545020746f20'
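
Put together, and using COUNT(*) as noted above, the rewritten query would be:

SELECT  COUNT(*)
    FROM  `2018-10-30`
    WHERE  priority=1
      AND  message REGEXP '596f7572204f6e652054696d652050494e20|4f545020666f7220|4f545020616e642072656620|4f545020746f20';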

priority now takes 4 bytes; wouldn't TINYINT suffice?
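
If TINYINT works for the data, the change would be along these lines (note that an ALTER like this copies the whole 28-million-row table, so schedule it accordingly):

ALTER TABLE `2018-10-30` MODIFY Priority TINYINT NOT NULL;  -- 1 byte instead of 4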

Another possible speedup is to have a covering index (and use COUNT(*)):

INDEX(priority, message, ISDFlag, msgsubmitid)

Then any of the queries you presented will have less stuff to scan through: they will scan the index's BTree instead of the much wider data BTree.
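
In DDL form, the suggested covering index would be something like the following (the index name is arbitrary; Message at latin1 VARCHAR(500) should fit within InnoDB's index-key limits, but verify on your row format):

ALTER TABLE `2018-10-30`
    ADD INDEX ix_covering (Priority, Message, ISDFlag, MsgSubmitID);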

IOPS

Blocks of the table(s) are cached in the buffer_pool. When it gets full, there is I/O to kick out some blocks and bring in others. If the main table is really big, you can expect I/O. If it fits in the buffer_pool, you might see no I/O, even for a table scan.

UNHEX -- It looks like message is all hex. If it is, use UNHEX() and HEX() and declare the column VARBINARY(250) to halve the size. Again, smaller --> less I/O (when the table is too big to be cached).
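
A rough migration sketch, assuming every Message value really is valid hex (test on a copy first; the UPDATE rewrites every row):

UPDATE `2018-10-30` SET Message = UNHEX(Message);
ALTER TABLE `2018-10-30` MODIFY Message VARBINARY(250) NOT NULL;
-- thereafter: write with UNHEX('...'), read back with HEX(Message)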

innodb_buffer_pool_size -- what is the value of that? With 96GB of RAM, it should be about 75G. How big is the table? Are there a lot of tables like this one? I'll bet there are, based on the awkward table name.
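
One way to check the sizes (the information_schema figures are estimates):

SELECT  table_name,
        ROUND(data_length /1024/1024/1024, 1) AS data_GB,
        ROUND(index_length/1024/1024/1024, 1) AS index_GB
    FROM  information_schema.TABLES
    WHERE  table_name = '2018-10-30';

SELECT  @@innodb_buffer_pool_size /1024/1024/1024 AS buffer_pool_GB;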

What happens if the same message occurs on two different days?

Analysis of VARIABLES and STATUS

Observations:

  • Version: 5.7.20-log
  • 94.2 GB of RAM
  • Uptime = 30d 02:56:40
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Some suggested VARIABLE changes:

innodb_page_cleaners = 16
innodb_buffer_pool_instances = 16
innodb_lru_scan_depth = 256
innodb_read_io_threads = 8
innodb_write_io_threads = 8
long_query_time = 2

Very few SELECTs? So this is mostly "write-only"? Are inserts batched?

Although innodb_log_file_size is smaller than it should be for the heavy write activity, it is probably not worth the effort to change it now. (5G would be better than 2G.)

If you are using SSDs, you may as well turn off innodb_flush_neighbors.

Your I/O seems to be handling more than innodb_io_capacity = 200 indicates; raise it. I suggest 500.
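
Both of those are dynamic in 5.7, so they can be tried without a restart (mirror them in my.cnf to persist):

SET GLOBAL innodb_flush_neighbors = 0;  -- for SSDs
SET GLOBAL innodb_io_capacity = 500;    -- from 200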

Is replication involved? Is the machine a Slave? Some of the values relating to replication seem strange; in particular, slave_skip_errors is set to a strange value. Are you 'sweeping problems under the rug'?

Nearly half of SELECTs do a table scan. This is quite high, and perhaps needs looking into. OTOH, the number of selects is quite low.

Replacing a Stored Routine about 4 times a day is rather high.

SHOW TABLES is happening 2-3 times per second -- can't that be avoided?

Details and other observations:

( Innodb_buffer_pool_reads ) = 1,006,346,347 / 2602600 = 386 /sec -- InnoDB buffer_pool I/O read rate -- check innodb_buffer_pool_size

( Innodb_buffer_pool_pages_flushed ) = 513,074,244 / 2602600 = 197 /sec -- Writes (flushes) -- check innodb_buffer_pool_size

( Key_blocks_used * 1024 / key_buffer_size ) = 19 * 1024 / 1024M = 0.00% -- Percent of key_buffer used. High-water-mark. -- Lower key_buffer_size to avoid unnecessary memory usage.

( table_open_cache ) = 32,163 -- Number of table descriptors to cache -- Several hundred is usually good.

( innodb_buffer_pool_size / innodb_buffer_pool_instances ) = 71680M / 8 = 8960MB -- Size of each buffer_pool instance. -- An instance should be at least 1GB. In very large RAM, have 16 instances.

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners

( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5 -- innodb_page_cleaners -- Recommend setting innodb_page_cleaners to innodb_buffer_pool_instances

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((1006346347 + 513074244) ) / 2602600 = 583 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size?

( Innodb_os_log_written ) = 7,973,352,303,616 / 2602600 = 3063610 /sec -- This is an indicator of how busy InnoDB is. -- Very idle or very busy InnoDB.

( Innodb_log_writes ) = 1,662,275,231 / 2602600 = 638 /sec

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 7,973,352,303,616 / (2602600 / 3600) / 2 / 2048M = 2.57 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 2,602,600 / 60 * 2048M / 7973352303616 = 11.7 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)

( Com_rollback ) = 760,459 / 2602600 = 0.29 /sec -- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.
-- (OTOH, Com_rollback is very low compared to Com_commit.)

( Innodb_dblwr_writes ) = 29,374,160 / 2602600 = 11 /sec -- "Doublewrite buffer" writes to disk. "Doublewrites" are a reliability feature. Some newer versions / configurations don't need them. -- (Symptom of other issues)

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read + Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 1021090541 + 513083786 ) / 2602600 / 200 = 294.7% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( sync_binlog ) = 0 -- Use 1 for added security, at some cost of I/O. =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lost transactions in a crash, but is faster.

( innodb_thread_concurrency ) = 0 -- 0 = Let InnoDB decide the best for concurrency_tickets. -- Set to 0 or 64. This may cut back on CPU.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( bulk_insert_buffer_size / _ram ) = 8M / 101146479820.8 = 0.01% -- Buffer for multi-row INSERTs and LOAD DATA -- Too big could threaten RAM size. Too small could hinder such operations.

( (Queries-Questions)/Queries ) = (5936481203-59444814)/5936481203 = 99.0% -- Fraction of queries that are inside Stored Routines. -- (Not bad if high; but it impacts the validity of some other conclusions.)

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (2193033569 + 372855165 + 1643709242 + 0) / 1660237104 = 2.54 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_scan ) = 7,124,788 / 2602600 = 2.7 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 7,124,788 / 15138927 = 47.1% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (2193033569 + 1643709242 + 0 + 0 + 372855165 + 571) / 2602600 = 1617 /sec -- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives

( expire_logs_days ) = 3 -- How soon to automatically purge binlog (after this many days) -- Too large (or zero) = consumes disk space; too small = need to respond quickly to network/machine crash. (Not relevant if log_bin = OFF)

( slave_pending_jobs_size_max / max_allowed_packet ) = 16M / 4M = 4 -- For parallel slave threads -- slave_pending_jobs_size_max must not be less than max_allowed_packet

( slave_skip_errors ) = slave_skip_errors = 1032,1054,1062 -- Which error cases to ignore -- It would be better to change the code than sweep problems under the rug.

( long_query_time ) = 10 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

Abnormally small:

(Com_select + Qcache_hits) / (Com_insert + Com_update + Com_delete + Com_replace) = 0.0036

Abnormally large:

( Innodb_pages_read + Innodb_pages_written ) / Uptime = 589
Binlog_cache_use = 635 /sec
Com_begin = 633 /sec
Com_commit = 637 /sec
Com_commit + Com_rollback = 638 /sec
Com_create_procedure = 0.15 /HR
Com_dealloc_sql = 39 /HR
Com_delete = 631 /sec
Com_do = 2.3 /HR
Com_drop_procedure = 0.15 /HR
Com_execute_sql = 42 /HR
Com_insert = 842 /sec
Com_load = 0.79 /HR
Com_prepare_sql = 42 /HR
Com_rename_table = 0.05 /HR
Com_show_create_proc = 2.9 /sec
Com_show_tables = 2.6 /sec
Com_show_warnings = 92 /HR
Com_signal = 4.6 /HR
Com_slave_start = 0.0014 /HR
Com_slave_stop = 0.0014 /HR
Com_update = 143 /sec
Handler_commit = 4507 /sec
Handler_delete = 635 /sec
Handler_prepare = 4503 /sec
Handler_update = 2978 /sec
Innodb_buffer_pool_pages_data = 4.51e+6
Innodb_buffer_pool_pages_total = 4.59e+6
Innodb_buffer_pool_write_requests = 27091 /sec
Innodb_data_read = 6426970 /sec
Innodb_data_reads = 392 /sec
Innodb_data_writes = 848 /sec
Innodb_data_writes - Innodb_log_writes - Innodb_dblwr_writes = 198 /sec
Innodb_data_written = 9506808 /sec
Innodb_dblwr_pages_written = 196 /sec
Innodb_log_write_requests = 4670 /sec
Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 5,259.0MB
Innodb_pages_created = 30 /sec
Innodb_pages_read = 392 /sec
Innodb_pages_read + Innodb_pages_written = 1.53e+9
Innodb_pages_written = 197 /sec
Innodb_rows_deleted = 635 /sec
Innodb_rows_deleted + Innodb_rows_inserted = 1945 /sec
Innodb_rows_inserted = 1310 /sec
Innodb_rows_updated = 663 /sec
Max_execution_time_set = 16
Ongoing_anonymous_transaction_count = 1
Open_tables = 4,129
Performance_schema_digest_lost = 2.23e+7
Select_range / Com_select = 49.2%
Threads_cached = 315
auto_increment_offset = 2
innodb_open_files = 32,163
port = 3317
report_port = 3317

Abnormal strings:

Ssl_session_cache_mode = Unknown
event_scheduler = ON
have_ssl = YES
have_symlink = DISABLED
innodb_data_home_dir = /var/lib/mysql
innodb_fast_shutdown = 1
optimizer_trace = enabled=off,one_line=off
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on
session_track_system_variables = time_zone, autocommit, character_set_client, character_set_results, character_set_connection
slave_compressed_protocol = ON
slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN
Rick James
  • Alas, I did not spot anything in the Analysis that would clearly help with the Question at hand. – Rick James Apr 22 '19 at 17:43
  • Thank you @RickJames for the inputs; we will review and see which of these we can implement to improve the situation at hand. `Global variables` was mistakenly shared; I have updated the link with the current one: https://pastebin.com/9edrGmaL - it shows `innodb_io_capacity 9000` and `innodb_io_capacity_max 18000`. – Kiran Thummalapalli Apr 23 '19 at 09:37
  • "Very few SELECTs? So this is mostly write-only? Are inserts batched?" We have a Master-Master server configuration. All load is presently on one of the masters, and we are experimenting on the second server, which carries no production traffic but has sync enabled, so mostly inserts and updates from the other server propagate here while we try to resolve the IO wait issue. Once that is resolved, we will implement the changes on the other master with production traffic as well. – Kiran Thummalapalli Apr 23 '19 at 09:41
  • "If you are using SSDs, you may as well turn off innodb_flush_neighbors." We are presently connected to a SAN; we have requested info on the type of disks, which are presumably SSDs, but are awaiting confirmation. Being at the customer's end, we have limited access to physical details of the storage; on this cloud we are simply allocated the requested space. Based on the IOPS we have seen, it hits above 20K and averages 9-10K. – Kiran Thummalapalli Apr 23 '19 at 09:43
  • @KiranThummalapalli - "Write-only" -- yeah that makes sense. "Neighbors" -- I do not know whether neighbors should be on or off for a SAN, even if it uses SSDs/Flash. It is probably heavily RAIDed and cached. – Rick James Apr 23 '19 at 14:36