
What are some ways to debug I/O issues on a Linux server?

I've been using:

# nohup top -b -d 10 > /var/log/top.log &
# nohup iotop -b -d 5 -o -t > /var/log/iotop.log &

PS: hardware is clean, new and fine.

Swap is not being used at all, and I see a lot of these kernel threads:

[jbd2/sda6-8]
[jbd2/sda2-8]
[loop0]
[loop1]
[events/0]
[flush-8:0]
[kondemand/3]
[ksoftirqd/3]
[kblockd/2]

The server is fine most of the time, then the load average randomly spikes to anywhere between 6.00 and 38.00.

All I have on the box is PHP/Apache/nginx.

Example:

top - 03:25:11 up 1 day,  5:00,  3 users,  load average: 6.87, 2.98, 1.90
Tasks: 224 total,   1 running, 222 sleeping,   0 stopped,   1 zombie
Cpu0  :  4.7%us,  1.0%sy,  0.0%ni, 21.3%id, 73.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 15.0%us,  2.3%sy,  0.0%ni, 60.0%id, 22.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  6.7%us,  1.7%sy,  0.0%ni,  0.0%id, 91.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni, 91.1%id,  8.6%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   8031932k total,  7971176k used,    60756k free,   231236k buffers
Swap:  8191992k total,        0k used,  8191992k free,  6334420k cached

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
   2231 mysql     20   0 2576m 537m 6348 S  3.0  6.9  66:35.85 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --us
 678511 user 20   0  245m  43m  20m D  1.0  0.6   0:01.08 /usr/bin/php
 678539 user 20   0  255m  49m  21m D  0.7  0.6   0:00.33 /usr/bin/php
 678551 user 20   0  230m  14m 8392 D  0.7  0.2   0:00.08 /usr/bin/php
 678565 user 20   0  231m  17m  10m D  0.7  0.2   0:00.08 /usr/bin/php
     36 root      20   0     0    0    0 S  0.3  0.0   1:04.45 [kblockd/2]
     60 root      20   0     0    0    0 S  0.3  0.0   0:51.02 [kswapd0]
   1653 root      20   0     0    0    0 S  0.3  0.0   0:54.87 [kondemand/2]
   3394 root      20   0  353m 3480 1496 S  0.3  0.0   7:26.66 /usr/sbin/db_governor
 494915 nobody    18  -2 61104  19m  988 S  0.3  0.2   0:38.74 nginx: worker process
 678473 nobody    20   0 96912  13m 2304 S  0.3  0.2   0:00.04 /usr/local/apache/bin/httpd -k start -DSSL
 678474 nobody    20   0 96904  13m 2304 S  0.3  0.2   0:00.04 /usr/local/apache/bin/httpd -k start -DSSL
 678480 user 20   0  229m  17m  10m S  0.3  0.2   0:00.22 /usr/bin/php
 678491 root      20   0 15148 1360  944 R  0.3  0.0   0:00.15 top -c
 678519 user 20   0  233m  30m  20m D  0.3  0.4   0:00.22 /usr/bin/php
 678538 user 20   0  234m  31m  20m D  0.3  0.4   0:00.18 /usr/bin/php
 678567 user 20   0  230m  14m 8392 D  0.3  0.2   0:00.06 /usr/bin/php
 678612 user 20   0  128m 6156 4392 D  0.3  0.1   0:00.01 /usr/bin/php
      1 root      20   0 19356 1388 1064 S  0.0  0.0   0:00.89 /sbin/init

and iotop:

66913 be/4 user 1733.28 K/s    0.00 B/s  0.00 % 99.99 % php
66888 be/4 user 734.51 K/s    0.00 B/s  0.00 % 99.99 % php
66275 be/4 user 167.11 K/s    0.00 B/s  0.00 % 99.99 % php
66409 be/4 user 956.03 K/s    0.00 B/s  0.00 % 99.99 % php
66840 be/4 user 15.55 K/s    0.00 B/s  0.00 % 99.99 % php
66825 be/4 user 85.50 K/s    0.00 B/s  0.00 % 99.99 % php
66902 be/4 user 2028.64 K/s    0.00 B/s  0.00 % 99.99 % php
66268 be/4 user 932.71 K/s    0.00 B/s  0.00 % 99.95 % php
66805 be/4 user 489.67 K/s    0.00 B/s  0.00 % 93.08 % php

This is what spikes at random.

Ideas?

Tiffany Walker

1 Answer


Thanks for the question.

It would be helpful to have detailed information about the hardware you're using.

That includes the server make and model, the disk array setup (RAID controller, RAID level, caching solution, number of disks), and the details of your Linux distribution and kernel.

Looking at the data dump above, I suspect I/O wait from write activity that is starved or waiting for resources. That can happen when there's no write cache available on the disk array. That can also be the cause of the wild swings in load.
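A quick cross-check for that suspicion: Linux counts tasks in uninterruptible sleep (state D, like the php lines in the top output) toward the load average, so a burst of them should line up with the spikes. Below is a minimal sketch, assuming the procps version of `ps`; the helper name `d_state_procs` is made up for illustration.

```shell
# Hypothetical helper: read `ps -eo state,pid,comm` output on stdin and
# print the PID and command name of every task whose state is D
# (uninterruptible sleep, usually waiting on I/O).
d_state_procs() {
    # The header row ("S PID COMMAND") never starts with D, so it is skipped.
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Example use during a spike (commented out so the block has no side effects):
# ps -eo state,pid,comm | d_state_procs
```

If the list is mostly php during a spike, that points at the same processes iotop is already flagging.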

The output of a tool like iostat or collectl will be more helpful in understanding what's happening.

Try iostat -x 1 or collectl -sD and post the result.
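Since the spikes are random, it may be easier to leave iostat running and keep only the interesting samples. Here is a rough sketch, assuming `%util` is the last column of the `iostat -x` device table (it is in the sysstat versions I have seen, but check yours); the function name `busy_devices`, the threshold of 90 and the log path are arbitrary choices, not anything iostat provides.

```shell
# Hypothetical filter: read `iostat -x` output on stdin and print the name
# and %util of each device at or above a threshold (default 90).
# Assumes %util is the last column of the device table.
busy_devices() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        /^Device/ { in_table = 1; next }   # header row opens a device table
        NF == 0   { in_table = 0; next }   # blank line closes it
        in_table && ($NF + 0) >= (t + 0) { print $1, $NF }
    '
}

# Example use (commented out so the block has no side effects):
# nohup sh -c 'iostat -x 5 | busy_devices 90' > /var/log/iostat-busy.log 2>&1 &
```

Note that the first report iostat prints covers the time since boot, so only the later samples reflect what is happening during a spike.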

ewwhite