
I have a monitoring server running icinga/collectd/graphite for about 50 hosts, and I have noticed high load and sluggish performance on the box. If you take a look at top, you'll see:

Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 7.6%id, 23.4%wa, 0.0%hi, 0.2%si, 0.0%st

Notice the HUGE %wa value, which as far as I know indicates a network or disk bottleneck. ifconfig shows no dropped packets and there isn't much bandwidth in use, so that leaves disk issues, right? There isn't much disk writing going on either: iotop reports we're writing only a little over 1 MB per second, and the RAID tool reports everything is A-OK with write caching enabled.

How do I go about figuring out how to fix this?

UPDATE: iostat -x output is:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.62    0.10    0.31    9.65    0.00   89.31

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.21    33.34   83.55   16.54  1599.94   399.07    19.97    43.21  416.98   3.71  37.13
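The sda line can be sanity-checked with two textbook relations. This is a minimal sketch that uses only the numbers printed above; it assumes 512-byte sectors (the unit of iostat's rsec/s and wsec/s) and takes svctm at face value, even though the sysstat documentation warns that svctm should not be trusted. Little's law says the average queue length should be roughly IOPS × await, and utilisation should be roughly IOPS × svctm.

```python
# Figures copied from the `iostat -x` line for sda above.
r_per_s = 83.55       # read requests completed per second
w_per_s = 16.54       # write requests completed per second
rsec_per_s = 1599.94  # sectors read per second
wsec_per_s = 399.07   # sectors written per second
await_ms = 416.98     # average time per request, including time spent queued (ms)
svctm_ms = 3.71       # average service time per request (ms); sysstat says not to trust it

SECTOR_BYTES = 512    # assumption: iostat's sector unit

iops = r_per_s + w_per_s

# Little's law: mean number of requests in the system = arrival rate x mean time in system.
predicted_queue = iops * await_ms / 1000.0

# Busy fraction: requests per second x seconds of service each, as a percentage.
predicted_util = iops * svctm_ms / 1000.0 * 100.0

# Average request size in sectors, and total throughput in MB/s.
avg_request_sectors = (rsec_per_s + wsec_per_s) / iops
throughput_mb_s = (rsec_per_s + wsec_per_s) * SECTOR_BYTES / 1e6

print(f"predicted avgqu-sz: {predicted_queue:.1f} (reported 43.21)")
print(f"predicted %util:    {predicted_util:.1f} (reported 37.13)")
print(f"avgrq-sz:           {avg_request_sectors:.2f} sectors (reported 19.97)")
print(f"throughput:         {throughput_mb_s:.2f} MB/s")
```

Both checks land close to the reported avgqu-sz and %util, and the throughput comes out at about 1 MB/s, in line with what iotop showed. The gap between await (about 417 ms) and svctm (about 3.7 ms) suggests that most of each request's time is spent waiting in the queue rather than being serviced.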
Will

2 Answers


I/O wait is also generated by NFS, SMB and other remote filesystems.
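To check whether a remote filesystem could be contributing, one approach is to list the network mounts in /proc/mounts. This is a minimal sketch assuming Linux; the set of filesystem type names is an illustrative assumption, not an exhaustive list, and the function names are hypothetical.

```python
# Filesystem types treated as network-backed here; an illustrative, non-exhaustive assumption.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "smb3", "fuse.sshfs", "glusterfs", "ceph"}


def network_mounts(mounts_text):
    """Return (source, mountpoint, fstype) tuples for network filesystems.

    `mounts_text` is text in /proc/mounts format: one mount per line,
    whitespace-separated, with spaces inside paths escaped as \\040.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        if fstype in NETWORK_FS_TYPES:
            found.append((source, mountpoint, fstype))
    return found


def read_network_mounts(path="/proc/mounts"):
    """Read the live mount table (Linux only) and return its network mounts."""
    with open(path) as f:
        return network_mounts(f.read())
```

If read_network_mounts() returns an empty list on the monitoring server (and the type list covers the filesystems actually in use), remote filesystems can be ruled out as the source of the wait.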

Use vmstat 2 to get a granular view of system performance, including I/O wait, refreshed every two seconds.
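vmstat's wa column, like top's %wa, is derived from the iowait counter in /proc/stat. The sketch below samples that file at an interval and turns the counter deltas into percentages, roughly as vmstat does; the column grouping (nice folded into us, irq and softirq into sy) is an approximation, not a copy of vmstat's source, and the function names are hypothetical.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the eight aggregate CPU counters (clock ticks) from the first 'cpu' line."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is "cpu"; next come user, nice, system, idle, iowait, irq, softirq, steal.
    # guest/guest_nice are skipped because they are already counted in user/nice.
    values = [int(x) for x in fields[1:9]]
    values += [0] * (8 - len(values))  # older kernels expose fewer columns
    return values


def percentages(before, after):
    """Convert two counter snapshots into percentages of elapsed CPU time."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas) or 1  # avoid dividing by zero if no ticks elapsed
    user, nice, system, idle, iowait, irq, softirq, steal = deltas
    return {
        "us": 100.0 * (user + nice) / total,
        "sy": 100.0 * (system + irq + softirq) / total,
        "id": 100.0 * idle / total,
        "wa": 100.0 * iowait / total,
        "st": 100.0 * steal / total,
    }


def watch(interval=2.0, count=5):
    """Print one line of percentages per interval, similar in spirit to `vmstat 2`."""
    prev = read_cpu_times()
    for _ in range(count):
        time.sleep(interval)
        cur = read_cpu_times()
        pct = percentages(prev, cur)
        print(" ".join(f"{name}={value:5.1f}" for name, value in pct.items()))
        prev = cur
```

Calling watch(interval=2) prints a line every two seconds; /proc/stat is world-readable, so no special privileges are needed.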

Alastair McCormack

A high wa value generally does mean the OS is waiting on either the network or the disk. There is a nifty program called iotop that shows what the disk is up to; it might be of some help.
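iotop itself gets per-task I/O accounting from the kernel's taskstats interface. A rough stand-in, assuming Linux and root privileges (other users' /proc/<pid>/io files are not readable otherwise), is to diff the read_bytes and write_bytes counters in /proc/<pid>/io over an interval. The helper names below are hypothetical.

```python
import os
import time


def parse_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def snapshot():
    """Map pid -> (command name, read_bytes, write_bytes) for every readable process."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
            with open(f"/proc/{entry}/io") as f:
                stats = parse_io(f.read())
        except OSError:
            continue  # process exited, or permission denied (run as root)
        result[int(entry)] = (comm, stats.get("read_bytes", 0), stats.get("write_bytes", 0))
    return result


def busiest(interval=5.0, limit=10):
    """Return (total, pid, comm, read_bytes, write_bytes) rows, busiest first."""
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    rows = []
    for pid, (comm, rb, wb) in after.items():
        if pid in before:
            _, rb0, wb0 = before[pid]
            rows.append((rb - rb0 + wb - wb0, pid, comm, rb - rb0, wb - wb0))
    rows.sort(reverse=True)
    return rows[:limit]
```

read_bytes and write_bytes count only I/O that reaches the storage layer, unlike rchar and wchar, which also include reads served from the page cache, so they are the closer match to what the disk is actually doing.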

Sc0rian
  • As I said in the post, I ran iotop and it's only doing 1 MB/s of total disk I/O. – Will Jun 07 '12 at 13:54
  • Ah, sorry, you did; I missed that. Are the drives in good health? Have you had a look at smartctl? Also try a bandwidth test on the network (iperf is a good program) and make sure you have no network bottlenecks. – Sc0rian Jun 07 '12 at 13:56
  • 1 MB per second is a useless indicator; it's like trying to find out whether a car's engine is powerful enough by looking at the tachometer while driving in the city. I have an 8-disc RAID 10 of VelociRaptors that at times only pulls 4-6 MB/s and is FULLY BUSY. Check something like I/O wait stats if available; MB/s is irrelevant, because if the head moves a lot, that may be all the disc gives you. – TomTom Jun 07 '12 at 14:39
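TomTom's point can be put in numbers: throughput is simply requests per second multiplied by request size. The sketch below assumes a hypothetical disk that is fully busy at about 150 small random requests per second (an illustrative figure, not a measurement) and shows how little MB/s that produces at small request sizes.

```python
def throughput_mb_s(iops, request_kib):
    """MB/s delivered by a device completing `iops` requests of `request_kib` KiB each."""
    return iops * request_kib * 1024 / 1e6


# Hypothetical: a disk saturated by seeking at roughly 150 random requests per second.
BUSY_IOPS = 150

for size_kib in (4, 8, 16):
    print(f"{size_kib:2d} KiB requests at {BUSY_IOPS} IOPS -> "
          f"{throughput_mb_s(BUSY_IOPS, size_kib):.2f} MB/s")
```

Even at full utilisation such a device moves only a few MB/s, so a low iotop figure says little about saturation; columns such as await and %util in iostat -x say more. The model only holds while seek time dominates: with large sequential requests, transfer time grows and the achievable IOPS falls.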