
I would like to monitor disk I/O activity in case something causes heavy I/O. This Nagios plugin seems simple, and its author gives these examples:

Example: Tps, read and write thresholds:
    ./check_diskstat.sh -d sda -w 200,100000,100000 -c 300,200000,200000

Example: Average queue length threshold:
    ./check_diskstat.sh -d sda -W 50 -C 100

Question

Since different hosts will have different idle I/O activity, how can I find good starting values to use?

I guess another way to ask the same thing would be: which iostat arguments should I use on each host to see what its "good state" disk I/O is?

Sandra
  • I monitor the load average, which includes iowait; if you have high iowait you get a high load average – c4f4t0r May 24 '18 at 14:36

1 Answer


The underlying counters are documented in https://www.kernel.org/doc/Documentation/block/stat.txt
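To get a feel for those counters, here is a minimal sketch that labels a few of them. The function name `show_disk_counters` is made up for this illustration; the field positions are the ones listed in stat.txt (1 = read I/Os, 3 = read sectors, 5 = write I/Os, 7 = write sectors, 9 = in_flight, 11 = time_in_queue in milliseconds).

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): print selected counters
# from a block device stat file such as /sys/block/sda/stat.
# Field numbers follow Documentation/block/stat.txt.
show_disk_counters() {
    stat_file=$1
    awk '{
        printf "reads=%s read_sectors=%s writes=%s write_sectors=%s in_flight=%s time_in_queue_ms=%s\n",
               $1, $3, $5, $7, $9, $11
    }' "$stat_file"
}
```

Calling `show_disk_counters /sys/block/sda/stat` twice, a few seconds apart, shows how quickly the cumulative counters grow on an idle versus a busy host.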

Setting meaningful thresholds based on the absolute number of IOPS and sectors read from or written to a block device (the lowercase -w and -c options) requires a priori knowledge of the actual capabilities of that particular block device (for instance from benchmarking it).

Using the queue length (the uppercase -W and -C options) seems a bit more universal. A growing I/O queue is bad regardless of how fast the underlying storage is: you're pushing more reads/writes than it can support, and your applications will slow down.

I have no idea though if the documented 50 and 100 milliseconds are reasonable or completely arbitrary values.
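One way to find a per-host baseline is to sample the queue counter yourself during normal operation. The sketch below assumes the average queue size over an interval can be estimated as the change in time_in_queue (field 11 of the stat file, in milliseconds) divided by the elapsed wall-clock milliseconds. As far as I know that is how iostat derives its average queue size, but I have not checked that the plugin computes its value the same way. Both function names are made up for this example.

```shell
#!/bin/sh
# avg_queue PREV_MS CUR_MS ELAPSED_MS
# Assumption: average queue size = delta(time_in_queue) / elapsed time.
avg_queue() {
    awk -v prev="$1" -v cur="$2" -v elapsed="$3" \
        'BEGIN { printf "%.2f\n", (cur - prev) / elapsed }'
}

# sample_avg_queue STAT_FILE SECONDS
# Reads field 11 (time_in_queue) twice, SECONDS apart, and prints the
# estimated average queue size for that interval.
sample_avg_queue() {
    stat_file=$1
    seconds=$2
    before=$(awk '{ print $11 }' "$stat_file")
    sleep "$seconds"
    after=$(awk '{ print $11 }' "$stat_file")
    avg_queue "$before" "$after" "$((seconds * 1000))"
}
```

Running something like `sample_avg_queue /sys/block/sda/stat 60` at different times of day gives an idea of what "normal" looks like on a given host before picking warning and critical levels above it.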


For my virtual servers, using absolute numbers is relatively easy: they are provisioned in flavors with specific limits, and I only need to set the warning and critical levels at, for instance, 80% and 95% respectively of those assigned limits.

For example, with a flavor limited to 600 IOPS and 10 MB/s:

Divide the assigned disk_read_bytes_sec and disk_write_bytes_sec by 512 (the sector size) to get the limits in sectors per second that the virtual disk will support: 10 MB = 10000000 bytes, and 10000000 / 512 = 19531 (rounded down).
19531 * 80% = 15624 and 600 * 80% = 480
19531 * 95% = 18554 and 600 * 95% = 570

    ./check_diskstat.sh -d vda -w 480,15624,15624 -c 570,18554,18554
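If you have several flavors, the same arithmetic can be scripted. This is only a sketch: the helper name `thresholds` is made up, it assumes 512-byte sectors and identical read and write byte limits, and it rounds down with integer division.

```shell
#!/bin/sh
# thresholds IOPS_LIMIT BYTES_PER_SEC_LIMIT PERCENT
# Prints "tps,read_sectors,write_sectors" in the form expected by the
# -w and -c options. Assumes 512-byte sectors; all values round down.
thresholds() {
    iops_limit=$1
    bytes_limit=$2
    percent=$3
    sector_limit=$((bytes_limit / 512))
    tps=$((iops_limit * percent / 100))
    sectors=$((sector_limit * percent / 100))
    echo "$tps,$sectors,$sectors"
}
```

For the flavor above, `thresholds 600 10000000 80` prints `480,15624,15624`, so the check can be built as `./check_diskstat.sh -d vda -w "$(thresholds 600 10000000 80)" -c "$(thresholds 600 10000000 95)"`.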
HBruijn