
My database server has the following sar output for the data device:

[postgres@dbsrv07 ~]$ LC_ALL=POSIX sar -d | egrep "await|dev253-2"
00:00:01          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:10:01     dev253-2   2721.27  18357.23  20291.52     14.20    613.68    225.51      0.15     40.60
00:20:01     dev253-2   1345.04    574.92  10685.38      8.37    290.65    215.99      0.06      8.61
00:30:01     dev253-2    801.39    193.53   6364.92      8.18     87.49    109.34      0.07      5.95
00:40:01     dev253-2    832.95    195.70   6617.82      8.18     89.30    107.20      0.07      5.87
00:50:01     dev253-2    835.58    162.90   6644.64      8.15     85.35    102.14      0.06      5.24
01:00:01     dev253-2    847.99    232.36   6722.90      8.20     89.91    106.03      0.07      5.64
01:10:01     dev253-2   2240.78   2295.28  17543.52      8.85    163.37     72.91      0.10     23.06
01:20:01     dev253-2   2706.18   1358.97  21482.68      8.44    175.98     65.00      0.08     20.73
01:30:01     dev253-2   5839.31   3292.69  45960.39      8.43    520.98     89.19      0.07     42.24
01:40:01     dev253-2   5221.88   1945.32  41384.97      8.30    553.92    106.05      0.06     33.85

The high await persists throughout the day.

Am I right in assuming that this indicates an I/O bottleneck?

Thanks

xpapad

3 Answers


svctm is a measure of how long the storage took to respond once the command had left the IO scheduler and the IO was no longer under the kernel's control. You're seeing less than 1 ms here, which is excellent.

await is a measure of how long a given IO spent in the whole IO path, from entering the IO scheduler's queue to completion. You're seeing hundreds of milliseconds here, which is pretty bad. Different people and vendors have different ideas about what is "good"; I'd say under 50 ms is good.

If your physical storage were slow, you'd see a large svctm and a large await. If the kernel's IO queueing is the problem, you'll see a large await but a small svctm.

What IO scheduler are you using for this device? Given the small IO size (8 KB), you care more about the latency of individual requests than about bulk throughput. You'd probably be best off using the deadline scheduler rather than the default cfq scheduler.

This is done by putting elevator=deadline on the kernel line in grub.conf and rebooting.
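For reference, a minimal sketch of both routes, assuming the physical disk backing dev253-2 is sda (a placeholder, substitute your real device) and that cfq is currently active (the example output is illustrative):

# show the available schedulers; the active one is in [brackets]
cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

# switch at runtime (as root) to try it out without a reboot
echo deadline > /sys/block/sda/queue/scheduler

# make it permanent by adding elevator=deadline to the kernel line in grub.conf, e.g.
kernel /vmlinuz-<version> ro root=<root device> elevator=deadline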

Also, you have hundreds of IOs backed up in the queue (avgqu-sz) and you're getting into thousands of IOPS (tps). I'd assume these are database IOs, which are likely direct IO, so they cannot be merged into larger requests or take advantage of the page cache. You may just be expecting too much from the storage subsystem.
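As a rough sanity check that the queue depth and the latency are telling the same story, Little's law applied to the 00:10 sample above gives:

avgqu-sz ≈ tps × await = 2721.27 IO/s × 0.22551 s ≈ 614

which matches the reported avgqu-sz of 613.68, so on average there really are around 600 requests sitting in the queue at any moment during that interval.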

suprjami
    Careful about reading too much into `svctm`. There is reason to believe it is in fact not reliable on Linux. See: http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/ – Trott Dec 11 '13 at 22:02

Almost (:-))

await is a combination of service time and wait time (latency), and it's the wait time you're really concerned about. If your service time is on the order of 10 milliseconds, things are getting slow once the wait is as big as the service time.

10 ms is a good service time for a Sun disk array. I don't know what a good time is for your disk, but I rather suspect you're seeing an I/O bottleneck.
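To put numbers on that from the sar output above: in the 00:10 sample, await is 225.51 ms while svctm is only 0.15 ms, so the wait component is about 225.51 − 0.15 ≈ 225.4 ms. The wait isn't merely as big as the service time, it's over a thousand times bigger.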

--davecb@spamcop.net

davecb

From suprjami's answer, it looks like you have a bottleneck "above" the disk/array. I'd enquire of the postgres community what they recommend in the way of IO scheduling. In my Solaris days, we would have used the "cray" scheduler table for a machine that was primarily a database engine...

--dave

davecb