
We have T2 instances (Linux 4.9.20-11.31.amzn1.x86_64) on AWS EC2 which exhaust their I/O credits due to disk reads. It may well be that we have excessive reads on these nodes, so there is nothing strange about that in itself, but the effect on the processes on the node is rather peculiar. atop (v1.27) captures a normal, expected flow of small reads until the I/O credits are exhausted, at which point atop -d 30 starts looking like this for long stretches at a time:

  PID   TID  RDDSK  WRDSK WCANCL  DSK CMD
10616     - 432.2M     0K     0K  24% consul
27629     - 313.3M     0K     0K  17% chef-client
27795     - 306.5M     0K     0K  17% python
27803     - 132.6M     0K     0K   7% crond

It seems unlikely that consul or crond (and, in other samples, named, dhclient and even init) suddenly decided to read hundreds of MB, having previously read very little for hours on end. This behavior goes on for about an hour, and various processes show up with 100+ MB read over that period.

What can explain these high numbers for otherwise well-behaved processes? I thought atop takes these numbers from read_bytes in /proc/<pid>/io, which should be a reasonably accurate reflection of actual EBS activity?
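
For what it's worth, here is a minimal sketch of how I could cross-check atop against the kernel counters directly, by sampling read_bytes from /proc/<pid>/io over the same 30-second interval. The PIDs below are simply the ones from the sample above, used as placeholders, and reading another process's io file requires root:

    #!/usr/bin/env python3
    # Sample read_bytes from /proc/<pid>/io twice, 30 seconds apart,
    # and print the delta -- roughly what atop -d 30 reports as RDDSK.
    import time

    PIDS = [10616, 27629, 27795, 27803]  # consul, chef-client, python, crond (from the sample above)
    INTERVAL = 30                        # seconds, to match atop -d 30

    def read_bytes(pid):
        """Return the read_bytes counter for pid, or None if it cannot be read."""
        try:
            with open(f"/proc/{pid}/io") as f:
                for line in f:
                    key, _, value = line.partition(":")
                    if key == "read_bytes":
                        return int(value)
        except OSError:
            # Process has exited or we lack permission to read its io file.
            return None
        return None

    before = {pid: read_bytes(pid) for pid in PIDS}
    time.sleep(INTERVAL)
    after = {pid: read_bytes(pid) for pid in PIDS}

    for pid in PIDS:
        if before[pid] is None or after[pid] is None:
            print(f"{pid}: /proc entry not readable")
        else:
            delta = after[pid] - before[pid]
            print(f"{pid}: {delta / 2**20:.1f} MiB read in {INTERVAL}s")

If these deltas stay small while atop still reports hundreds of MB, the discrepancy would point at the reporting; if they explode as well, the reads are real and the question becomes what is actually issuing them.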
