I have CPU I/O wait steady around 50%, but when I run iostat 1
it shows little to no disk activity.
What can cause I/O wait without any IOPS?
NOTE: There are no NFS or FUSE filesystems here, but it is running under Xen virtualization.
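For reference, this is roughly how the two numbers can be compared side by side (a sketch; device and column names depend on your tool versions):
vmstat 1          # "wa" column = percentage of CPU time spent in I/O wait
top               # press 1 for the per-CPU view and watch the "wa" field
iostat -x 1       # r/s, w/s and %util stay near zero despite the high wait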
NFS can do this, and it wouldn't surprise me if other network filesystems (and even FUSE-based devices) had similar effects.
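To rule that out quickly, something like this lists any such mounts (FUSE mounts show up as fuse.<name>):
grep -Ei 'nfs|fuse|cifs' /proc/mounts   # any network or FUSE filesystems mounted?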
Is there any chance other VMs on the server are thrashing the disk?
I know with virtualisation that you can get some strange results if the host node is overloaded.
If this is the Amazon EC2 Xen environment using instance-based storage, ask Amazon to check the health of the host containing this image.
If this is a Xen environment where you can gain access to the hypervisor, check the I/O wait from outside the guest for the disk image (file, network device, LVM slice, whatever) backing the xvda and xvdb devices. You'll also want to check the hypervisor's I/O system in general, since other disk devices might be monopolizing the system's resources.
iostat -txk 5
is usually a good starting diagnostic tool. It takes 5-second summaries of I/O for ALL devices available to it, and is therefore useful both inside and outside the VM image.
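If you do have dom0 access, a rough starting point is to run that from the hypervisor alongside Xen's own per-domain counters (xentop ships with the Xen tools; column names vary a bit between versions):
iostat -txk 5    # run in dom0: I/O on the physical devices backing the guests
xentop           # per-domain CPU plus VBD_RD / VBD_WR block-request counters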
Check your available file descriptors / inodes. When you hit the limit, the system can start thrashing and the symptoms mimic iowait.
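A rough way to compare current usage against those limits (standard procfs paths assumed; <PID> is whatever process you suspect):
cat /proc/sys/fs/file-nr     # system-wide file handles: allocated, free, maximum
df -i                        # inode usage per filesystem
ulimit -n                    # open-file limit for the current shell
ls /proc/<PID>/fd | wc -l    # open files held by a suspect process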
Edit: I saw you are using Xen. Have a look at your current interrupts; you might find blkif is higher than normal.
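On a PV guest the block frontend usually shows up with "blkif" in its name in /proc/interrupts, so something like this gives a rough view (exact naming can differ by kernel):
grep -i blkif /proc/interrupts      # per-CPU interrupt counts for the block frontend
watch -d 'cat /proc/interrupts'     # highlight which counters are climbing fastest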
A bit late now, but get Munin installed; it will really help future debugging.
sudo sysctl vm.block_dump=1
Then check dmesg to see what is performing block reads/writes or dirtying inodes.
Also check the nofile limit in limits.conf; a process could be requesting more files than it is permitted to open.
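Putting that together, roughly (vm.block_dump only exists on older kernels, and a syslog daemon writing to disk will add entries of its own):
sudo sysctl vm.block_dump=1               # log block reads/writes and inode dirtying
dmesg | tail -n 50                        # which process names and devices show up?
sudo sysctl vm.block_dump=0               # switch it off again when done
ulimit -n                                 # current open-file limit
grep nofile /etc/security/limits.conf     # configured nofile limits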
If no other virtual machines are stressing the hard disk(s), do
hdparm -f
on the underlying physical disk(s). Possibly the disk cache isn't working correctly. This flushes the data held in the cache, and you can keep monitoring the I/O to see whether it rises again after the flush. If it does, it is a cache problem.
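Roughly, assuming the underlying physical disk is /dev/sda (adjust the device name):
sudo hdparm -f /dev/sda    # flush the buffer cache for the device
iostat -xk 5               # keep watching; if the wait climbs right back after the flush, the cache is the likely culprit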
I've seen blocked networking operations (e.g. long calls to an external DB server) drive the load average up. I don't know for sure, but I'm guessing network I/O can cause CPU wait to go up as well? Can anyone confirm?
On my machines NFS is the biggest IO-wait "producer". I have an SSD in my laptop which is fast as hell, so "real I/O" is not the problem. Nevertheless I sometimes have lots of I/O wait due to my mounted NFS shares.
SCP sometimes also seems to lead to I/O wait, but to a far lesser extent.
This can be anything. It just means that something is waiting for an I/O operation to finish. You can figure out which process it is via ps, then attach gdb to it and look at the backtrace to determine which call is hung (usually it is some network-related call or a suddenly disconnected disk). For fd info, check /proc.
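A rough sequence for that (replace <PID> with the process in question; gdb and procfs are assumed to be available):
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'   # processes in uninterruptible sleep ("D"), i.e. the ones charged to iowait
cat /proc/<PID>/wchan; echo                      # kernel function the process is blocked in
ls -l /proc/<PID>/fd                             # open files, sockets, devices
gdb -p <PID> -batch -ex bt                       # userspace backtrace of the hung call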
I've also experienced a similar problem right before a disk in a RAID failed and some SATA cables with tight bends in them started failing.
The CPU usage was near 0%, but one or more CPUs on a 4-core system were spending 100% of their time in iowait for extended periods (found via top's multi-line per-CPU display), with very low IOPS and bandwidth (found via iostat) but bursty, high interrupt activity. Interactive command-line use was painful during any disk access (e.g. an auto-save from someone's emacs session) but otherwise tolerable once the periods of iowait passed (and presumably the operations succeeded after many retries).
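If you suspect failing hardware like that, the kernel log and SMART data are a rough first check (smartctl comes from smartmontools; the device name is an example):
dmesg | grep -iE 'ata|i/o error|reset'   # link resets and command retries from flaky cables or disks
sudo smartctl -a /dev/sda                # reallocated / pending sectors and the drive's error log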