Debugging Linux I/O latency

I'm having some I/O problems on a couple of Linux systems that I administer. They manifest in that processes often block for up to several seconds in such simple syscalls as open(), unlink() or close() on files (which is a problem because some of the involved programs need rather low I/O latency to operate properly). It is true that the systems in question to experience some moderate I/O load, but I can hardly think it would be enough to justify such enormous latency times. Sometimes, the calls can take more than 15 seconds to complete (though more often they might take 1 or 2 or 3 seconds or so).

My question is: How can I find out why this happens? What I would like is some tool that could tell me what the processes in question are blocked by in the kernel, and why that which they sleep on is busy, what is happening with it, and such things. Is there such a tool, or is there some other way of trying to debug what happens?

Alternatively, of course, if you have any clue as to what actually is happening, how can it be avoided?

For the record, the filesystem I use is XFS.

Dolda2000

Posted 2012-03-04T06:19:31.517

Reputation: 1 083

Answers

Now in due time, I have managed to solve this myself, so I can at least follow up on it myself for posterity.

Unfortunately, I lost the original problem in a kernel upgrade, but gained a new one instead, even worse in performance, and just as hard to track down. The techniques I found were the following:

First of all, blktrace/blkparse is a tool that I found quite helpful. It allows the tracing of the progress of individual I/O requests with many helpful details, such as the process that submitted the request. It is helpful to put the output on tmpfs, so that the handling of the storage of the trace doesn't start tracing itself.

That helped only so far, though, so I compiled a kernel with more debugging functionality. In particular, I found ftrace quite helpful, since it allowed me to trace the poorly performing process inside kernel space, to see what it did and where it blocked. Compiling a debug kernel also provides working WCHAN output for ps as well, which can work as an easier way to see what a process is doing inside the kernel, at least for simpler cases.

I was also hoping for LatencyTop to be useful, but I found it quite buggy, and also that it only displayed latency reasons that were too "high-level" to be truly useful, unfortunately.

Also, I found it more helpful than iostat to simply view the contents of /sys/block/$DEVICE/stat at very close intervals, simply like this:

while :; do cat /sys/block/sda/stat; sleep .1; done

See Documentation/iostats.txt in the kernel source tree for the format of the stat file. Viewing it at close intervals allowed me to see the exact timing and size of I/O bursts and such things.

In the end, I found out that the problem I had after the kernel upgrade was caused by stable pages, a feature introduced in Linux 3.0, causing, in my case, Berkeley DB to halt for extended periods when dirtying pages in its mmap'ed region files. While it seems possible to patch this feature out, and also that the problems it causes might be fixed in Linux 3.9, I have solved the worst problem I had for now by patching Berkeley DB to allow me to put its region files in a different directory (in my case /dev/shm), allowing me to avoid the problem altogether.

Dolda2000

Posted 2012-03-04T06:19:31.517

Reputation: 1 083

According to my experience the simplest and most detailed statistic tool you can install to trace mysterious system performance issues is http://freecode.com/projects/sysstat aka. sar

for sure you want to look at iostat command output as well, specially how much is your %iowait should be below 5-10% under normal system load (below 1.0 or so).

look at the ps output if in the STAT column you see D statuses that means those processes are locked and waiting for IO, very likely a hardware issue with the controller or the disk, check S.M.A.R.T stats as well as dmesg and syslog for clues

check sar log and identify peak times if ever this happens and try to match those time with disk intensive cron jobs eg backups over network

you can benchmark your disk performance with bonnie++

Feczo

Posted 2012-03-04T06:19:31.517

Reputation: 325

Thought I'd mention strace even though this question is now months old. It may help someone with a similar problem who finds this page.

try.

strace "application"

you can also do

strace -e read,write "application"

to just show read/write events.

The application will load as normal (albeit a bit slower to launch) and you can use it as normal in order to trigger the problem. The output will appear in the shell that you used to launch strace.

The good thing about strace is that you can see the most recent function/kernel call at the time the application triggers the slowdown. You may find that if your /home accounts are on NFS then the application is having some difficulty with file I/O over NFS for some reason.

user192032

Posted 2012-03-04T06:19:31.517

Reputation: