In due time, I managed to solve this myself, so I can at least follow up on it for posterity.
Unfortunately, I lost the original problem in a kernel upgrade, but gained a new one instead, even worse in performance, and just as hard to track down. The techniques I found were the following:
First of all, blktrace/blkparse is a tool that I found quite helpful. It traces the progress of individual I/O requests with many useful details, such as the process that submitted the request. It helps to put the output on tmpfs, so that writing the trace to disk doesn't generate I/O that is itself traced.
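As a rough sketch of the tmpfs setup described above: the function name trace_to_tmpfs, the /dev/shm/blktrace directory and the 30-second default are my own choices for illustration, not anything I actually ran in this form. The -d, -D and -w options of blktrace and the -i and -D options of blkparse are as I know them from the man pages. It needs root and a mounted debugfs.

```shell
# Sketch: trace a block device for a while, keeping the trace files on tmpfs.
# trace_to_tmpfs, the output directory and the default duration are assumptions.
trace_to_tmpfs() {
    dev=$1                      # e.g. /dev/sda
    secs=${2:-30}               # how long to trace, in seconds
    out=/dev/shm/blktrace       # /dev/shm is normally a tmpfs mount

    # Refuse anything that is not a block device before touching anything.
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }

    mkdir -p "$out" || return 1
    # -D: directory for the per-CPU trace files, -w: stop after $secs seconds.
    blktrace -d "$dev" -D "$out" -w "$secs" || return 1
    # blktrace names its files after the device, so pass that as the input name.
    blkparse -D "$out" -i "$(basename "$dev")"
}
```

Called as trace_to_tmpfs /dev/sda 10, this should print the parsed events for ten seconds of activity on sda.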
That helped only so far, though, so I compiled a kernel with more debugging functionality. In particular, I found ftrace quite helpful, since it allowed me to trace the poorly performing process inside kernel space, to see what it did and where it blocked. Compiling a debug kernel also provides working WCHAN output for ps, which can be an easier way to see what a process is doing inside the kernel, at least in simpler cases.
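For the simpler cases where WCHAN is enough, something like the following can poll it at close intervals. This is only a sketch: wchan_of and watch_wchan are hypothetical helper names, and it reads /proc/PID/wchan directly, which I assume carries the same symbol that ps shows in its WCHAN column on a kernel built with symbol information.

```shell
# Sketch: print the kernel function a process is currently sleeping in.
# wchan_of and watch_wchan are hypothetical names, not existing tools.
wchan_of() {
    pid=$1
    f=/proc/$pid/wchan
    [ -r "$f" ] || { echo "no wchan for pid $pid" >&2; return 1; }
    # The file has no trailing newline, so add one.
    printf '%s\n' "$(cat "$f")"
}

# Poll at close intervals; stops once the process has exited.
watch_wchan() {
    while wchan_of "$1"; do sleep .1; done
}
```

Running watch_wchan with the PID of the slow process then prints one symbol per sample, so a process stuck in the same kernel function shows up as a long run of identical lines.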
I was also hoping for LatencyTop to be useful, but I found it quite buggy, and the latency reasons it displayed were too "high-level" to be truly useful, unfortunately.
Also, I found it more helpful than iostat to simply view the contents of /sys/block/$DEVICE/stat at very close intervals, like this:
while :; do cat /sys/block/sda/stat; sleep .1; done
See Documentation/iostats.txt in the kernel source tree for the format of the stat file. Viewing it at close intervals allowed me to see the exact timing and size of I/O bursts and such things.
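Since the stat file holds cumulative counters, bursts are easier to spot as differences between consecutive samples. The following is a minimal sketch of that idea, not what I actually ran: stat_delta and watch_stat are made-up names, and the field positions (1 reads completed, 3 sectors read, 5 writes completed, 7 sectors written) are taken from iostats.txt.

```shell
# Sketch: turn two samples of /sys/block/<dev>/stat into per-interval deltas.
# Fields used: 1 reads completed, 3 sectors read,
#              5 writes completed, 7 sectors written.
stat_delta() {
    # $1 = previous sample line, $2 = current sample line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { r = $1; rs = $3; w = $5; ws = $7 }
        NR == 2 { printf "reads=%d rsect=%d writes=%d wsect=%d\n",
                         $1 - r, $3 - rs, $5 - w, $7 - ws }'
}

# Hypothetical driver: poll a device (default sda) every 0.1 s, forever.
watch_stat() {
    dev=${1:-sda}
    prev=$(cat "/sys/block/$dev/stat") || return 1
    while :; do
        sleep .1
        cur=$(cat "/sys/block/$dev/stat") || return 1
        stat_delta "$prev" "$cur"
        prev=$cur
    done
}
```

With watch_stat sda running, idle intervals print all zeroes and a burst stands out as a few lines with large wsect or rsect values.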
In the end, I found out that the problem I had after the kernel upgrade was caused by stable pages, a feature introduced in Linux 3.0, which in my case caused Berkeley DB to halt for extended periods when dirtying pages in its mmap'ed region files. While it seems possible to patch this feature out, and the problems it causes might be fixed in Linux 3.9, I have solved the worst problem I had for now by patching Berkeley DB to allow me to put its region files in a different directory (in my case /dev/shm), which avoids the problem altogether.