[Screenshot]

I have several processes stuck in uninterruptible sleep (D state), all seemingly stemming from `auplink /var/lib/docker/aufs/mnt`. It's something Docker-related, and it's waiting on I/O that will never complete. I get that, but how do I determine the exact cause? How can I find out which I/O it is waiting on? Also, is there really no way to make these stuck processes go away without a hard reboot?

l46kok
  • Can you strace the pid and put the output in the question? `strace -p ` Also, `lsof |grep ` will show open file handles... also, you could try gdb. – Tim Nov 20 '18 at 02:26
  • strace displays nothing after being attached. Similarly, gdb shows nothing. `lsof` gets stuck (you can see from the screenshot above that it also goes into uninterruptible sleep). – l46kok Nov 20 '18 at 02:37
  • what about iostat, vmstat, smartctl? Maybe that part of hd is bad. It sounds crazy but it can be. And maybe docker process goes into infinite loop or something. – titus Nov 20 '18 at 02:43
  • iostat and vmstat report nothing out of the ordinary. I doubt that it's the HD; it's a VM in the cloud (if it is, well then I'd need to go buy a lottery ticket!) – l46kok Nov 20 '18 at 02:54
  • Check syslog and kernel messages from around the time of the problem. Edit your question to add them as text wrapped in code tags - images are usually near unreadable. – John Mahowald Nov 20 '18 at 03:34
  • AUFS is not a reliable storage backend, which is part of the reason why the kernel developers refused to add it to Linux, and even Ubuntu had to drop it. Switch the Docker storage to overlay2. – Michael Hampton Nov 20 '18 at 03:36
  • can you provide the output of `grep aufs /proc/filesystems` – frontsidebus Nov 21 '18 at 15:58
  • I think your aufs isn't mounted properly; see https://docs.docker.com/storage/storagedriver/aufs-driver/#prerequisites. Your system is behaving as if the filesystem is inaccessible, which is why your lsof command is hanging. Any of your existing procs that are trying to interact with that aufs at `/var/lib/docker/aufs/mnt` are going to be stuck in I/O wait until the filesystem is accessible. This is similar to yanking an NFS mount out from under a running NFS client that has procs dependent on data under the mountpoint. – frontsidebus Nov 21 '18 at 16:02

1 Answer

You can inspect the kernel stack of the process:

cat /proc/<pid>/stack

This shows what the process was doing in the kernel when it entered D state. Reading this file usually requires root.
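To find which PIDs to inspect in the first place, one option is to scan /proc for tasks whose state is D. The following is a minimal sketch rather than a polished tool: the function name `list_d_state` and its optional directory argument are made up for illustration, and it assumes a Linux /proc with the usual `status`, `wchan`, and `stack` files (the kernel stack is normally readable only by root, and `wchan` may be uninformative for unprivileged users).

```shell
#!/bin/sh
# Hypothetical helper: print PID, command name, and wait channel for every
# task in state D, followed by its kernel stack when that file is readable.
# The optional argument only exists so the function can be pointed at a
# directory other than /proc.
list_d_state() {
    proc_root=${1:-/proc}
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        # The line of interest looks like "State:  D (disk sleep)".
        state=$(awk '/^State:/ {print $2; exit}' "$dir/status" 2>/dev/null)
        [ "$state" = "D" ] || continue
        name=$(sed -n 's/^Name:[[:space:]]*//p' "$dir/status" 2>/dev/null)
        wchan=$(cat "$dir/wchan" 2>/dev/null)
        printf '%s\t%s\t%s\n' "${dir##*/}" "$name" "${wchan:-?}"
        if [ -r "$dir/stack" ]; then
            # Indent the stack so it is visually attached to its PID line.
            sed 's/^/    /' "$dir/stack" 2>/dev/null
        fi
    done
}

list_d_state
```

Run as root, this prints one tab-separated line per D-state task, with the stack trace indented underneath when available.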

echo w > /proc/sysrq-trigger; dmesg

tells the kernel to dump the stack traces of all D-state processes to the kernel log, which dmesg then displays.
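Because dmesg prints the whole kernel ring buffer, the report can be buried under older messages. As a rough sketch, the filter below keeps only the most recent report. It assumes that each report starts with a line containing "Show Blocked State", which is how the kernels I have seen label it and may differ on yours. The function name `last_blocked_report` is invented for this example.

```shell
#!/bin/sh
# Hypothetical filter: keep only the most recent "Show Blocked State" report
# from kernel log text read on stdin. The marker string is an assumption
# about the log format.
last_blocked_report() {
    awk '
        /Show Blocked State/ { seen = 1; buf = "" }   # a newer report replaces the old one
        seen                 { buf = buf $0 "\n" }
        END                  { printf "%s", buf }
    '
}

# Writing to sysrq-trigger needs root; skip quietly otherwise.
if [ "$(id -u)" -eq 0 ] && [ -w /proc/sysrq-trigger ]; then
    echo w > /proc/sysrq-trigger && dmesg | last_blocked_report
fi
```

Redirecting the result to a file makes it easy to attach to a bug report or to the question as text.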

Processes in D state cannot be killed. There are situations, however, where a process stays in D state for a long time but occasionally finishes its I/O, becomes interruptible for a short period, and then returns to the same I/O activity and ends up in D state again. In that case, with

while true; do kill -9 PID; done

there is a small chance of delivering the KILL signal while the process is interruptible.
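The loop above never terminates on its own and spins at full speed. A bounded variant might look like the sketch below. The names `is_alive` and `kill_retry`, the one-second pause, and the default of 50 attempts are all arbitrary choices for illustration. It relies on the Linux-specific /proc/PID/status file and treats a zombie (state Z) as already dead, since only the parent can reap it.

```shell
#!/bin/sh
# Hypothetical helpers for a bounded kill loop (Linux /proc assumed).

# Succeeds while the PID exists and is not a zombie.
is_alive() {
    state=$(awk '/^State:/ {print $2; exit}' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# Send SIGKILL up to $2 times (default 50), pausing one second between
# attempts. Returns 0 once the process is gone, 1 if it is still there.
kill_retry() {
    pid=$1
    tries=${2:-50}
    while [ "$tries" -gt 0 ]; do
        is_alive "$pid" || return 0
        kill -9 "$pid" 2>/dev/null
        sleep 1
        tries=$((tries - 1))
    done
    ! is_alive "$pid"
}

# Usage: script.sh PID [ATTEMPTS]
if [ -n "${1:-}" ]; then
    kill_retry "$1" "${2:-50}"
fi
```

If the function returns 1, the process never left D state during the attempts, and the underlying I/O problem still has to be resolved some other way.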

arekm