I have an Apache 2.2.3 web server running on an 8-core VM with 8 GB of RAM.
During a load test, the web server stopped responding and the load average went up to 1000.
When I run top, I see a large number of httpd processes stuck in the "D" state. From some searching, it seems "D" means uninterruptible sleep.
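For anyone wanting to reproduce the view, here is a minimal sketch of how the D-state processes could be listed together with their kernel wait channel. It assumes a procps-style `ps` that supports `-eo` and the `wchan` column; the helper name `d_state_only` is made up for illustration.

```shell
# d_state_only (hypothetical helper): read ps output on stdin and keep the
# header line plus any row whose STAT column (field 2) starts with "D".
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage (assumes procps ps; WCHAN is the kernel function the task sleeps in):
#   ps -eo pid,stat,wchan:32,comm | d_state_only
```

If most of the rows share the same WCHAN value, that would suggest they are all waiting on the same kind of kernel operation.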
I straced one of the stuck processes and below is the output:
# strace -p 27843
Process 27843 attached - interrupt to quit
fcntl(34, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}
I then ran lsof to check what fd 34 refers to; the output is below:
httpd 27843 apache 34u REG 8,1 0 131756 /tmp/.xcache.0.0.1292616489.lock (deleted)
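One possible next check, sketched under assumptions: on Linux, `/proc/locks` lists file locks keyed by `major:minor:inode`, with the device numbers in hex, so the `8,1` and `131756` from the lsof line above could be used to find which PID currently holds the lock. The helper name `locks_on_inode` is made up for illustration.

```shell
# locks_on_inode MAJOR MINOR INODE (hypothetical helper): read /proc/locks-style
# lines on stdin and print those that refer to the given file.
locks_on_inode() {
    # /proc/locks prints the device as two-digit hex, e.g. 8,1 becomes 08:01
    key=$(printf '%02x:%02x:%d' "$1" "$2" "$3")
    awk -v key="$key" '{ for (i = 1; i <= NF; i++) if ($i == key) { print; next } }'
}

# Usage, with the values from the lsof output above:
#   locks_on_inode 8 1 131756 < /proc/locks
```

In that output, lines marked with `->` are waiters blocked on the lock, and the unmarked line above them shows the holder; the PID column would tell which process to look at next.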
This looks like it might be a locking issue with xcache, but how should I continue troubleshooting from here?