
I got a critical Nagios alert about a server, and when I checked ps -aux I found that all of the nginx (php-fpm) processes were in uninterruptible sleep:

www-data  1330  0.4  0.3 299992 108560 ?       D    16:06   0:16 php-fpm: pool www
www-data  1338  0.4  0.2 254728 92728 ?        D    16:06   0:16 php-fpm: pool www
www-data  1346  0.4  0.3 293544 100272 ?       D    16:06   0:17 php-fpm: pool www
www-data  1356  0.7  0.3 302504 101532 ?       D    16:06   0:29 php-fpm: pool www
www-data  1357  0.3  0.2 270672 85952 ?        D    16:06   0:13 php-fpm: pool www
....

I was stuck with it and couldn't even restart nginx. In the end I had to reboot the server to fix the issue,
although I have this in /etc/php5/fpm/php.ini:

emergency_restart_threshold=10
emergency_restart_interval=1m
process_control_timeout=10s

which means that php5-fpm is supposed to restart in such cases, but it didn't!
Any idea what might cause those processes to go into uninterruptible sleep, and how to avoid this in the future?
Thanks for your help.

Alaa Alomari
  • This is commonly caused when waiting for IO from an unavailable device, like a broken NFS mount. What storage is mounted on the machine? – mgorven Jun 19 '12 at 06:45
  • It is LVM. I am not using NFS to serve web content; NFS on this server is for backups only. – Alaa Alomari Jun 19 '12 at 07:07

1 Answer


While D in top means uninterruptible sleep, I find it's easier to just think of D for Disk. The process is waiting on the kernel to get back to it with something, and 95% of the time this is reading from a disk.

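The fact that it's uninterruptible sleep is why php-fpm can't restart itself: a process in state D does not act on signals, not even SIGKILL, until the kernel call it is blocked in returns, so the workers cannot be killed and respawned.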
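
Before taking the machine down, it can help to see which processes are stuck and what the kernel says they are waiting on. The following is only a rough sketch, assuming a Linux /proc filesystem; the helper names (parse_stat, d_state_processes) are made up for illustration. It reads the state field from /proc/<pid>/stat and, for anything in state D, the wait channel from /proc/<pid>/wchan.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for every process in uninterruptible sleep."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # the process exited while we were scanning
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(f"{pid}\t{comm}\t{wchan or '?'}")
```

If the wait channel names point at filesystem or block-layer functions, that supports the disk theory; names that mention NFS or RPC would point back at the NFS mount instead.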

So in this case you will want to check your disks. First run fsck -f /dev/mapper/VG-LV in single-user mode (if it's a remote dedicated server or VPS, you'll have to use a remote KVM console for this). Then read the SMART data with smartctl -a /dev/sd? to see if one of your disks may be going bad. That applies if the disks are not in a hardware RAID array; if it's hardware RAID, use the vendor-provided tool instead.

Michael Hampton