
On my Fedora Core 9 webserver with kernel 2.6.18.8, init isn't reaping zombie processes. This would be bearable if the process table didn't eventually reach its upper limit, at which point no new processes can be created.

Sample output of ps -el | grep 'Z':

F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
5 Z     0  2648     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
1 Z    51  2656     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
1 Z     0  2670     1  0  75   0 -     0 exit   ?        00:00:02 crond <defunct>
4 Z     0  2874     1  0  82   0 -     0 exit   ?        00:00:00 mysqld_safe <defunct>
5 Z     0 28104     1  0  76   0 -     0 exit   ?        00:00:00 httpd <defunct>
5 Z     0 28716     1  0  76   0 -     0 exit   ?        00:00:06 lfd <defunct>
5 Z    74 10172     1  0  75   0 -     0 exit   ?        00:00:00 sshd <defunct>
5 Z     0 11199     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11202     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11205     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11208     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11211     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11240     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11246     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11249     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11252     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
1 Z     0 14106     1  0  80   0 -     0 exit   ?        00:00:00 anacron <defunct>
5 Z     0 14631     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
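
For anyone who wants the same listing without ps, here is a rough Python sketch that reads /proc directly. It assumes Linux with the usual /proc/<pid>/stat layout (pid, command name in parentheses, state, parent pid); `parse_stat` and `list_zombies` are made-up helper names, not part of any tool.

```python
import os

def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' rather than on whitespace.
    head, tail = text.rsplit(")", 1)
    pid, comm = head.split("(", 1)
    fields = tail.split()
    return int(pid), comm, fields[0], int(fields[1])

def list_zombies(proc="/proc"):
    """Return sorted (pid, ppid, comm) tuples for every process in state Z."""
    zombies = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process went away while we were scanning
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return sorted(zombies)

if __name__ == "__main__":
    for pid, ppid, comm in list_zombies():
        print(f"{pid:>6} {ppid:>6} {comm} <defunct>")
```

A PPID of 1 in the output means the zombie is waiting on init itself, as in the listing above.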

Is this an OS bug or a misconfiguration? I'm looking for inspiration as to the source of this problem. Thanks

  • 2.6.18.8 is way too outdated; for example, RedHat uses at least 2.6.18-194, so you'd better update your system. – poige Feb 01 '11 at 05:42

3 Answers

1

A process becomes a zombie when it exits and its parent has not yet called wait() on it. If the parent dies first, init becomes the new parent of the orphaned process.

Init periodically executes wait() and so reaps any exited processes that have init as their parent. This happens synchronously, meaning it reaps each process individually. This can take longer at times if a process doesn't reap properly.
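
To make the wait() step concrete, here is a minimal Python sketch of a non-blocking reap loop. This is not init's actual source, just an illustration of the idea on a POSIX system; `reap_children` is a hypothetical name.

```python
import os

def reap_children():
    """Collect every child that has already exited, without blocking.

    Returns a list of (pid, code) pairs, where code is the exit status,
    or the negated signal number if the child was killed by a signal.
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns (0, 0) instead of blocking.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)
        reaped.append((pid, code))
    return reaped
```

A parent would typically call something like this from its main loop or after receiving SIGCHLD. If the zombies in the question really have init as their parent, the open question is why init's equivalent of this loop is not clearing them.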

It may suggest a bug in the system, or it may not. I suggest asking on the kernel mailing list or a distribution-specific mailing list that deals with the kernel.

artifex
1

When a process calls the "exit" system call, it may not be completely done. For example, it may have pending IO operations in progress (such as a large write that is still partially buffered in the kernel). When this happens, the kernel has to finish any pending operations before it can finish the exit system call.

However, once a process calls "exit", it is no longer live, and the kernel will have reclaimed as many resources as possible from it. So, it will be reported as a zombie, even though the kernel is not quite ready for it to be reaped, yet.

Normally, the kernel can clean up after the process in a fraction of a second: the exit system call finishes, the parent gets notified, and the process gets reaped. This can cause issues, however, if the process gets hung up on an IO operation that can never complete (as can happen all too easily with various combinations of NFS and time-limited authentication, for instance). When that happens, your only alternative is to reboot, I'm afraid.

Jim
  • I am not entirely convinced this answer is accurate. Could you cite some sources which confirm that it is possible for a process to be in zombie state but it is not possible to wait for it? – kasperd Apr 28 '17 at 23:14
-1

Zombies cannot be reaped by init. It is the responsibility of the parent process to reap them by calling wait*(). These processes are kept around so that the parent can collect their exit status.
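
A small Python sketch of that life cycle, assuming Linux (it reads the state letter from /proc/<pid>/stat); `proc_state` and `zombie_demo` are hypothetical names used only for this illustration. The child exits, shows up as Z until the parent calls waitpid(), and then disappears from the process table.

```python
import os
import time

def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        data = f.read()
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]

def zombie_demo(exit_code=3):
    """Fork a child, watch it turn into a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately
    # Parent: give the child up to five seconds to exit and turn into a zombie.
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state_before_wait = proc_state(pid)
    # Reaping: this is the call that hands the exit status to the parent.
    _, status = os.waitpid(pid, 0)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state_before_wait, os.WEXITSTATUS(status), still_listed
```

The zombie entry exists only to hold the exit status; once waitpid() has returned it, the kernel drops the entry.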