
On my monitoring box, Nagios creates a lot of zombie processes, although they are also removed quickly. I am using active checks to monitor my servers. I captured the defunct processes using the following command:

$ top -d 0.25 -b -n 20 > topout.txt

This collected the output of top 20 times, with a 0.25 s delay between samples.

I then grepped topout.txt for the defunct processes:

$ cat topout.txt | grep defunct

I get the following output.

 8957 nagios    20   0     0    0    0 Z  6.0  0.0   0:00.02 nagios <defunct>                                                                         
 8951 nagios    20   0     0    0    0 Z  3.0  0.0   0:00.01 nagios <defunct>                                                                         
 8954 nagios    20   0     0    0    0 Z  3.0  0.0   0:00.01 nagios <defunct>                                                                         
 8945 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 8946 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 8980 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9000 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.00 nagios <defunct>                                                                         
 9024 nagios    20   0     0    0    0 Z  7.0  0.0   0:00.02 nagios <defunct>                                                                         
 9025 nagios    20   0     0    0    0 Z  3.5  0.0   0:00.01 nagios <defunct>                                                                         
 9040 nagios    20   0     0    0    0 Z  3.1  0.0   0:00.01 nagios <defunct>                                                                         
 9086 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9087 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9123 nagios    20   0     0    0    0 Z  6.1  0.0   0:00.02 nagios <defunct>                                                                         
 9126 nagios    20   0     0    0    0 Z  3.0  0.0   0:00.01 nagios <defunct>                                                                         
 9131 nagios    20   0     0    0    0 Z  3.0  0.0   0:00.01 nagios <defunct>                                                                         
 9091 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.05 nagios <defunct>                                                                         
 9111 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9119 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9118 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9151 nagios    20   0     0    0    0 Z  2.9  0.0   0:00.02 nagios <defunct>                                                                         
 9153 nagios    20   0     0    0    0 Z  2.9  0.0   0:00.02 nagios <defunct>                                                                         
 9150 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9164 nagios    20   0     0    0    0 Z  3.5  0.0   0:00.02 nagios <defunct>                                                                         
 9171 nagios    20   0     0    0    0 Z  3.5  0.0   0:00.02 nagios <defunct>                                                                         
 9154 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9156 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9163 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9167 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9178 nagios    20   0     0    0    0 Z  3.8  0.0   0:00.02 nagios <defunct>                                                                         
 9174 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9179 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>                                                                         
 9182 nagios    20   0     0    0    0 Z  0.0  0.0   0:00.01 nagios <defunct>    

Can somebody help me find out what causes these zombie processes and how I can prevent them?

pradeepchhetri

1 Answer


Nagios hasn't yet run its signal handler for SIGCHLD, so the exited child processes have not been reaped. This could be because Nagios is waiting in the run queue or is busy handling another signal. As long as the zombies go away quickly, they are not a cause for concern.
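
One way to check whether the zombies really do go away quickly is to sample the process table twice and see whether any PID is defunct in both samples. The following is a minimal sketch, assuming a `ps` that accepts `-e` and `-o`; the function names `list_zombies` and `persistent_zombies` and the 2-second interval are illustrative choices, not anything provided by Nagios.

```shell
#!/bin/sh
# Print "PID PPID STAT COMMAND" for every process whose state starts with Z.
list_zombies() {
    ps -e -o pid= -o ppid= -o stat= -o comm= | awk '$3 ~ /^Z/'
}

# Print the zombies that show up in two samples taken $1 seconds apart
# (default 5). An empty result means every zombie from the first sample
# had been reaped by the time the second sample was taken.
persistent_zombies() {
    interval=${1:-5}
    # Space-separated list of zombie PIDs in the first sample.
    first=$(list_zombies | awk '{ printf "%s ", $1 }')
    sleep "$interval"
    # Keep only second-sample rows whose PID was already a zombie before.
    list_zombies | awk -v seen="$first" '
        BEGIN { n = split(seen, pids, " "); for (i = 1; i <= n; i++) old[pids[i]] = 1 }
        ($1 in old)'
}

persistent_zombies 2
```

If this prints nothing while the checks are running, the zombies are being reaped between samples. Any rows that do appear include the parent PID in the second column, which shows which process is failing to reap them.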

bhawkfan
  • Actually, all the checks are active, and most of them are check_by_ssh. Can this be the reason? – pradeepchhetri Sep 07 '12 at 13:46
  • Active checks involve spawning a child process and would leave behind zombie processes until they're reaped by the parent process (nagios). Usually this happens very quickly but if nagios is blocking signals, say while in a handler for another signal, they'll stick around for a short while. It doesn't really matter what check command you're using. – bhawkfan Sep 07 '12 at 16:07
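
The behaviour described in these comments can be reproduced outside Nagios. This is a minimal sketch, assuming a POSIX `sh` and a `ps` that accepts `-e` and `-o`. The inner shell starts a short-lived child and then replaces itself with `sleep`, which never waits for its children, so the child stays defunct until that parent exits. The helper name `children_of` and the durations are illustrative only and say nothing about how Nagios itself is implemented.

```shell
#!/bin/sh
# Start a parent that forks a child and then never reaps it:
# the inner sh backgrounds 'sleep 0' and then execs into 'sleep 3',
# keeping the same PID but never calling wait() for the child.
sh -c 'sleep 0 & exec sleep 3' &
parent=$!

# Print the children of PID $1 as "PID PPID STAT COMMAND".
children_of() {
    ps -e -o pid= -o ppid= -o stat= -o comm= | awk -v p="$1" '$2 == p'
}

sleep 1
echo "while the parent is alive:"
children_of "$parent"    # the exited child is listed with a state starting with Z

wait "$parent"           # let the parent run to completion
sleep 1
echo "after the parent has exited:"
children_of "$parent"    # nothing is listed under $parent any more; the orphaned
                         # zombie is handed to init, which normally reaps it
```

Nagios differs from this toy parent in that it does reap its children, so its zombies only exist for the short window between a check exiting and Nagios handling SIGCHLD.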