I have some zombie processes on my system. I killed the parent of those zombies, hoping init would adopt them and free up their resources (lots of sockets in CLOSE_WAIT). However, init is not removing those processes from the system:

# ps ax
...
17051 ?        Zl   8498:24 [impalad] <defunct>
...

# ps -o ppid= -p 17051
    1

Is there a way to remove the zombies without rebooting?

UPDATE:

I've tried kill -s SIGCHLD 1. It didn't help.

facha

1 Answer

You cannot kill a defunct process. In someone else's words:

http://www.linuxquestions.org/questions/suse-opensuse-60/howto-kill-defunct-processes-574612/

You cannot kill a defunct process (a.k.a. zombie) as it is already dead. It doesn't take any resources so it's no big deal, but if you really want it to disappear from the process table you need to have its parent process reap it. "pstree" should give you the process hierarchy and "kill -1 " is sometimes enough for the job.

Because your process's parent pid is init (1), you can't do anything except reboot.

https://unix.stackexchange.com/questions/11172/how-can-i-kill-a-defunct-process-whose-parent-is-init

You cannot kill a (zombie) process as it is already dead. The only reason the system keeps zombie processes is to hold the exit status for the parent to collect. If the parent does not collect the exit status, the zombie processes will stay around forever. The only way to get rid of those zombie processes is to kill the parent. If the parent is init then you can only reboot.

I can't test this, but this guy says you can get rid of a defunct process like so:

What is a zombie process and how do I kill it?

There is already an accepted answer, however: you CAN kill the zombie process. Attach to the parent process with the debugger and call the waitpid function. For example, let's assume that the parent has PID=100 and the zombie process has PID=200:

$ gdb -p 100
(gdb) call waitpid(200, 0, 0)
(gdb) quit
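
To make the mechanism behind that gdb call concrete, here is a minimal sketch of what "reaping" means. It is an illustration only, not something from either linked answer: it assumes Linux (it reads /proc/<pid>/stat), and the helper names proc_state and demo are made up for this example. The parent forks a child that exits at once, observes it in state Z, and then calls waitpid, after which the process-table entry is gone.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone.

    Assumption: Linux with /proc mounted.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name is wrapped in parentheses and may itself contain
    # spaces, so take the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: exit immediately with status 7

    # Parent: poll until the kernel reports the child as a zombie (or give up).
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)

    # Reaping: waitpid() collects the exit status and releases the table entry.
    _, status = os.waitpid(pid, 0)
    after = proc_state(pid)
    return before, os.WEXITSTATUS(status), after


if __name__ == "__main__":
    print(demo())
```

Only the parent of a process can make this call, which is why the gdb approach has to attach to the parent rather than to the zombie itself.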

This guy had a problem with a defunct process that seemed to continue running. I don't understand it, but here's the link. In that case, kill -9 pid is claimed to work.

Zombie processes still alive and working fine, but can't be killed?
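
One possible explanation, offered as an assumption rather than something the linked thread confirms: in ps output the l in Zl marks a multi-threaded process, and on Linux the main thread of a process can show as defunct while other threads of the same process are still alive. A rough sketch for checking this is to list the state of every thread under /proc/<pid>/task. The function name thread_states is hypothetical, and the code assumes Linux with /proc mounted.

```python
import os
import sys


def thread_states(pid):
    """Map each thread ID of `pid` to its one-letter state (Linux /proc only)."""
    states = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                data = f.read()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread went away while we were scanning
        # Same parsing as for a process: state is the first field after ')'.
        states[int(tid)] = data.rsplit(")", 1)[1].split()[0]
    return states


if __name__ == "__main__":
    # Usage: python thread_states.py <pid>   (defaults to this process)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, state in sorted(thread_states(pid).items()):
        print(tid, state)
```

If the entry whose thread ID equals the PID is in state Z while other entries are in S, R or D, the process as a whole has not finished exiting, which would be consistent with it still holding open files.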

Ryan Babchishin
  • I'm not trying to kill a zombie. I'm trying to free up the resources it occupies. – facha Aug 23 '16 at 07:39
  • It occupies no resources. It's not even a process. And you did ask how to "remove a zombie". – Ryan Babchishin Aug 23 '16 at 07:39
  • @facha see my answer, at the bottom, about gdb – Ryan Babchishin Aug 23 '16 at 07:42
  • lsof shows it has thousands of open files (most of them are sockets in CLOSE_WAIT). I've reached the limit of open files for that particular user and cannot launch any other processes as that user. – facha Aug 23 '16 at 07:42
  • @facha What's that for? – Ryan Babchishin Aug 23 '16 at 07:43
  • Would it be safe to attach to init (pid 1) with gdb? – facha Aug 23 '16 at 07:47
  • @facha I wouldn't do it on a serious production server... I have no idea what will happen. A reboot would be more predictable. As far as defunct processes go, anything you lookup about them will say they cannot hold file descriptors open so I don't know what's going on with your server. – Ryan Babchishin Aug 23 '16 at 07:51
  • @facha This guy was having a similar problem. I didn't read it all. http://serverfault.com/questions/591333/zombie-processes-still-alive-and-working-fine-but-cant-be-killed Did you ever try `kill -9 pid`? – Ryan Babchishin Aug 23 '16 at 07:54