Am I misunderstanding something, or should this not be possible?

All my daemon processes are in zombie state after I tried to stop the control service:

# ps ax | grep controller
13768 pts/11   S+     0:00 grep controller
26866 ?        Zl    18:56 [controller] <defunct>
26870 ?        Zl    18:57 [controller] <defunct>
26871 ?        Zl    18:45 [controller] <defunct>
26876 ?        Zl    13:17 [controller] <defunct>
26877 ?        Zl    10:28 [controller] <defunct>
26880 ?        Zl    18:18 [controller] <defunct>
26881 ?        Zl    12:01 [controller] <defunct>
26882 ?        Zl    18:18 [controller] <defunct>

And yet the ports are still open (although netstat can't find the process name):

# netstat -tlpn | sort
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1180/sshd
tcp        0      0 0.0.0.0:80                  0.0.0.0:*                   LISTEN      11882/httpd
tcp        0      0 10.0.0.50:8890              0.0.0.0:*                   LISTEN      -
tcp        0      0 10.0.0.50:8891              0.0.0.0:*                   LISTEN      -
tcp        0      0 10.0.0.50:8892              0.0.0.0:*                   LISTEN      -
tcp        0      0 10.0.0.50:8896              0.0.0.0:*                   LISTEN      -
tcp        0      0 10.0.0.50:8897              0.0.0.0:*                   LISTEN      -
tcp        0      0 10.0.0.50:8900              0.0.0.0:*                   LISTEN      -

lsof, on the other hand, can see the process names:

# lsof -i -n -P | grep 10.0.0.50 | grep LISTEN
controlle 26866  devuser   82u  IPv4    323641      0t0  TCP 10.0.0.50:8890 (LISTEN)
controlle 26870  devuser   82u  IPv4    323629      0t0  TCP 10.0.0.50:8891 (LISTEN)
controlle 26871  devuser   82u  IPv4    323635      0t0  TCP 10.0.0.50:8892 (LISTEN)
controlle 26876  devuser   82u  IPv4    323643      0t0  TCP 10.0.0.50:8896 (LISTEN)
controlle 26877  devuser   82u  IPv4    323615      0t0  TCP 10.0.0.50:8897 (LISTEN)
controlle 26880  devuser   82u  IPv4    323647      0t0  TCP 10.0.0.50:8900 (LISTEN)
controlle 26881  devuser   82u  IPv4    323649      0t0  TCP 10.0.0.50:8901 (LISTEN)
controlle 26882  devuser   82u  IPv4    323631      0t0  TCP 10.0.0.50:8902 (LISTEN)

And weirdest of all, these zombie processes appear to be working fine:

# curl http://10.0.0.50:8892/status
{"status": "ok"}

But killing the processes to make them stop (I need to upgrade them, hence trying to stop them in the first place) doesn't have any effect.

I can probably reboot to kill the processes in order to upgrade them, but it would be nice to figure out WTF is happening here with invincible running-dead processes first...

Shish
  • Does `kill -9` work? If not, you're stuck rebooting the box. It happens occasionally with misbehaving software. – Nathan C Apr 25 '14 at 13:44
  • `kill -9` does work to kill them. I guess I hadn't tried that because I tried regular `kill` to no effect, and then remembered that zombie processes are by definition unkillable, so `kill -9` couldn't possibly work either. And yet it does... Good to know for upgrading without rebooting, but it still doesn't explain the mystery >_< – Shish Apr 25 '14 at 13:55

2 Answers

`kill -9` will exterminate those zombies.

Typically, zombies happen when the parent dies and the child processes are not properly shut down by the parent before it exits. This happens more often if you kill the parent and it doesn't gracefully shut down (and take all the children with it). This is similar to an Orphan process.

Nathan C
  • kill -9 does kill them successfully -- but I'm mostly confused how a zombie process can still be listening on a port and answering requests, since my understanding of zombie processes is that they aren't really alive, they're just process table entries waiting for the parent to collect their exit codes – Shish Apr 25 '14 at 14:13
  • Well, this may be a case of an [Orphan](http://en.wikipedia.org/wiki/Orphan_process) process rather than a zombie, although they'll still be marked as zombies in the process table. They may work, but will fail if they rely on the parent for anything (depends on the app). – Nathan C Apr 25 '14 at 14:14
  • Do you have any documentation for understanding "orphans may be marked as zombies in the process table"? That would explain these symptoms, but it seems to contradict the definitions of "orphan" and "zombie" :S – Shish Apr 28 '14 at 09:39
  • This answer is incorrect. It is perfectly legitimate for a process with children to exit and leave the children running. And that does not cause those children to become zombies. In some cases it is actually the best way to prevent zombies. – kasperd May 01 '14 at 14:37
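The point in the last comment can be checked with a small experiment. The following is a minimal sketch, assuming Linux with `/proc` mounted; the function name `demo_orphan` is made up for illustration. A child forks a grandchild and exits immediately, and the grandchild then reports its new parent PID and its own state. The new parent is often PID 1, but it may be a different "subreaper" process on some systems, so the sketch only reports whatever it sees.

```python
import os
import time


def demo_orphan(timeout=5.0):
    """Fork a child that forks a grandchild, then let the child exit first.

    Returns (child_pid, grandchild_new_ppid, grandchild_state).
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Intermediate child: remember our PID, fork, then exit at once.
        os.close(read_fd)
        my_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until we have been re-parented.
            try:
                deadline = time.monotonic() + timeout
                while os.getppid() == my_pid and time.monotonic() < deadline:
                    time.sleep(0.01)
                # The field after the closing ")" is the one-letter state.
                with open("/proc/self/stat") as f:
                    state = f.read().rsplit(")", 1)[1].split()[0]
                os.write(write_fd, f"{os.getppid()} {state}".encode())
            finally:
                os._exit(0)
        os._exit(0)
    # Original parent: reap the intermediate child, then read the report.
    os.close(write_fd)
    os.waitpid(child, 0)
    with os.fdopen(read_fd) as pipe:
        new_ppid, state = pipe.read().split()
    return child, int(new_ppid), state


if __name__ == "__main__":
    child, new_ppid, state = demo_orphan()
    print(f"child {child} exited; grandchild now has ppid {new_ppid}, state {state}")
```

The orphaned grandchild keeps running under its new parent and does not show up in state `Z`, which matches the comment above.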

A process is a zombie in the time between the process exiting and the parent picking up the exit status. If a zombie stays around for a long time, that indicates a flaw in the parent of that zombie. If the parent dies, the process is inherited by process number 1 (the init process). init should always reap zombies very quickly. If you see zombies with parent PID 1, that would indicate something is wrong with init or the kernel.
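To make that window concrete, here is a minimal sketch, assuming Linux with `/proc` mounted. The helper names `proc_state` and `demo_zombie` are made up for illustration. The parent forks a child that exits at once, deliberately delays the `waitpid()` call, and looks at the child's state before and after reaping it.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ")" and take the first field after it.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=5.0):
    """Create a short-lived zombie, then reap it.

    Returns (state_before_wait, proc_entry_exists_after_wait).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: the child has exited (or is about to) but is not yet reaped.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    os.waitpid(pid, 0)  # collect the exit status; the zombie goes away
    return state, os.path.exists(f"/proc/{pid}")


if __name__ == "__main__":
    before, still_there = demo_zombie()
    print(f"before waitpid: {before}; /proc entry after waitpid: {still_there}")
```

Between the child's exit and the `waitpid()` call, the child is only a process-table entry holding an exit status, which is why a long-lived zombie points at a parent that never makes that call.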

kasperd
  • Yup, that is exactly what a zombie process is *supposed* to be. But these zombie processes were alive, had TCP ports allocated to them, and were responding with real data when the TCP ports were queried. – Shish May 01 '14 at 16:06
  • Ports are not allocated to processes. There are several data structures between a process and a TCP port. The output produced by netstat is just an approximation of the more complicated underlying data. The `l` in the `ps` output tells you that it is a multithreaded program. You have been assuming all the information you have been looking at has been about the same thread. But it can't be, because a zombie cannot have open file descriptors, and it won't get scheduled so there is no way it could respond to requests. Try looking at individual threads: `ps -m -A -o pid,lwp,ppid,stat,args` – kasperd May 01 '14 at 16:34
  • The zombies are kill -9'ed now, so no more debugging unless it happens again :( (and each time I killed one of the zombies, a port stopped being marked as in use, and stopped responding to requests). Are you implying that a zombie process can have multiple threads, some of which are still alive? – Shish May 04 '14 at 08:26
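The per-thread inspection suggested in the comments can also be done by reading `/proc` directly. The following is a minimal sketch, assuming Linux; `thread_states` is a hypothetical helper, not part of any tool mentioned in this thread. As far as I understand, a multithreaded process on Linux can be reported as `Zl` when its initial thread has exited while other threads are still running, and a listing like this would show that as a `Z` entry for the main thread next to `S` or `R` entries for the others. Whether that is what happened to the controller processes here cannot be confirmed from the output posted.

```python
import os
import threading


def thread_states(pid):
    """Map each thread ID (LWP) of `pid` to its one-letter kernel state."""
    states = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                data = f.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # State is the first field after the parenthesised command name.
        states[int(tid)] = data.rsplit(")", 1)[1].split()[0]
    return states


if __name__ == "__main__":
    # Demonstrate on this process: start one extra thread, then list both.
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)
    worker.start()
    for tid, state in sorted(thread_states(os.getpid()).items()):
        role = "main thread" if tid == os.getpid() else "other thread"
        print(tid, state, role)
    stop.set()
    worker.join()
```

To inspect another process, pass its PID instead, for example `thread_states(26866)`, assuming the process still exists and you have permission to read its `/proc` entries.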