Zombie process
On Unix and Unix-like computer operating systems, a zombie process or defunct process is a process that has completed execution (via the exit system call) but still has an entry in the process table: it is a process in the "Terminated state". This occurs for child processes, where the entry is still needed to allow the parent process to read its child's exit status: once the exit status is read via the wait system call, the zombie's entry is removed from the process table and it is said to be "reaped". A child process always first becomes a zombie before being removed from the process table. In most cases, under normal system operation, zombies are immediately waited on by their parent and then reaped by the system. Processes that stay zombies for a long time are generally an error and cause a resource leak, but the only resource they occupy is the process table entry, that is, the process ID.
The term zombie process derives from the common definition of zombie: an undead person. In the term's metaphor, the child process has "died" but has not yet been "reaped". Also, unlike normal processes, the kill command has no effect on a zombie process, because the process has already terminated.
Zombie processes should not be confused with orphan processes: an orphan process is a process that is still executing, but whose parent has died. When the parent dies, the orphaned child process is adopted by init (process ID 1). When orphan processes die, they do not remain as zombie processes; instead, they are waited on by init. The result is that a process that is both a zombie and an orphan will be reaped automatically.
Overview
When a process ends via exit, all of the memory and resources associated with it are deallocated so they can be used by other processes. However, the process's entry in the process table remains. The parent can read the child's exit status by executing the wait system call, whereupon the zombie is removed. The wait call may be executed in sequential code, but it is commonly executed in a handler for the SIGCHLD signal, which the parent receives whenever a child has died.
After the zombie is removed, its process identifier (PID) and entry in the process table can then be reused. However, if a parent fails to call wait, the zombie will be left in the process table, causing a resource leak. In some situations this may be desirable, because the parent process wishes to keep holding the resource: for example, as long as the zombie exists, a new child created by the parent is guaranteed not to be allocated the same PID. On modern UNIX-like systems (those that comply with the SUSv3 specification in this respect), the following special case applies: if the parent explicitly ignores SIGCHLD by setting its handler to SIG_IGN (rather than simply ignoring the signal by default) or has the SA_NOCLDWAIT flag set, all child exit status information will be discarded and no zombie processes will be left.[1]
Zombies can be identified in the output from the Unix ps command by the presence of a "Z" in the "STAT" column.[2] Zombies that exist for more than a short period of time typically indicate a bug in the parent program, or just an uncommon decision not to reap children (see Example). If the parent program is no longer running, zombie processes typically indicate a bug in the operating system. As with other resource leaks, the presence of a few zombies is not worrisome in itself, but may indicate a problem that would grow serious under heavier loads. Since no memory is allocated to zombie processes (the only system memory usage is for the process table entry itself), the primary concern with many zombies is not running out of memory, but rather running out of process table entries, specifically process ID numbers.
To remove zombies from a system, the SIGCHLD signal can be sent to the parent manually, using the kill command. If the parent process still refuses to reap the zombie, and if it is acceptable to terminate the parent process, the next step is to terminate the parent. When a process loses its parent, init becomes its new parent. init periodically executes the wait system call to reap any zombies with init as parent.
Example
Synchronously waiting for specific child processes in a fixed order may leave zombies present longer than the "short period of time" mentioned above. This is not necessarily a program bug.
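#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pids[10];
    int i;

    /* Fork ten children; child i sleeps i+1 seconds and then exits. */
    for (i = 9; i >= 0; --i) {
        pids[i] = fork();
        if (pids[i] == -1) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pids[i] == 0) {
            printf("Child%d\n", i);
            fflush(stdout);   /* _exit() does not flush stdio buffers */
            sleep(i + 1);
            _exit(0);
        }
    }

    /* Wait for the children in a fixed order, starting with the one that
       sleeps longest; the others remain zombies until their turn. */
    for (i = 9; i >= 0; --i) {
        printf("parent%d\n", i);
        waitpid(pids[i], NULL, 0);
    }
    return 0;
}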
Output
parent9
Child3
Child4
Child2
Child5
Child1
Child6
Child0
Child7
Child8
Child9
// there is a pause here
parent8
parent7
parent6
parent5
parent4
parent3
parent2
parent1
parent0
Explanation
In the first loop, the original (parent) process forks 10 copies of itself. Each of these child processes (detected by the fact that fork() returned zero) prints a message, sleeps, and exits. All of the children are created at essentially the same time (since the parent is doing very little in the loop), so the order in which they are first scheduled is somewhat random; hence the scrambled order of their messages.
During the loop, an array of child process IDs is built. There is a copy of the pids[] array in all 11 processes, but only the parent's copy is complete: the copy in each child is missing the lower-numbered child PIDs and holds zero for its own PID. (This does not really matter, as only the parent process actually uses this array.)
The second loop executes only in the parent process (because each child calls _exit() inside the first loop and never reaches it), and waits for each child to exit. It first waits for Child9, which sleeps 10 seconds; by the time it exits, all the others have long since exited, so the remaining parent messages appear in quick succession. There is no possibility of random ordering here, since the loop runs in a single process. Note that the first parent message actually appeared before any of the children's messages: the parent was able to continue into the second loop before any of the child processes were able to start. This again is just the random behavior of the process scheduler; the "parent9" message could have appeared anywhere in the sequence prior to "parent8".
Child0 through Child8 each spend one or more seconds as zombies, between the time they exit and the time the parent calls waitpid() on them. The parent was already waiting on Child9 before it exited, so that process spent essentially no time as a zombie.[3]
See also
- Fork bomb
- Zombie object
- Uninterruptible sleep
References
- "wait(2) Man Page". Linux Programmer's Manual.
- "Zombies(5) - UNIX System V (Concepts)". The Collider Detector at Fermilab.
- Stack Overflow, question 42627411. https://stackoverflow.com/questions/42627411/can-someone-please-explain-how-this-worksfork-sleep
- "UNIX man pages : ps ()". UNIXhelp for Users. Archived from the original on 2013-03-08.