Bash script entrypoint (PID=1) kills `tail` sub process ONLY if a fake trap (which does NOTHING) was there

Question

I am seeing strange behavior in a Bash script that runs as PID 1 (it is the entrypoint of a Docker container; if you are not familiar with Docker, you can ignore this detail).

When I run the following script, SIGTERM terminates everything very quickly and everything seems fine. (Please keep in mind that the sshd service does not exist! The whole system starts only this script, which runs `tail` and nothing more; so far, that is not the problem.)

#!/bin/bash

trap "pkill sshd" SIGTERM

export PATH=/usr/local/samba/bin/:/usr/local/samba/sbin/:$PATH

if [ -f /usr/local/samba/etc/smb.conf ]; then
        exec /usr/local/samba/sbin/samba -i
else
        tail -f /dev/null & wait ${!}
fi

The problem appears when I delete that trap. Now the system hangs, apparently because `tail` keeps running and does not exit for some reason. (If you are familiar with Docker: it waits 10 seconds and then kills the container because it did not respond to SIGTERM. Again, if you are not familiar with Docker, ignore this detail.)

#!/bin/bash

export PATH=/usr/local/samba/bin/:/usr/local/samba/sbin/:$PATH

if [ -f /usr/local/samba/etc/smb.conf ]; then
        exec /usr/local/samba/sbin/samba -i
else
        tail -f /dev/null & wait ${!}
fi

Could someone explain what exactly the problem is? Why does that fake trap make everything work? It does practically nothing, yet it works simply because it is there.

I should also mention that an empty trap (trap "" SIGTERM) doesn't help; the trap has to contain something for this to work (even if that something does nothing).

Hope that someone can help me, thanks!

@TarunLalwani I tried that, and it didn't work either; it seems that tail doesn't handle the signals. I have now found a solution for the `Docker` case. I am going to wait before answering the question in case someone can explain it, because I found a workaround, not a full understanding of the situation. I don't know what would happen if this were used as the entrypoint of a real system (i.e. started in place of init). — Mohammed Noureldin, Aug 20 '17 at 12:50

Andrii L. · Answer 1 · 2017-08-30T18:21:09.483

You haven't provided your Dockerfile and it's not clear how you send the SIGTERM signal to the container.

However, here is what I came up with in an attempt to reproduce your problem:

My Dockerfile:

FROM ubuntu
ADD ./entrypoint.sh /opt/entrypoint.sh
# Using the exec form here, so that the process is assigned PID 1.
ENTRYPOINT ["/opt/entrypoint.sh"]

Build the image:

$ docker build -f Dockerfile -t test_image .

Don't forget to rebuild the image each time you change the entrypoint script.

Run container with this command:

$ docker run --rm -it --name test_trap test_image

Now, let's see what is happening on each run.

1) With the trap line in your Bash script:

# The main process will receive SIGTERM, trap it and exit.
$ docker stop test_trap

# The main process will receive SIGTERM, trap it and exit.
$ docker kill -s=TERM test_trap

# The main process will receive SIGKILL and will be stopped immediately.
$ docker kill -s=KILL test_trap

2) Without the trap line:

# The main process will receive SIGTERM which will be ignored.
# After a grace period (10s by default) it will receive SIGKILL and will be stopped.
$ docker stop test_trap

# The main process will receive SIGTERM which will be ignored.
# Container will continue running.
$ docker kill -s=TERM test_trap

# The main process will receive SIGKILL and will be stopped immediately.
$ docker kill -s=KILL test_trap

The reason is that the kernel treats a process with PID 1 specially: if that process has not installed its own handler, SIGTERM (and also SIGINT) does not kill it.

More information on this issue:

Any process can register its own handlers for TERM and use them to perform cleanup before exiting. If a process hasn't registered a custom signal handler, the kernel will normally fall back to the default behavior for a TERM signal: killing the process.

For PID 1, though, the kernel won't fall back to any default behavior when forwarding TERM. If your process hasn't registered its own handlers (which most processes don't), TERM will have no effect on the process.

Source - https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html
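
One way to see whether a process has registered a TERM handler is to read the caught-signal mask that Linux exposes in /proc. The sketch below is only an illustration: `has_term_handler` is a hypothetical helper name, and it assumes a Linux /proc filesystem and that SIGTERM is signal 15.

```shell
#!/bin/bash
# has_term_handler PID: succeed if PID has installed a handler for SIGTERM.
# Hypothetical helper; relies on the Linux-specific /proc/PID/status file.
has_term_handler() {
    # SigCgt is a hex bitmask of caught signals; signal N uses bit N-1.
    mask=$(awk '/^SigCgt:/ { print $2 }' "/proc/$1/status") || return 2
    [ -n "$mask" ] || return 2
    # SIGTERM is signal 15 on Linux, so test bit 14 (value 16384).
    [ $(( 0x$mask & 16384 )) -ne 0 ]
}
```

Run inside the container, `has_term_handler 1` should succeed for the script with the trap line and fail for the script without it, which matches the two behaviors described above.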

UPDATE

I can't comment yet, so I will leave a comment here. This is the same PID 1 problem. With both -d and -td, signal handling works as expected: TERM is ignored, as the entrypoint process is assigned PID 1, whereas KILL terminates the process. If you add the trap line, then the TERM signal will be trapped in both cases. If it's not working for you for any reason, then you should post your Dockerfile and the exact commands you execute, and update your question accordingly.
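
As a rough sketch of why the non-empty trap helps: once a handler is registered, TERM is delivered to PID 1, and the shell's `wait` returns early (with a status greater than 128) when a trapped signal arrives, so the script can run to its end and exit. The version below also forwards the signal to the child instead of relying on an unrelated `pkill sshd`. It is not the script from the question; `run_until_term` and `child_pid` are hypothetical names used for illustration.

```shell
#!/bin/bash
# Sketch: keep a long-running child alive and shut down cleanly on SIGTERM.
# run_until_term and child_pid are hypothetical names, not from the question.
run_until_term() {
    "$@" &                        # start the command in the background
    child_pid=$!
    # A non-empty trap registers a real handler, so TERM reaches PID 1;
    # the handler forwards the signal to the child.
    trap 'kill -TERM "$child_pid" 2>/dev/null' TERM
    wait "$child_pid"             # returns early when the trap fires
    wait "$child_pid" 2>/dev/null # reap the child once it has been signalled
    trap - TERM
    return 0
}
```

In an entrypoint this could be used as the last line, for example `run_until_term tail -f /dev/null`, in place of `tail -f /dev/null & wait ${!}`.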

I already found the solution: the `t` parameter (to allocate a tty) actually solves the problem. I don't know why; if you can explain that, I would appreciate it. — Mohammed Noureldin, Aug 28 '17 at 23:09
While I really like the linked blog post above, I wanted to see it myself: Where in the kernel is this defined? It seems that the flag `SIGNAL_UNKILLABLE` is responsible. The flag is set in `kernel/fork.c` for init processes. It receives special handling in the signal delivery path in `kernel/signal.c`(see function `sig_task_ignored` and where it is called from). — falstaff, Jan 15 '20 at 12:44

score 0 · Accepted Answer · answered Aug 28 '17 at 23:10

Actually, adding the -t parameter (to allocate a tty) when running the container solves the problem. I was running it with the -d parameter, and now I run it with -td.

I don't know why, but it worked. It would be great if anyone could explain why that happens.

Mohammed Noureldin
