
Background

I've been asked to create a systemd script for a new service, foo_daemon, that sometimes gets into a "bad state" and won't die via SIGTERM (likely due to a custom signal handler). This is problematic for developers, as they are instructed to start/stop/restart the service via:

  • systemctl start foo_daemon.service
  • systemctl stop foo_daemon.service
  • systemctl restart foo_daemon.service

Problem

Sometimes, due to foo_daemon getting into a bad state, we have to forcibly kill it via:

  • systemctl kill -s KILL foo_daemon.service

Question

How can I set up my systemd script for foo_daemon so that, whenever a user attempts to stop/restart the service, systemd will:

  • Attempt a graceful shutdown of foo_daemon via SIGTERM.
  • Give up to 2 seconds for shutdown/termination of foo_daemon to complete.
  • Attempt a forced shutdown of foo_daemon via SIGKILL only if the process is still alive, so that there is no risk of the PID being recycled and systemd issuing SIGKILL against the wrong PID. The device we're testing spawns/forks numerous processes rapidly, so there is a rare but very real concern about PID recycling causing a problem.
  • If, in practice, I'm just being paranoid about PID recycling, I'm OK with the script just issuing SIGKILL against the process's PID without being concerned about killing a recycled PID.

Cloud
  • Even if you spawn processes rapidly enough to roll over 4 million PIDs in two seconds, systemd **does not** sit in a loop checking "is this pid still alive? is this pid still alive?" because it doesn't _need_ to; it is already informed about whether its immediate child processes are still alive or not (by means of ordinary SIGCHLD and waitpid()). So if it sees that the process exited after SIGTERM, it will simply mark the service as 'inactive' at that point – it will not bother with checking, waiting, and sending the SIGKILL at all. – user1686 Aug 29 '18 at 06:33

2 Answers


systemd already supports this out of the box, and it is enabled by default.

The only thing you might want to customize is the timeout, which you can do with TimeoutStopSec=. For example:

[Service]
TimeoutStopSec=2

Now, systemd will send a SIGTERM, wait two seconds for the service to exit, and if it doesn't, it will send a SIGKILL.

If your service is not systemd-aware, you may need to provide the path to its PID file with PIDFile=.

Finally, you mentioned that your daemon spawns many processes. In this case, you might wish to set KillMode=control-group (which is also the default), so that systemd sends signals to all of the processes in the service's cgroup rather than only to the main process.
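
Putting the settings above together, a unit might look roughly like the following. This is only a sketch: the ExecStart path is hypothetical, and Type=simple assumes foo_daemon can be run in the foreground. KillSignal=, SendSIGKILL= and KillMode= are shown with their default values just to make the stop sequence explicit.

```ini
[Unit]
Description=foo_daemon (illustrative sketch)

[Service]
# Hypothetical install path; adjust to where foo_daemon actually lives.
ExecStart=/usr/bin/foo_daemon
# Assumes the daemon stays in the foreground instead of daemonizing.
Type=simple
# Signal sent first on stop/restart (SIGTERM is already the default).
KillSignal=SIGTERM
# Wait up to 2 seconds for the service to exit before escalating.
TimeoutStopSec=2
# After the timeout, send SIGKILL to whatever is still running (default: yes).
SendSIGKILL=yes
# Signal every process in the service's cgroup (default).
KillMode=control-group
```

After editing the unit, run systemctl daemon-reload so that systemd picks up the change.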

Michael Hampton
  • Thank you. One last question: let's assume the service is not systemd-aware. What could I add to the systemd script for this service so that systemd creates/manages the PID file? Additionally, the service can be multi-instance via template units, so we typically launch it via `systemctl start foo_daemon@1.service`. Would that impact the PID file logic in the script? – Cloud Aug 28 '18 at 18:41
  • @DevNull systemd does not create or manage PID files. There's no reason for it to do so. If your service doesn't create its own PID file, then if possible configure it to run in the foreground (instead of daemonizing) and set `Type=simple` in the systemd unit. – Michael Hampton Aug 28 '18 at 18:42
  • If the service has dependants, `Type=forking` has the advantage of (if the service was properly written) informing systemd when it's fully 'ready', which `Type=simple` cannot do. Daemonizing isn't a problem, even without a PID file – systemd will track down the main process anyway. – user1686 Aug 29 '18 at 06:29
  • @grawity True enough...though it's been my experience that services daemonize before they are actually ready to begin serving. A systemd-aware service using `Type=notify` is best for systemd, and many common services already do this. But probably not this legacy service. In the OP's case, he has a service which spawns many processes. The systemd docs [warn about this case](https://www.freedesktop.org/software/systemd/man/systemd.service.html#id-1.10.5). – Michael Hampton Aug 29 '18 at 13:54
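
For the multi-instance case raised in the comments above, a template unit could be sketched as follows. The file name foo_daemon@.service matches the `foo_daemon@1.service` invocation; the ExecStart path and the --instance flag are hypothetical and only illustrate how the `%i` specifier passes the instance name through. With `Type=simple` no PID file is involved, and each instance is a separate unit with its own cgroup, so the stop logic applies per instance.

```ini
# /etc/systemd/system/foo_daemon@.service
# Template unit: "systemctl start foo_daemon@1.service" expands %i to "1".
[Unit]
Description=foo_daemon instance %i

[Service]
# Assumes the daemon can run in the foreground, so no PIDFile= is needed.
Type=simple
# Hypothetical path and flag; the real daemon may take its instance id differently.
ExecStart=/usr/bin/foo_daemon --instance %i
# SIGTERM first, then SIGKILL after 2 seconds if the instance has not exited.
TimeoutStopSec=2
```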

Since nobody mentioned needing Type=oneshot, here's a complete example which exits because of a timeout failure.

[Unit]
Description=timeout test

[Service]
Type=oneshot
TimeoutStartSec=2
ExecStart=/bin/sleep 10
Evidlo