
I ran into this problem while writing a unit file for a simple daemon. When the daemon returns '1' on startup, systemd just ignores it, and it looks as if the daemon started successfully while it is actually dead.

For example, I have a very simple shell script:

#!/bin/bash
exit 1

So the unit file looks like this:

[Unit]
Description=test service
After=syslog.target

[Service]
User=testuser
Group=testuser
ExecStart=/usr/local/bin/return1

[Install]
WantedBy=multi-user.target

Trying to start it seems OK:

# service testservice start
# echo $?
0

But actually it is dead:

# service testservice status
● testservice.service - test service
   Loaded: loaded (/etc/systemd/system/testservice.service; enabled)
   Active: failed (Result: exit-code) since Fri 2016-01-22 14:51:45 MSK; 1min 13s ago
  Process: 16416 ExecStart=/usr/local/bin/return1 (code=exited, status=1/FAILURE)
 Main PID: 16416 (code=exited, status=1/FAILURE)

Jan 22 14:51:45 servername systemd[1]: Started test service.
Jan 22 14:51:45 servername systemd[1]: testservice.service: main process exited, code=exited, status=1/FAILURE
Jan 22 14:51:45 servername systemd[1]: Unit testservice.service entered failed state.

It looks like systemd thinks that the daemon started successfully but crashed later.

I tried to resolve this problem by changing the service Type to 'forking' and others. This works fine for a non-zero exit code, but the service is actually 'simple', so on a successful start it just stays in the foreground and keeps the terminal busy.

How do I manage this kind of service? Or maybe it is necessary to fix something in the daemon code?

OS debian 8 x64, systemd 215

Paul K.
  • What did you expect to happen? It successfully created the process defined to start the service, so it started it successfully. Later (not much, but still, later) the process for the service exited with an error code and it was logged as such. – Eric Renouf Jan 22 '16 at 12:55
  • Right, and when it exits it enters the failed state, but that doesn't mean that systemd was unable to start the process, your note at the start is wrong. Your script does not return 1 at startup, it returns 1 after running for "a while". If you make that script non executable then systemd would be unable to start it, but as it is it does start, and then later it exits "unexpected" and systemd reflects that too – Eric Renouf Jan 22 '16 at 13:07
  • Got it. So, is there a native way to catch a crash during short time after start, some acceptable timeout? Or I should use something like ExecStartPost to watch the main process? – Paul K. Jan 22 '16 at 13:15
  • Not that I'm aware of, but someone more knowledgeable may be able to help. It feels like it's not the right path to try to overload the start state to include failures after starting, perhaps you should just check the status immediately after the start? `systemctl status testservice` would give you a usable exit code to tell whether the service is "still" running or not – Eric Renouf Jan 22 '16 at 13:22
  • You should check the service's own logs to find out why it is stopping. – Michael Hampton Jan 22 '16 at 18:34
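The suggestion in the comments to check the status right after starting can be sketched roughly as follows. This is only a sketch: it uses `systemctl is-active --quiet` rather than `systemctl status`, the function name `start_and_verify` and the grace period are made up for illustration, and the `SYSTEMCTL` variable is a hypothetical override that exists only so the function can be tried without a real systemd.

```shell
#!/bin/bash
# Sketch: start a unit, wait a grace period, then report whether it is still active.
# SYSTEMCTL is a hypothetical override (not a systemd feature) used for dry runs.
start_and_verify() {
    local unit=$1 grace=${2:-2}
    local systemctl=${SYSTEMCTL:-systemctl}

    # "start" returns once systemd considers the unit started; for a simple
    # service that is as soon as the process has been spawned.
    "$systemctl" start "$unit" || return 1

    # Give the main process a moment to crash, if it is going to.
    sleep "$grace"

    # is-active --quiet prints nothing and exits 0 only if the unit is active.
    "$systemctl" is-active --quiet "$unit"
}
```

It could be called as `start_and_verify testservice 2 || echo "testservice died shortly after start" >&2`. The grace period is a guess; a daemon that fails later than that would still be reported as started.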

2 Answers

6

For systemd to detect whether the process started successfully, you can use Type=forking, fork your process in a helper script, and check in that script whether the process started successfully. With Type=forking, systemd waits for the ExecStart command to finish and checks its exit status.

You should change your unit file like this:

[Unit]
Description=test service
After=syslog.target

[Service]
Type=forking
User=testuser
Group=testuser
ExecStart=/usr/local/bin/fork_service

[Install]
WantedBy=multi-user.target

and in /usr/local/bin/fork_service you should have something like this:

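#!/bin/bash

# Run your process in the background
/path/to/your_service &

# Give the process a moment to fail; without a delay the check below
# runs before the process has had a chance to exit
sleep 1

# Check whether the service is still running
if ! kill -0 "$!" 2>/dev/null; then
    # Return 1 so that systemd knows the service failed to start
    exit 1
fi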

Here I'm just checking whether the background process is still alive, but you can use any check you want. The only important thing is that this script exits with 0 if the process started successfully, or with a non-zero status if it failed.

Also, you don't have to use Bash to fork a process; you can use any language you want.
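As one possible variation on the helper, the check can be wrapped in a reusable function that also records the PID, so that a PIDFile= line in the unit could point at it. This is a sketch under assumptions: the function name `start_and_check`, the grace period, and the PID file path are all hypothetical, and the testuser account would need write access to wherever the PID file lives.

```shell
#!/bin/bash
# Sketch: start_and_check GRACE_SECONDS PIDFILE COMMAND [ARGS...]
# Runs COMMAND in the background, waits GRACE_SECONDS, and succeeds only if
# the process is still alive at that point.
start_and_check() {
    local grace=$1 pidfile=$2
    shift 2

    # Run the service in the background and remember its PID.
    "$@" &
    local pid=$!

    # Give the process time to fail during its start-up.
    sleep "$grace"

    # kill -0 sends no signal; it only tests whether the PID still exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        return 1
    fi

    # Record the PID so that systemd could find the main process via PIDFile=.
    echo "$pid" > "$pidfile"
}
```

The helper script would then end with something like `start_and_check 1 /path/to/your_service.pid /path/to/your_service || exit 1`, where both paths are placeholders.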

2

So what you want is for systemd to wait until the process is initialized before returning, which is a reasonable feature. However, since process initialization can take an arbitrary amount of time, there is no sane way for systemd to know how long to wait without help from the process being started.

The traditional way to do this is to use a forking daemon. The daemon does the initialization in one process, and then forks off the actual process once it has been established that it can initialize successfully. The fork happening and the original process exiting successfully is then the signal to systemd that the daemon is initialized. From the systemd documentation:

If set to forking, it is expected that the process configured with ExecStart= will call fork() as part of its start-up. The parent process is expected to exit when start-up is complete and all communication channels are set up. The child continues to run as the main service process, and the service manager will consider the unit started when the parent process exits. This is the behavior of traditional UNIX services. If this setting is used, it is recommended to also use the PIDFile= option, so that systemd can reliably identify the main process of the service. systemd will proceed with starting follow-up units as soon as the parent process exits.

Systemd offers another solution, which seems more elegant to me. There is no reason to fork; just make the daemon tell systemd directly when it is initialized. This is Type=notify (see the documentation I linked above).

So to fix your specific example, you would modify your service file to say Type=notify and /usr/local/bin/return1 to be

#!/bin/bash
exit 1
systemd-notify READY=1

Obviously, the exit 1 would normally happen only conditionally, if the initialization fails, and the READY=1 is sent once initialization is finished.

This will give you the expected error message on the command line when you try to start it. To see that systemd actually waits for the READY=1, you can try it with the following /usr/local/bin/return1:

#!/bin/bash
sleep 3
systemd-notify READY=1
sleep 1000000
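Putting the two scripts above together, a conditional version might look roughly like the sketch below. The names `init_ok` and `run_service` and the "is the config file readable" check are hypothetical stand-ins for whatever initialization the real daemon performs. The unit is assumed to set Type=notify; because systemd-notify runs as a child of the script rather than as the main process, the unit probably also needs NotifyAccess=all, which is my assumption and not something stated above.

```shell
#!/bin/bash
# Hypothetical initialization check: here, just "is the config file readable?"
init_ok() {
    [ -r "$1" ]
}

run_service() {
    local config=$1

    if ! init_ok "$config"; then
        echo "initialization failed: cannot read $config" >&2
        # READY=1 is never sent on this path; the caller should exit non-zero.
        return 1
    fi

    # Initialization succeeded: tell systemd the service is ready.
    systemd-notify READY=1

    # Stand-in for the real main loop of the daemon.
    sleep 1000000
}
```

The script would end with something like `run_service /path/to/config || exit 1`, with the path being a placeholder. If the script exits before READY=1 has been sent, systemd treats the start as failed.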