53

I need to have network messages sent when one of my systemd services crashes or hangs (i.e., enters the failed state; I detect hangs with WatchdogSec=). I noticed that newer systemd versions have FailureAction=, but it doesn't allow arbitrary commands, only reboot/shutdown actions.

Specifically, I need a way to have one network message sent when systemd detects the program has crashed, and another when it detects it has hung.

I'm hoping for a better answer than "parse the logs", and I need something that has a near-instant response time, so I don't think a polling approach is good; it should be something triggered by the event occurring.

Display Name

3 Answers

50

systemd units support OnFailure=, which activates one or more units when the unit enters the failed state. In the [Unit] section of your service you can put something like

 OnFailure=notify-failed@%n

Then create the notify-failed@.service template, in which you can use whichever specifiers you need (you will probably want at least %i) to launch the script or command that sends the notification.

You can see a practical example in http://northernlightlabs.se/systemd.status.mail.on.unit.failure
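Since the question asks for different messages for a crash and a hang, the script started by the template can look at the failed service's Result property, which systemd sets to watchdog when a WatchdogSec= timeout fired and to values such as signal, core-dump or exit-code when the process died. The following is only a minimal sketch: the function names are hypothetical, and it assumes the caller obtains the result with `systemctl show -p Result --value` (the `--value` option needs a reasonably recent systemd).

```shell
#!/bin/sh
# Sketch of helpers for a notify-failed@.service script.
# All function names here are illustrative, not part of systemd.

# Map systemd's Result value to the two cases from the question.
classify_result() {
    case "$1" in
        watchdog)                   echo "hung" ;;
        signal|core-dump|exit-code) echo "crashed" ;;
        *)                          echo "failed ($1)" ;;
    esac
}

# Build the message text for a unit name and its Result value.
# The caller hands the text to whatever network sender is in use.
failure_message() {
    unit=$1
    result=$2
    echo "$unit $(classify_result "$result")"
}

# Possible use inside the script, with the unit name passed as $1:
#   failure_message "$1" "$(systemctl show -p Result --value "$1")"
```

The template's ExecStart= would then run the script with %i as its argument and pass the resulting text to the sender of your choice.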

Davy Landman
Pablo Martinez
  • 5
    There are a couple corrections needed to the instructions on the linked site. First, `notify%n.service` is redundant, and will result in `notify@my-service.service.service`. Second, `%i` should be used instead of `%I`, or all dashes in the name will be converted to forward slashes. – orodbhen Jun 22 '16 at 15:42
  • 7
    Is there a way to do this for multiple or all units, without modifying their unit files? – Vladimir Panteleev Sep 10 '17 at 12:52
  • @VladimirPanteleev - you don't need to modify the actual unit files - you can just add an override for that specific feature. For example, run `systemctl edit my-service.service` and in the editor that opens add a line `[Unit]` followed by `OnFailure=notify-failed@%n`, save and exit. This will create an override file in `/etc/systemd/system/my-service.service.d/override.conf` with the added functionality (of course you can automate the creation of such files for multiple services, just don't forget to do `systemctl daemon-reload` if you modified files not through `systemctl`). – Guss Feb 06 '22 at 11:41
  • For anybody looking to do this for all service files at once, check **Example 3** at the very end of [systemd.unit](https://www.freedesktop.org/software/systemd/man/systemd.unit.html). You need to place a configuration under `service.d` directory and it will apply to all services. – Felipe May 19 '22 at 17:59
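The type-level drop-in mentioned in the comments above could be set up roughly as follows. This is a sketch under assumptions: it reuses the notify-failed@ template name from the answer, the file names 10-onfailure.conf and 99-no-onfailure.conf and the function name are hypothetical, and it needs a systemd version that supports service.d drop-ins. The second drop-in clears OnFailure= for the notifier itself so that a failing notifier does not trigger another notifier.

```shell
#!/bin/sh
# Sketch: write both drop-ins below a systemd configuration root
# (normally /etc/systemd/system). Names are illustrative.
install_global_onfailure() {
    root=$1
    mkdir -p "$root/service.d" "$root/notify-failed@.service.d"

    # Applies to every service unit; %% makes printf emit a literal %.
    printf '[Unit]\nOnFailure=notify-failed@%%n\n' \
        > "$root/service.d/10-onfailure.conf"

    # An empty assignment resets the list for the notifier instances.
    printf '[Unit]\nOnFailure=\n' \
        > "$root/notify-failed@.service.d/99-no-onfailure.conf"
}
```

After calling it as root with `/etc/systemd/system` as the argument, run `systemctl daemon-reload` so the new drop-ins are picked up.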
33

Here is how I send notifications:

/etc/systemd/system/notify-email@.service

[Unit]
Description=Send email

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/bin/systemctl status %i | /usr/bin/mailx -Ssendwait -s "[SYSTEMD_%i] Fail" your_admin@company.blablabla'

[Install]
WantedBy=multi-user.target

Add it to systemd:

systemctl enable /etc/systemd/system/notify-email@.service

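In the other services, add the following (using %n, because %i expands to an empty string in units that are not template instances):

[Unit]
OnFailure=notify-email@%n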

Reload the configuration:

systemctl daemon-reload
tjmcewan
ceinmart
  • Is there a way to avoid triggering it lots of times in a row? In some situations receiving 1K emails about a service that failed at night and tried over and over again to restart itself isn't helpful. – starbeamrainbowlabs Sep 20 '19 at 19:27
  • 1
    As far as I know, no, systemd has no option for this. You would have to put some control into the bash command, such as touching a file and checking whether it is more than 10 minutes old, for example. In simple command logic: find -mmin +10 && send email && touch file ; – ceinmart Apr 07 '20 at 14:30
  • Why are you enabling the notification service? It's supposed to be started by other units, no reason to start it on boot. – drrlvn Mar 18 '22 at 08:30
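The throttling idea from the comments above (touch a file, only send again once it is older than 10 minutes) can be sketched in shell. This is a minimal sketch, not a tested production script: the function name should_notify and the stamp-file approach are assumptions, and it relies on a find that supports -mmin, such as GNU find.

```shell
#!/bin/sh
# Sketch: return success only if no notification was recorded for this
# stamp file within the last $2 minutes, then refresh the stamp.
should_notify() {
    stamp=$1
    minutes=$2

    # No stamp yet: this is the first failure, so notify.
    if [ ! -e "$stamp" ]; then
        touch "$stamp"
        return 0
    fi

    # find prints the path only if the stamp was last modified
    # more than $minutes minutes ago.
    if [ -n "$(find "$stamp" -mmin "+$minutes")" ]; then
        touch "$stamp"
        return 0
    fi

    # A notification was sent recently: stay quiet.
    return 1
}
```

In the notification command, call it with a per-unit stamp file and a window in minutes, followed by `&&` and the mail or network command, so the message is only sent when the function succeeds.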
0

I came across this utility which seems to provide this: https://github.com/joonty/systemd_mon