I'm trying to watch the mailing list manager sympa with monit. A running sympa instance consists of multiple processes for the different tasks of list management (e.g. a separate process for archiving emails), but all processes are started/stopped with a single init script.

Ideally, monit should alert me if any of the processes fails and then restart sympa, but only once. My first attempt looked like this:

check process sympa
  with pidfile /var/run/sympa/sympa.pid
  start program = "/etc/init.d/sympa start"
  stop program = "/etc/init.d/sympa stop"
check process sympa_bounced
  with pidfile /var/run/sympa/bounced.pid
  start program = "/etc/init.d/sympa start"
  stop program = "/etc/init.d/sympa stop"
check process sympa_bulk
  with pidfile /var/run/sympa/bulk.pid
  start program = "/etc/init.d/sympa start"
  stop program = "/etc/init.d/sympa stop"

However, if I stop sympa manually, the init script is executed multiple times, once for every service I have defined (because every one of them has failed).

My second approach was to define dependencies and only alert if any of the subservices fail:

check process sympa
  with pidfile /var/run/sympa/sympa.pid
  start program = "/etc/init.d/sympa start"
  stop program = "/etc/init.d/sympa stop"
  depends on sympa_bounced, sympa_bulk
check process sympa_bounced
  with pidfile /var/run/sympa/bounced.pid
  if does not exist then alert
check process sympa_bulk
  with pidfile /var/run/sympa/bulk.pid
  if does not exist then alert

But since the subservices are not restarted, the main service will also not be restarted. So I figured I could 'fake' a restart by setting start/stop to /bin/true:

check process sympa
  with pidfile /var/run/sympa/sympa.pid
  start program = "/etc/init.d/sympa start"
  stop program = "/etc/init.d/sympa stop"
  depends on sympa_bounced, sympa_bulk
check process sympa_bounced
  with pidfile /var/run/sympa/bounced.pid
  start program = "/bin/true"
  stop program = "/bin/true"
check process sympa_bulk
  with pidfile /var/run/sympa/bulk.pid
  start program = "/bin/true"
  stop program = "/bin/true"

This does not work either: if sympa_bulk fails, its PID file will not be created again until the sympa service is restarted, but sympa is not restarted until sympa_bulk is running again.

Is there a way to monitor such a service, get alert messages for all subservices, but restart the service only once, even if all subservices fail at once?

morxa

2 Answers

I have found two possible solutions. Neither is optimal, but both work in my scenario:

  1. For every subservice, only check if the PID file exists and assume the service is online if the file exists. As before, the main service sympa depends on the subservices:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
      depends on sympa_bounced, sympa_bulk
    
    check file sympa_bounced
      with path /var/run/sympa/bounced.pid
      if does not exist then restart
    
    check file sympa_bulk
      with path /var/run/sympa/bulk.pid
      if does not exist then restart
    

    restart does nothing for files, but because sympa depends on the subservices, it will be restarted.

  2. With newer monit versions, you can also execute a command and pass arguments to the command:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
      depends on sympa_bounced, sympa_bulk
    
    check program sympa_bounced
      with path "/usr/bin/pgrep --pidfile /var/run/sympa/bounced.pid"
      if does not exist then restart
    
    check program sympa_bulk
      with path "/usr/bin/pgrep --pidfile /var/run/sympa/bulk.pid"
      if does not exist then restart
    

    Similar to check file, the restart action does not do anything for programs, but forces the service sympa to restart.

    With older monit versions (e.g. 5.4, the current version in Debian Wheezy), you cannot pass arguments to a command, so you could write a simple (one-line) script for each service which executes /usr/bin/pgrep with the respective arguments.
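
For the older-monit case in solution 2, such a wrapper might look like the following minimal sketch. The function name `pidfile_alive`, the `PIDFILE` variable and the suggested file name are illustrative assumptions, not part of monit or sympa. It also assumes a pgrep (from procps) that supports `--pidfile`, as used in the configuration above.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/check_sympa_bounced
# (name and location are assumptions). One copy per subservice is needed,
# because older monit versions cannot pass arguments to the command.

# Return 0 only if the PID stored in the given file belongs to a running
# process; a missing or invalid PID file yields a non-zero status.
pidfile_alive() {
    pgrep --pidfile "$1" >/dev/null 2>&1
}

# The PID file path is fixed inside the script; the PIDFILE environment
# variable (an assumption of this sketch) only eases testing by hand.
PIDFILE="${PIDFILE:-/var/run/sympa/bounced.pid}"

# The exit status of this last command becomes the script's exit status.
pidfile_alive "$PIDFILE"
```

A second copy with `/var/run/sympa/bulk.pid` as the default would cover sympa_bulk, and each `check program` entry would then use the path of the corresponding script instead of the pgrep command line.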

With both solutions, sympa is restarted once if any of the subservices fails or if sympa is not running at all.

morxa
  • With `sympa`, this does not actually work, because the init script of `sympa` is somewhat broken (at least in Debian). If the main process sympa is killed but other processes (such as `sympa_bounced`) are still running, then `/etc/init.d/sympa start` does not start the main process. Therefore I would not recommend using this with `sympa`. – morxa Feb 11 '15 at 09:37

You should be able to get around it using `depends`, with something like:

check process sympa_bulk
  with pidfile /var/run/sympa/bulk.pid
  depends on sympa
  start program = "/bin/true"
  stop program = "/bin/true"
Mike