I'm using Monit to monitor various processes that need to be up and running as a group for a web site to work properly. To bring the site up or down, the processes must be started or stopped in a definite order. The dependencies are as follows. (The names have been changed to protect the innocent. I use more descriptive names in the real configuration.)

  1. The service site depends on site.workerA, site.workerB and site-redis.

  2. Both workers depend on site-redis.

The site is always started or stopped through Monit so as to avoid the possibility of race conditions, or of Monit working against me (e.g. I stop a service and Monit keeps starting it back up).

The problem is that it takes much more time than necessary to bring the whole site up. If I instruct Monit to start the site, then once Monit has figured out the dependencies, the sequence of actions on Monit's part is:

  1. Starts site-redis.
  2. Sleeps for 2 minutes.
  3. Detects that site-redis is running, so starts the two workers.
  4. Sleeps for 2 minutes.
  5. Detects that the workers and redis are running, so starts site.
  6. [Sleeps for 2 minutes]
  7. [Detects that site is running.]

I've bracketed the last 2 steps because they are practically moot since the site is effectively up and running before the last 2 minute interval.
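
To put rough numbers on the sequence above, here is a small shell sketch of the arithmetic. It assumes each service comes up within a single polling interval, so the wait is simply the interval multiplied by the number of dependency levels that must be observed before site is started. The function name startup_wait is made up for illustration.

```shell
#!/bin/bash
# startup_wait INTERVAL LEVELS
# Seconds spent sleeping before the last service can be started, assuming
# every dependency level is detected after exactly one polling interval.
startup_wait() {
    echo $(( $1 * $2 ))
}

levels=2   # redis, then the workers, must be seen running before site starts

echo "2 minute polling: $(startup_wait 120 "$levels") s of waiting"
echo "5 second polling: $(startup_wait 5 "$levels") s of waiting"
```

The time the processes themselves need to start is not included, so this only shows how much of the delay comes from polling.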

The 2 minute sleep is the default polling interval that Monit uses to check on services. I know that I could reduce this interval so that these services are always polled more frequently. For instance, I could do

check process site.workerB pidfile "/srv/site/var/run/site/site.workerB.pid"
    every [number] cycles
    ...

I would also have to change the length of the polling cycle to something smaller so that a cycle is less than 2 minutes.
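
As a rough sketch of that trade-off: if the cycle were shortened to a hypothetical 5 seconds, any service that should stay at the old 2 minute interval would need an every N cycles value of 120 / 5. The helper name cycles_for and the 5 second figure are assumptions for illustration only.

```shell
#!/bin/bash
# cycles_for TARGET_SECONDS CYCLE_SECONDS
# Number of cycles to skip so that a service is still checked roughly
# every TARGET_SECONDS when the cycle length is CYCLE_SECONDS.
cycles_for() {
    echo $(( $1 / $2 ))
}

cycle=5   # hypothetical shorter cycle length, in seconds

# Print the two Monit settings this would translate to.
echo "set daemon $cycle"
echo "every $(cycles_for 120 "$cycle") cycles"
```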

However, I don't want Monit to always poll these services more frequently. I'd like Monit to poll more frequently only while it is waiting for a state change. For example, if Monit has started a service that another service depends on, it would poll at a 5-second interval rather than every 2 minutes.

I'm not seeing any way to configure Monit to do this, but maybe I missed something.


Here is an illustration of my prose description above. After removing things that are not pertinent to the issue, the Monit configuration is like this:

check process site-redis pidfile ".../site/redis.pid"
      group site
      start program = ...
      stop program = ...
      if does not exist then start

check process site pidfile ".../site/site.pid"
      group site
      depends on site.workerA, site.workerB, site-redis
      start program = ...
      stop program = ...
      if does not exist then start

check process site.workerA pidfile ".../site/site.workerA.pid"
      group site
      depends on site-redis
      start program = ...
      stop program = ...
      if does not exist then start

check process site.workerB pidfile ".../site/site.workerB.pid"
      group site
      depends on site-redis
      start program = ...
      stop program = ...
      if does not exist then start
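
For reference, the start order implied by the depends on lines above can be reproduced with tsort from coreutils. This is only an illustration of the ordering, not a description of how Monit computes it internally; the function name start_order is made up.

```shell
#!/bin/bash
# Each input line is "dependency dependent": the service on the left must be
# running before the service on the right is started.
start_order() {
    tsort <<'EOF'
site-redis site.workerA
site-redis site.workerB
site-redis site
site.workerA site
site.workerB site
EOF
}

# Prints one service per line: site-redis first, site last.
# The relative order of the two workers is not fixed.
start_order
```

The stop order is the same list read in reverse.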
Louis
    As far as I know, this is not possible in Monit (yet). It is best to ask on the user mailing list (monit-general@nongnu.org) or to request a new feature on the Bitbucket tracker: https://bitbucket.org/tildeslash/monit/ – DevOps Feb 07 '18 at 07:04

1 Answer

You can trigger a re-evaluation in Monit by running monit validate or by sending SIGUSR1 to the Monit daemon.
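
A minimal sketch of the signal variant, assuming the daemon's PID can be read from a pidfile. The helper name wake_monit and the path in the example call are assumptions; the real location depends on your set pidfile setting or on Monit's defaults for your system.

```shell
#!/bin/bash
# wake_monit PIDFILE
# Send SIGUSR1 to the Monit daemon whose PID is stored in PIDFILE.
wake_monit() {
    local pidfile=$1
    # Without a readable pidfile there is nothing to signal.
    [ -r "$pidfile" ] || return 1
    kill -USR1 "$(cat "$pidfile")"
}

# Example call (not executed here; the path is an assumption):
#   wake_monit /var/run/monit.pid
```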

So you can rewrite your start/stop programs to:

#!/bin/bash

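# Poll Monit until the service reports OK, giving up after 20 tries.
function background {
    local i=0
    while [ "$i" -lt 20 ]; do
        # Make Monit re-check its services now instead of at the next cycle.
        monit validate > /dev/null
        # Stop polling as soon as the status output contains OK.
        monit status __YOUR__SERVICE__NAME__HERE__ | grep -q OK && exit 0
        sleep 5
        i=$((i + 1))
    done
}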


# restart procedure


state=$?

background &

exit $state

Long story short: once the restart procedure completes, the script saves its exit code and starts a background loop. The loop runs up to 20 times and sleeps for 5 seconds after each pass (about 1m40s in total). Each pass forces Monit to re-evaluate its state, and the loop ends early once the state is OK. The script itself exits with the restart procedure's exit code and does not wait for the loop.
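
If several start/stop programs need the same loop, it could be factored into a small helper that takes the service name as an argument. This is a sketch under the same assumptions as the script above (monit validate wakes the daemon, and an OK in the monit status output means the service is up); poll_until and service_ok are made-up names.

```shell
#!/bin/bash
# poll_until TRIES DELAY CMD [ARGS...]
# Run CMD up to TRIES times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, 1 if it never does.
poll_until() {
    local tries=$1 delay=$2 i=0
    shift 2
    while [ "$i" -lt "$tries" ]; do
        "$@" && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# service_ok NAME
# Ask Monit to re-check its services, then test whether NAME reports OK.
service_ok() {
    monit validate > /dev/null
    monit status "$1" | grep -q OK
}

# Example call (not executed here), run in the background after the start
# command, as in the script above:
#   poll_until 20 5 service_ok site-redis &
```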

boppy