
We have the following monit config which restarts tomcat if unable to connect to it:

check host Tomcat-Foo with address localhost
  stop program = "/usr/bin/systemctl stop tomcat.service"
  start program = "/usr/bin/systemctl start tomcat.service" with timeout 360 seconds
  if failed host localhost
    port 8081
    protocol http
    request "/foo/"
    for 3 times within 5 cycles
  then alert

The problem is that Tomcat takes quite a while to start up, and monit keeps checking in the meantime. While Tomcat is still starting, monit thinks it is down "again" and initiates another restart, which turns into a restart loop.

Is there an easy way to have monit pause/disable the checking until tomcat is in fact back up again?

Alternatively, should this config look completely different so that this isn't an issue to begin with?

Svish

3 Answers


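Just add a timeout to the start program, and a sleep in a start script. Appending "&& sleep 5m" to the start command does not work, most likely because monit executes the command directly rather than through a shell. It would be nice to figure out a cleaner way to delay the start command.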

Also note that if you have Apache in front of Tomcat, the host check will always succeed, because Apache answers even when Tomcat is down. That is why http-check.sh below checks the response for a keyword instead.

/etc/monit/bin/tomcatstart.sh

#!/bin/bash
/usr/sbin/service tomcat8 start
# Block for 5 minutes so monit does not re-check while Tomcat is still starting.
sleep 5m

/etc/monit/conf-enabled/tomcat8

check program http-check with path "/etc/monit/bin/http-check.sh"
  group tomcat8
  start program = "/etc/monit/bin/tomcatstart.sh" with timeout 450 seconds
  stop program  = "/usr/sbin/service tomcat8 stop"
  if status != 0 for 2 times within 2 cycles
  then restart

/etc/monit/bin/http-check.sh

#!/bin/bash

# Fetch the page; the check passes only if the expected keyword is present.
RESULT="$(wget -qO- https://www.host.com)"

if [[ $RESULT == *"Contact"* ]]
then
  exit 0
else
  exit 1
fi

This works as expected: monit waits 5 minutes without trying again.

[EDT May 30 13:27:56] error    : 'http-check' '/etc/monit/bin/http-check.sh' failed with exit status (1) -- no output
[EDT May 30 13:27:56] info     : 'http-check' trying to restart
[EDT May 30 13:27:56] info     : 'http-check' stop: /usr/sbin/service
[EDT May 30 13:27:56] info     : 'http-check' start: /etc/monit/tomcatstart.sh
[EDT May 30 13:34:01] error    : 'http-check' '/etc/monit/bin/http-check.sh' failed with exit status (1) -- no output
[EDT May 30 13:34:01] info     : 'http-check' trying to restart
[EDT May 30 13:34:01] info     : 'http-check' stop: /usr/sbin/service
[EDT May 30 13:34:02] info     : 'http-check' start: /etc/monit/tomcatstart.sh
otterslide

Try this:

check host Tomcat-Foo with address localhost every 2 cycles
...

When monit performs its checks, it will only check Tomcat-Foo every 2 cycles, giving it more time to start up. Adjust the number of cycles if you require more/less time.

  • Tried it, but then it always checks every 2 cycles. I just want the checking to pause while waiting for it to restart. – Svish Feb 18 '16 at 23:16
  • The only other thing I can think of would be to set a timeout, which you already have. Is 360 seconds not long enough maybe? Check out the service poll time section of the monit documentation here: https://mmonit.com/monit/documentation/monit.html#SERVICE-POLL-TIME. Sorry I can't be more helpful. – Dominic Feb 19 '16 at 13:54
  • The timeout doesn't help because the systemctl call actually returns immediately. So I later found out that the timeout in the config in the question is actually useless. – Svish Feb 19 '16 at 18:15
  • Oh interesting. Perhaps you could wrap that systemctl call in a script that doesn't return until Tomcat-Foo is completely done initializing. – Dominic Feb 19 '16 at 19:04
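
The wrapper suggested in the last comment could be sketched roughly as follows. This is only a sketch under assumptions: the function names wait_until and tomcat_ready, the 300-second limit and the 5-second retry interval are made up for illustration, the URL is taken from the question's config, and it assumes monit waits for the start program to exit (up to its timeout), as the sleep-based start script in the first answer appears to show.

```shell
#!/bin/bash
# Hypothetical wrapper: start Tomcat, then block until it answers over HTTP.

# Retry a command until it succeeds or the deadline passes.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    "$@" && return 0
    sleep "$interval"
  done
  return 1
}

# Probe: succeeds only if the URL returns an HTTP success status (curl -f).
# The URL is the one from the question's config; adjust as needed.
tomcat_ready() {
  curl -fsS -o /dev/null --max-time 5 "http://localhost:8081/foo/"
}

# Only start Tomcat when called as "<script> start", so the functions
# above can also be sourced or tested on their own.
if [[ "${1:-}" == "start" ]]; then
  /usr/bin/systemctl start tomcat.service
  wait_until 300 5 tomcat_ready
fi
```

Saved as, say, /etc/monit/bin/tomcat-start-wait.sh (a hypothetical path), it would be referenced from the start program line with the argument "start" and a monit timeout somewhat larger than the 300 seconds used in the script. Unlike a fixed sleep, it returns as soon as Tomcat responds.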

This is the "slightly" hackish solution we currently have. Basically, if Tomcat doesn't start up within the cycle and is therefore restarted again (and again, and again...), the "if N restarts" check runs a script that turns off monitoring for a while.

We also changed the monit configuration to target the tomcat process as well, so it's not just a host check.

Monit config

check process Tomcat with pidfile /opt/tomcat/current/bin/catalina.pid
  stop program = "/usr/bin/systemctl stop tomcat.service"
  start program = "/usr/bin/systemctl start tomcat.service"

  if failed host localhost port 8081
      protocol http request "/productconfigurator/"
      for 3 times within 5 cycles
      then restart

  if 2 restarts within 3 cycles
      then exec "/etc/monit-wait.sh tomcat 5m"

monit-wait.sh

#!/bin/bash

# $1 = monit service name, $2 = pause duration accepted by sleep (e.g. 5m)
monit unmonitor "$1"
sleep "$2"
monit monitor "$1"

Not particularly pretty, but seems to work at least. The alternative could of course be to use this script as the failed action, but yeah... anyways, better suggestions are welcome still :)
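
For the alternative mentioned above (using a script directly as the failed action), a rough and untested sketch might look like this. The function name restart_and_pause is made up for illustration, and it assumes monit and systemctl are on the PATH monit gives to exec'd programs, and that calling monit from inside an exec action behaves the same way it does in monit-wait.sh.

```shell
#!/bin/bash
# Hypothetical failed-action script: pause monitoring, restart Tomcat,
# wait, then resume monitoring.
# Usage: <script> MONIT_SERVICE_NAME PAUSE   (e.g. Tomcat 5m)

restart_and_pause() {
  local service=$1 pause=$2
  monit unmonitor "$service"        # stop monit from checking during startup
  systemctl restart tomcat.service  # unit name taken from the config above
  sleep "$pause"                    # give Tomcat time to come up
  monit monitor "$service"          # resume checking
}

# Run only when both arguments are supplied.
if [[ $# -eq 2 ]]; then
  restart_and_pause "$1" "$2"
fi
```

It would replace "then restart" with a "then exec" action pointing at the script, which makes the separate "if 2 restarts" rule unnecessary. The service name passed in has to match the name used in the check process line.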

Svish