Monit seems to give up restarting a service if it fails a few times, and unmonitors it. I can’t find anything in the documentation about the specifics of when or why.

My Monit config is set up as follows:

set daemon 10
set logfile /var/log/monit.log
set statefile /var/lib/monit/monit.state
set alert foo@example.com not { nonexist, action, instance }
include /etc/monit/conf.d/*

And this is an example of the Monit ruleset I am using:

check process myservice
  with pidfile /var/run/myservice/myservice.pid
  start program = "/home/myservice/current/start-myservice.sh"
    as uid myservice and gid myservice
  stop program = "/home/myservice/current/stop-myservice.sh"
    as uid myservice and gid myservice
  mode active

In my environment, I want it to keep trying on its poll intervals indefinitely. Is there any way to configure monit to never stop monitoring a service, even if it doesn’t start up successfully?

Giacomo1968
Joe Shaw
  • Please post a sample of your monit config. – ewwhite Sep 20 '11 at 17:47
  • https://gist.github.com/1229828 -- I removed some mail server/alert stuff and the HTTP server configuration from monitrc. The other file is an example of our service configuration. Note the lack of "if x restarts then timeout" clause in it. – Joe Shaw Sep 20 '11 at 18:10
  • I've wondered about this myself. Sometimes I just kill something to test what monit does and monit just unmonitors it. – Ramon Tayag Nov 07 '11 at 02:31

4 Answers

I would simply use a cron job that runs monit start servicename at the desired interval. You can also put services into groups and start a whole group at once for finer control.
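A minimal sketch of such a cron entry, assuming it goes into root's crontab. The five-minute interval and the /usr/bin/monit path are assumptions, myservice is the service name from the question, and mygroup is a hypothetical group name:

```
# Every 5 minutes, ask Monit to start the service and resume monitoring it.
*/5 * * * * /usr/bin/monit start myservice

# Variant that targets every service in one group (hypothetical name "mygroup"):
# */5 * * * * /usr/bin/monit -g mygroup start
```

For the group variant to match anything, the check stanza would need a line such as "group mygroup".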

alexandrul

I had the exact same issue: even after restarting Monit, it would refuse to monitor the service once the timeout had triggered. I finally figured out that I had to delete the Monit state file (/var/.monit.state) and restart Monit to make it monitor all programs again.

Bart De Vos
sam

After some digging, it turns out that Monit keeps its monitoring data in a “state” file, and this file also records which services are monitored and which are unmonitored.

So while this is a bit “brute force”-ish, it definitely works. If a service becomes “unmonitored” due to something like a timeout, then just remove the Monit state file from the system like this:

sudo rm /var/lib/monit/state

And then restart Monit like this and all should be good:

sudo service monit restart
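The state file path differs between the posts in this thread (/var/lib/monit/monit.state in the question, /var/.monit.state and /var/lib/monit/state in the answers), so it may be worth reading it from the control file instead of guessing. A minimal sketch, assuming the path is declared with a "set statefile" line; monit_statefile is a hypothetical helper name and the default control file location is an assumption:

```shell
#!/bin/sh
# Print the state file path declared in a Monit control file.
# Hypothetical helper; the default path /etc/monit/monitrc is an assumption.
monit_statefile() {
  conf="${1:-/etc/monit/monitrc}"
  # Match lines of the form: set statefile <path>
  awk '$1 == "set" && $2 == "statefile" { print $3; exit }' "$conf"
}

# Example use, left commented out because it deletes the state file:
#   sudo rm -- "$(monit_statefile)"
#   sudo service monit restart
```

If the control file has no "set statefile" line, the function prints nothing, and the rm command should not be run with an empty argument.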
Giacomo1968

Based on your Monit code snippet, it looks like you have to modify or add cycle statements to your process stanza. See the relevant documentation here and here.

It seems like you may want to set your service tests to execute every cycle with no timeout statement. Also look at your monit homepage at http://hostname:2812. Check the page for the relevant service and look at the "Existence" field. Your default should look like:

If doesn't exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
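As a rough sketch of what that could look like in the control file, here is the stanza from the question with the existence test written out explicitly and no timeout rule. The "if does not exist then restart" line is my reading of the default shown in the web interface, so treat it as an assumption and check it against the documentation for your Monit version:

```
check process myservice
  with pidfile /var/run/myservice/myservice.pid
  start program = "/home/myservice/current/start-myservice.sh"
    as uid myservice and gid myservice
  stop program = "/home/myservice/current/stop-myservice.sh"
    as uid myservice and gid myservice
  # Explicit form of the default existence test shown in the web interface
  if does not exist then restart
  # Deliberately no "if N restarts within M cycles then timeout" rule
  mode active
```

Note that the asker reports in the comments that the service was still unmonitored with this default in place, so this may not be sufficient on its own.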
Giacomo1968
ewwhite
  • Looking at the monit page for the service, existence field is set to that. But the rule just doesn't seem to be obeyed. With a service that is designed to fail (ie, just a shell script that echoes out the date and does `exit 1`) it still unmonitors quickly. If I use "monit monitor joe-test" it stops monitoring after one failure. With "monit start joe-test" it gives up after 2 tries. – Joe Shaw Sep 23 '11 at 15:49
  • Just take out the "if 5 restarts within 5 cycles then timeout" rule; Monit will then not unmonitor the service. –  Jun 11 '13 at 05:13