I am beginning to migrate from Sysfence to Monit on a RHEL Linux box, and I wonder whether it is possible to create an equivalent of the following Sysfence rule in Monit.

Sysfence configuration (partial)

For example, our sysfence.conf file declares the following to check for load average conditions.

rule "high load" {
  la1 >= 5.0 and
     { la5 > 3.0 }
     { la15 > 2.0 }
run '/bin/high-load.sh'
step 300

Monit configuration (an attempt to simulate the Sysfence load average conditions)

For the monitrc file, I created the following statement, which passed the configuration syntax check, but the alert reports only the trapped value of the 15min load average rather than the values for all conditions. One notable difference is that using "or" in place of the 2nd "and" produces a syntax error on Monit startup, so AFAIK "or" logic is not permitted.

check system our.server.tld
  if loadavg (1min) > 1 and loadavg (5min) > 0.5 and loadavg (15min) > 0.25 then alert

For the test case, I am using much smaller trigger values so that the thresholds are reached more quickly on a test box with very little use at the time. When one of the conditions was met (the 15min loadavg in this case), I received the following alert, which makes no mention of the 1min and 5min load averages. The alert fired even though the other conditions were not met, so it seems that the "and" conditions are ignored.

The actual load average values were: load average: 0.34, 0.47, 0.53. I am testing on a server with very little traffic and ran the find command to drive up the system load. Also, it appears that only one decimal place is allowed, so the 0.25 value for the 15min check was apparently rounded down to 0.2.

Alert email that was sent from Monit

Resource limit matched Service our.server.tld

Date:        Thu, 01 Nov 2012 11:34:58
Action:      alert
Host:        our.server.tld
Description: loadavg(15min) of 0.5 matches resource limit [loadavg(15min)>0.2]

Your faithful employee,

1 Answer


I think I see what you're trying to accomplish...

In plain English, you're trying to say,

"Send an alert if the 1-minute load average is greater than or equal to 5.0 AND either the 5-minute load average is greater than 3.0 OR the 15-minute load average is greater than 2.0"

That's not the approach to monitoring I'd like to see, since it can generate a lot of noise. Also, how are restored thresholds treated? What do you really want to prevent or be notified on? A high and persistent load, correct?

In Monit, I would use the "cycles" keyword to keep it under control.

Assuming a poll cycle of 60 seconds,

check system localhost
   # Send alert if 1-minute average is > 5 for 5 minutes
   if loadavg (1min) > 5 for 5 cycles then alert 
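
If the 5-minute and 15-minute averages matter as well, one option is to list them as separate tests. This is only a sketch: the thresholds come from the Sysfence rule in the question, while the `set daemon 60` interval and the cycle counts are assumptions to adjust. Each `if` line is evaluated on its own, so this is not a translation of the combined AND/OR rule. It behaves more like "alert on any load average that stays high".

```
set daemon 60    # assumed poll interval: one cycle = 60 seconds

check system our.server.tld
   # Each test is evaluated independently and raises its own alert
   if loadavg (1min)  > 5 for 5 cycles then alert
   if loadavg (5min)  > 3 for 5 cycles then alert
   if loadavg (15min) > 2 for 5 cycles then alert
```

Requiring several consecutive cycles is what keeps a short spike, or the tail end of a load that has already been dealt with, from generating an alert.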
  • The sysfence approach did work well, but maybe its paradigm is not applicable to Monit. The most common condition we encounter is from a JVM gone wild and showing roughly 799% CPU utilization for one process on a dual quad Xeon box. The run queue is averaging around 30 procs as well. – T.P. Nov 01 '12 at 19:16
  • Also, once we kill off the offending process, the 1min load average drops way down, and since there are 5 minute sampling intervals we may get a residual false positive every so often if the 1min average is still over 5. – T.P. Nov 01 '12 at 19:26
  • Why not monitor those specific jvm processes and if 'jvm' (for example) goes over 25% CPU, alert & execute a script to kill/restart it? – bmurtagh Nov 01 '12 at 21:33
  • @bmurtagh - Actually, we're leaning in that direction for the near term. Since the application is still under active development, at this time we prefer an ops team member log in and make an on-the-spot assessment, maybe get a stack trace, and then take down the process gracefully when possible. – T.P. Nov 01 '12 at 22:11
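
The per-process approach discussed in these comments could look roughly like the following in monitrc. It is a sketch only: the service name `jvm` and the pidfile path are hypothetical, the 25% threshold is taken from the comment above, and the cycle counts are assumptions. The `exec` line is left commented out to match the stated preference for having an ops team member assess the process before it is taken down.

```
check process jvm with pidfile /var/run/jvm.pid   # hypothetical name and path
   # Notify only, so someone can log in and grab a stack trace first
   if cpu > 25% for 5 cycles then alert
   # Optional automated action once manual assessment is no longer needed:
   # if cpu > 25% for 10 cycles then exec "/bin/high-load.sh"
```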