11

Monit sends an alert every time the monit daemon is stopped or started. This is obnoxious and not useful information.

According to the docs, I set:

set alert user@mycompany.com but not on { instance }

...which should send alerts to that e-mail, unless they are in the "instance" category, which is defined as starts/stops.

However, I still get alerts generated. This is super annoying. Clearly I must be missing something.

We're running Monit 5.2.4.

Winfield
  • Is this covered by [**this question?**](http://serverfault.com/questions/500700/suppress-monit-message-monit-instance-changed-on-stop-start/503671#503671) – ewwhite Aug 12 '13 at 21:24

4 Answers

10

According to the documentation, Monit can generate a number of alerts:

Event:     | Failure state:              | Success state:
---------------------------------------------------------------------
action     | "Action done"               | "Action done"
checksum   | "Checksum failed"           | "Checksum succeeded"
bytein     | "Download bytes exceeded"   | "Download bytes ok"
byteout    | "Upload bytes exceeded"     | "Upload bytes ok"
connection | "Connection failed"         | "Connection succeeded"
content    | "Content failed"            | "Content succeeded"
data       | "Data access error"         | "Data access succeeded"
exec       | "Execution failed"          | "Execution succeeded"
fsflags    | "Filesystem flags failed"   | "Filesystem flags succeeded"
gid        | "GID failed"                | "GID succeeded"
icmp       | "Ping failed"               | "Ping succeeded"
instance   | "Monit instance changed"    | "Monit instance changed not"
invalid    | "Invalid type"              | "Type succeeded"
link       | "Link down"                 | "Link up"
nonexist   | "Does not exist"            | "Exists"
packetin   | "Download packets exceeded" | "Download packets ok"
packetout  | "Upload packets exceeded"   | "Upload packets ok"
permission | "Permission failed"         | "Permission succeeded"
pid        | "PID failed"                | "PID succeeded"
ppid       | "PPID failed"               | "PPID succeeded"
resource   | "Resource limit matched"    | "Resource limit succeeded"
saturation | "Saturation exceeded"       | "Saturation ok"
size       | "Size failed"               | "Size succeeded"
speed      | "Speed failed"              | "Speed ok"
status     | "Status failed"             | "Status succeeded"
timeout    | "Timeout"                   | "Timeout recovery"
timestamp  | "Timestamp failed"          | "Timestamp succeeded"
uid        | "UID failed"                | "UID succeeded"
uptime     | "Uptime failed"             | "Uptime succeeded"

We were able to fix this on our side by setting (addresses changed to protect the innocent):

SET ALERT important-messages@projectlocker.com ON { invalid, nonexist, timeout, resource, size, timestamp}
SET ALERT less-important-messages@projectlocker.com ON {action, permission, pid, ppid, instance, status}

This successfully routes the messages to the addresses we care about. You can set alerts globally or locally (per service), but ours are all global.

The subheadings under SERVICE TESTS at: http://mmonit.com/monit/documentation/monit.html correspond fairly neatly to the types above.

For each scheduled process or feature of your server, you should be able to come up with what matters to you in plain English, and match that desire to one of the tests mentioned in SERVICE TESTS. For example, if I'm running Apache, I know that I care about:

  • Is the PID in the PID file still running? (nonexist)
  • Did the PID change without my knowledge? (pid)
  • Is the service responding in a timely fashion to a restart? (timeout)

For a custom daemon that polls, I may care about whether the log file is getting updated with status messages regularly (timestamp).
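As a rough sketch of how those plain-English concerns could map to per-service alerts, the fragment below attaches an event filter to each check. The service names, pidfile and log paths, the address, and the 15-minute threshold are all placeholders chosen for illustration, not values from our setup.

```
# Sketch only: names, paths, address and threshold are assumptions.
check process apache with pidfile /var/run/httpd.pid
    # Mail only for the three Apache concerns listed above.
    alert ops@example.com only on { nonexist, pid, timeout }

check file mydaemon_log with path /var/log/mydaemon.log
    # Raise a timestamp event if the log has not been written recently.
    if timestamp > 15 minutes then alert
    alert ops@example.com only on { timestamp }
```

An alert statement inside a check applies to that service only, so each service can have its own list of events.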

Linus Oleander
brokenbeatnik
  • How do you tell what you care about? I wasn't able to find good documentation on what those actions actually mean. For example, "uptime" seems pretty useful, but you don't have it on your list. – dfrankow Aug 07 '13 at 18:50
  • I'll edit my answer to comment. – brokenbeatnik Aug 12 '13 at 21:14
7

I'm using Monit version 5.2.5, and the following has stopped the Monit instance alerts from coming through:

set alert example@gmail.com not {instance}

Thermionix
1

Simply tell it to knock it off after a certain number of retries in N time period according to these examples.

Ben Lutgens
  • This is not the notification of a given watched process or service stopping or starting (which is news) but of the monit daemon itself stopping/starting, which is always intentional and not news. – Winfield Apr 20 '11 at 13:00
  • Ooooh, what if you remove the "set alert" line in global, and put explicit alerts in your service stanzas? – Ben Lutgens Apr 20 '11 at 13:12
  • This works better: just set the alert emails in the checks and remove the global one. – Mike Dec 13 '11 at 23:49
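
A minimal sketch of the approach suggested in these comments, assuming a control file with no global "set alert" line at all. The service names, pidfile paths, init scripts, and address are hypothetical, and whether this suppresses the instance mails on a given Monit version is only what the commenters report.

```
# No global "set alert" line, so there is no catch-all recipient.
# Everything below is a placeholder for illustration.
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/ssh start"
    stop program = "/etc/init.d/ssh stop"
    alert ops@example.com

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
    alert ops@example.com
```

The trade-off is that every check then needs its own alert line, so a new check added without one sends no mail.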
1

I was unable to fix this within monit and had to build a layer of processing on the monit e-mail to filter out these monit instance notifications before delivery, by intercepting them.

We're using PagerDuty to accumulate and dispatch notices from Monit and several other systems, so in this case I added a filtering rule on the Monit service that uses a subject-based regex to drop the Monit instance notice e-mails.

Winfield