Good morning folks,

I keep getting these Monit alerts, about once or twice a day:

Connection failed Service amavisd

Date: Wed, 20 Jul 2022 09:04:58
Action: restart
Host: (hidden).com
Description: failed protocol test [SMTP] at [localhost]:10024 [TCP/IP] -- Error receiving data from the mailserver -- Resource temporarily unavailable

Your faithful employee,
Monit

Then, about 20 seconds later, it is running again, and I get this email alert:

Connection succeeded Service amavisd

Date: Wed, 20 Jul 2022 09:07:03
Action: alert
Host: (hidden).com
Description: connection succeeded to [localhost]:10024 [TCP/IP]

Your faithful employee,
Monit

That is a bit too much noise for me, since I already get many emails every day. Can this configuration be improved so that it does not alert me until after a couple of retries or so? Or, the other way around, how can I investigate which error the mail server returned?

Here is the current Monit configuration:

check process amavisd with pidfile /var/run/amavis/amavisd.pid
 group mail
 start program = "/etc/init.d/amavis start"
 stop  program = "/etc/init.d/amavis stop"
 if failed port 10024 protocol smtp then restart
 if 3 restarts within 3 cycles then alert
 if 6 restarts within 6 cycles then timeout
 depends on amavisd_bin
 depends on amavisd_rc

check file amavisd_bin with path /usr/sbin/amavisd-new
 group mail
 if failed checksum then unmonitor
 if failed permission 755 then unmonitor
 if failed uid root then unmonitor
 if failed gid root then unmonitor

check file amavisd_rc with path /etc/init.d/amavis
 group mail
 if failed checksum then unmonitor
 if failed permission 755 then unmonitor
 if failed uid root then unmonitor
 if failed gid root then unmonitor

Can you spot the issue?

Thanks, M.

1 Answer

In your Monit rules, you can add fault tolerance so that a single false positive is ignored and the action only fires after several failures, for example:

if failed port 10024 for 3 times within 5 cycles then alert

More detail in the documentation at https://mmonit.com/monit/documentation/monit.html#FAULT-TOLERANCE
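
Applied to the amavisd check from the question, a minimal sketch could look like the block below. It assumes the restart should only happen after 3 failed probes within 5 cycles; those thresholds are illustrative, not recommended values. As far as I know, Monit sends an alert whenever a rule's action fires (given a global alert recipient), so the bare "then restart" in the original notifies on the first failed probe, independently of the separate "3 restarts within 3 cycles" rule.

```
check process amavisd with pidfile /var/run/amavis/amavisd.pid
 group mail
 start program = "/etc/init.d/amavis start"
 stop  program = "/etc/init.d/amavis stop"
 # Tolerate transient failures: restart (and alert) only after
 # 3 failed probes within 5 cycles. Thresholds are assumptions.
 if failed port 10024 protocol smtp for 3 times within 5 cycles then restart
 # Remaining rules are unchanged from the question.
 if 3 restarts within 3 cycles then alert
 if 6 restarts within 6 cycles then timeout
 depends on amavisd_bin
 depends on amavisd_rc
```

After editing, the control file can be checked with "monit -t" before reloading.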

DevOps
  • Thanks. Well, I already have that in my config, but it is still happening. I've updated my original post with my existing Monit config. Can you spot the issue? – Michael Heuberger Jul 21 '22 at 21:54
  • The issue is the restart rule: see "if failed port 10024 protocol smtp then restart" and replace it with something like "if failed port 10024 protocol smtp for 3 times within 5 cycles then restart". Sometimes it is also useful to test the port only, without the "protocol smtp" part. – lutzmad Aug 03 '22 at 06:02
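
The two options from the last comment could be sketched as follows; use one of the rules inside the "check process amavisd" block, not both. The "with timeout 15 seconds" clause in variant B is an assumption added for illustration: "Resource temporarily unavailable" while receiving data usually suggests that the read timed out before amavisd sent its SMTP greeting, so allowing more time than Monit's default may help if amavisd is merely busy. The 15 second value is hypothetical.

```
# Variant A: plain TCP connect test, no SMTP handshake
if failed port 10024 for 3 times within 5 cycles then restart

# Variant B: keep the SMTP test but allow a slower greeting
# (timeout value is an illustrative assumption)
if failed port 10024 protocol smtp with timeout 15 seconds for 3 times within 5 cycles then restart
```

Variant A only confirms that something is listening on the port, so it will not notice a hung amavisd that still accepts connections; variant B keeps that protection at the cost of a slower probe.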