I am having trouble configuring the monit daemon to wake up every few hours and resume monitoring processes that have been put into the "not monitored" state.
PROBLEM: When monit unmonitors a process, its status changes to "not monitored" and the monit daemon will NEVER try to monitor that process again, even after the PID file has been updated with the correct PID. Monitoring stays stopped for that process forever unless the monit daemon is manually awakened for it, as shown below.
Can this re-awakening be configured per process in the monit config, at a given timeout interval, to avoid the pitfall of a process staying in the "not monitored" state forever?
For example, something like: if 2 restarts within 3 cycles then timeout {X hours} monitor restart
Thank you.
I have the config below for an SNMP process.
# Check for cmaeventd process
check process cmaeventd with pidfile /var/run/cmaeventd.pid
    group snmp-agents
    start program = "/opt/hp/hp-snmp-agents/storage/etc/cmaeventd start"
    stop program = "/opt/hp/hp-snmp-agents/storage/etc/cmaeventd stop"
    if 2 restarts within 3 cycles then timeout
If, for some reason, the PID file is NOT populated correctly (I am working on fixing that), monit keeps trying to restart the process using the empty PID file, logs the errors below in the monit log, and finally unmonitors the process once the restart limit we configured is reached.
log messages:
[PST Feb 3 11:43:23] error : monit: Error reading pid from file '/var/run/cmaeventd.pid'
[PST Feb 3 11:43:24] error : monit: Error reading pid from file '/var/run/cmaeventd.pid'
[PST Feb 3 11:45:25] error : 'cmaeventd' service restarted 2 times within 2 cycles(s) - unmonitor
Monit status for that process after unmonitor:
Process 'cmaeventd'
  status                 not monitored
  monitoring status      not monitored
  data collected         Tue Feb 3 12:10:25 2015
Manually awakening the daemon for this process to start the monitoring again:
>monit monitor cmaeventd
This awakens the monit daemon for this process. It starts reading the PID file again and, if that succeeds, resumes monitoring.
Before awakening the monit daemon for this process:
---------------------------------------------------
logbash-3.1# ls -l /var/run/cmaeventd.pid
-rw-r--r-- 1 root root 1 Feb 3 00:00 /var/run/cmaeventd.pid
logbash-3.1# cat /var/run/cmaeventd.pid
logbash-3.1# ps -ef|grep cmaeventd |grep -v grep
root 13066 1 0 00:00 ? 00:00:00 cmaeventd -p 15 -l /var/log/hp-snmp-agents/cma.log l
logbash-3.1# echo "13066" > /var/run/cmaeventd.pid
logbash-3.1# cat /var/run/cmaeventd.pid
13066
logbash-3.1# monit monitor cmaeventd
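Until there is a config-level answer, the manual steps above could probably be scripted and run periodically. Below is a minimal sketch, not a tested solution: it only calls `monit monitor` when the PID file holds a numeric PID of a live process. The function names and the `PIDFILE`, `SERVICE` and `MONIT_BIN` variables are hypothetical names introduced for illustration; only the `monit monitor cmaeventd` command and the PID file path come from my setup.

```shell
#!/bin/sh
# Sketch: re-enable monit monitoring once the PID file is valid again.
# PIDFILE, SERVICE and MONIT_BIN are illustrative names (assumptions),
# overridable from the environment.
PIDFILE="${PIDFILE:-/var/run/cmaeventd.pid}"
SERVICE="${SERVICE:-cmaeventd}"
MONIT_BIN="${MONIT_BIN:-monit}"

# Succeeds only if the file contains a numeric PID of a running process.
# Note: kill -0 needs permission to signal the process, so run this as root.
pid_file_valid() {
    [ -s "$1" ] || return 1
    pid=$(tr -d '[:space:]' < "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty (e.g. newline only) or non-numeric
    esac
    kill -0 "$pid" 2>/dev/null
}

# Asks monit to monitor the service again, but only if the PID file is usable.
remonitor_if_valid() {
    if pid_file_valid "$PIDFILE"; then
        "$MONIT_BIN" monitor "$SERVICE"
    else
        echo "skip: $PIDFILE has no valid PID" >&2
        return 1
    fi
}
```

To use it, a call to `remonitor_if_valid` would be appended at the end of the script, and the script scheduled from root's crontab every few hours, for example `0 */3 * * * /usr/local/bin/remonitor-cmaeventd.sh` (the script path is made up). The validity check is there so that monit is not sent straight back into the restart/timeout loop while the PID file is still empty.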
From log:
[PST Feb 3 12:20:54] info : monitor service 'cmaeventd' on user request
[PST Feb 3 12:20:54] info : monit daemon at 23515 awakened
[PST Feb 3 12:20:54] info : Awakened by User defined signal 1
[PST Feb 3 12:20:54] info : 'cmaeventd' monitor action done
Monit status:
Process 'cmaeventd'
  status                 initializing
  monitoring status      initializing
  data collected         Tue Feb 3 12:20:54 2015
After some time, it changes to the following:
Process 'cmaeventd'
  status                 running
  monitoring status      monitored
  pid                    13066
  parent pid             1
  uptime                 12h 21m
  children               0
  memory kilobytes       2160
  memory kilobytes total 2160
  memory percent         0.0%
  memory percent total   0.0%
  cpu percent            0.0%
  cpu percent total      0.0%
  data collected         Tue Feb 3 12:21:54 2015