
I have a case where certain processes monitored by Monit generate empty PID files. Monit does not handle empty PID files well: it tries to restart the process even though it is already running, and it keeps writing errors to the Monit log.

I am thinking of implementing a custom script to handle this. When Monit fails to restart a process because of its PID file, it would run the script, which re-populates the PID file with the PID of the already running process.

I am failing to write the "if failed" part that runs this custom script. For a server process with a port and protocol I could write one, but for a plain background process I am not sure how to handle this case.

This is the intended Monit config, which fails the syntax check when I run "monit -t":

Please help in suggesting the right config to handle monit restart failures.

Thank you.

# Check for cmaeventd process
check process cmaeventd with pidfile /var/run/cmaeventd.pid
group snmp-agents
start program = "/opt/hp/hp-snmp-agents/storage/etc/cmaeventd start"
stop program = "/opt/hp/hp-snmp-agents/storage/etc/cmaeventd stop"
if failed (restart|start) then exec "/tmp/pidchk.sh cmaeventd"
if 2 restarts within 3 cycles then timeout

Monit logfile:

[PST Feb  3 18:18:20] error    : monit: Error reading pid from file '/var/run/cmaidad.pid'
[PST Feb  3 18:18:21] error    : monit: Error reading pid from file '/var/run/cmaidad.pid'
[PST Feb  3 18:18:22] error    : 'cmaidad' failed to start

[PST Feb  3 18:19:22] error    : 'cmaidad' service restarted 2 times within 2 cycles(s) - unmonitor


Empty PID file:
logbash-3.1# ps -ef|grep cmaidad|grep -v grep
root     32298     1  0 18:14 ?        00:00:01 cmaidad -p 15 -s OK -l /var/log/hp-snmp-agents/cma.log
logbash-3.1# cat /var/run/cmaidad.pid

logbash-3.1# ls -l /var/run/cmaidad.pid
-rw-r--r-- 1 root root 1 Feb  3 18:14 /var/run/cmaidad.pid
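
The listing above shows a 1-byte PID file, most likely holding just a newline. As a minimal sketch (not part of the original script), a check like the following could decide whether a PID file actually needs repairing before it is overwritten. The function name `pid_file_is_valid` is hypothetical, and `kill -0` is assumed to run as root, as Monit typically does; for an unprivileged user it can fail on processes owned by someone else.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the file holds the PID of a live process.
pid_file_is_valid() {
    local pidfile=$1 pid

    # A missing or unreadable file is never valid
    [ -r "$pidfile" ] || return 1

    # Take the first line and strip all whitespace (an empty file yields "")
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')

    # Reject empty or non-numeric content
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Signal 0 delivers nothing; it only checks that the process can be signalled
    kill -0 "$pid" 2> /dev/null
}
```

A repair script could call `pid_file_is_valid /var/run/cmaidad.pid` first and rewrite the file only when the check fails.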

Script I wrote to populate the PID file if the given process is running:

#!/bin/bash
# Re-populate empty PID files that were NOT populated by the hp-snmp scripts
AGNTFILEPATH=/var/run

# Different distros put pidof in different places
if [ -x /sbin/pidof ]; then
  PIDOF=/sbin/pidof
elif [ -x /bin/pidof ]; then
  PIDOF=/bin/pidof
else
  echo "pidof not found" >&2
  exit 1
fi

# Write the PID of the running agent into its PID file
addpidintofile() {
    # Omit this script and its parent; keep only the first PID returned
    PIDOFAGNT=$("$PIDOF" -o $$ -o $PPID -o %PPID -x "$PNAME" 2> /dev/null | cut -d " " -f1)
    if [ -f "$AGNTFILEPATH/$PNAME.pid" ]; then
        echo "$PIDOFAGNT" > "$AGNTFILEPATH/$PNAME.pid"
    fi
}

PNAME=$1
# Count matching processes, excluding the grep itself.
# Note: this also matches this script's own command line, since PNAME is an argument.
cnt=$(ps -ef | grep "$PNAME" | grep -v grep | wc -l)
if [ "$cnt" -eq 0 ]; then
    exit 1
else
    addpidintofile
    exit 0
fi
gowin09

1 Answer

This is all a pretty bad approach to the problem you're trying to solve. You really want your HP monitoring agents/drivers to be stable and not crash...

Either way, if you aren't going to solve the root issue, you can just instruct Monit to use the process name instead of a PID.

check process cmaeventd
        matching "cmaeventd"
        start program = "/etc/init.d/cmaeventd start"
        stop program = "/etc/init.d/cmaeventd stop"
ewwhite
  • Even if "matching" is used for a process, Monit still relies only on the PID file to monitor it. If the file is empty, it retries for the defined number of restarts, then times out and unmonitors the process. So the only option is to have Monit work around this by executing the custom script, which I am having trouble implementing. – gowin09 Feb 04 '15 at 23:28
  • K. Just a suggestion. I _really_ recommend that you fix your root problem and just [***update the HP Management Agent software***](http://serverfault.com/questions/664735/awaken-monit-daemon-every-few-hours-for-all-monitored-processes/664761#664761). If that is not within your control, please ask someone who has the requisite access. – ewwhite Feb 04 '15 at 23:38