Icinga 1 host status UNREACHABLE but all checks are OK

Question

This is a distributed Icinga 1 environment.

About 100 hosts on an Icinga 1 client/satellite are stuck in UNREACHABLE status. All four checks for each host return OK, but the overall state of the host is UNREACHABLE.

The problem may have been caused by my leaving Icinga 1 running while /usr/lib64/nagios/plugins/check_icmp had the wrong permissions (the setuid bit was not set on check_icmp).
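A quick way to confirm whether the setuid bit is present is the shell's `-u` file test. This is a minimal sketch; the helper name `has_setuid` is made up for illustration, and the path is the one quoted above.

```shell
# Return success if the given file exists and has the setuid bit set.
# (has_setuid is a hypothetical helper name, not part of Icinga.)
has_setuid() {
    [ -u "$1" ]
}

# Path taken from the question; it may differ on other distributions.
plugin=/usr/lib64/nagios/plugins/check_icmp

if has_setuid "$plugin"; then
    echo "setuid bit is set on $plugin"
else
    echo "setuid bit is NOT set on $plugin (or the file is missing)"
fi
```

If the bit is missing, something like `chmod u+s` run as root on a root-owned check_icmp is the usual remedy, though the packaged permissions on your system may differ.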

I stopped Icinga and emptied the state retention file (state_retention_file=/var/spool/icinga/retention.dat) on the satellite, but that didn't help. Would emptying the same file on the master help?

ps shows my submit_check_result.sh and submit_host_check.sh scripts as zombies, but they don't live very long.
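For reference, a zombie is a process that has exited but has not yet been reaped by its parent, so short-lived ones are usually harmless. A rough sketch for listing them follows; the function names `filter_zombies` and `list_zombies` are made up for illustration, and the `ps` column layout assumes procps-style output.

```shell
# Keep only rows whose STAT column starts with Z (zombie/defunct).
# Expects "PID PPID STAT COMMAND" columns with a header line on stdin.
filter_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Print PID, parent PID, and command name of every current zombie.
list_zombies() {
    ps -eo pid,ppid,stat,comm | filter_zombies
}
```

Running `list_zombies` a few times in a row shows whether the same PIDs linger or whether they are reaped quickly, as described above.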

score 0 · Answer 1 · answered Jun 22 '17 at 20:25

I had to restore my check forwarding scripts on the client.

Here are the broken bits.

# BEGIN submit_check_result.sh
##############################

return_code=-1

case "$3" in
    OK)
        return_code=0
        ;;
    WARNING)
        return_code=1
        ;;
    CRITICAL)
        return_code=2
        ;;
    CRITICAL)
        return_code=2
        ;;
esac
/usr/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" | /usr/sbin/send_nsca -H 111.14.219.31 -c /etc/nagios/send_nsca.cfg &
# END Check_result

##############################

# BEGIN submit_host_result.sh

##############################

return_code=2

case "$3" in
    OK)
        return_code=0
        ;;
    WARNING)
        return_code=1
        ;;
    CRITICAL)
        return_code=2
        ;;
    UNKNOWN)
        return_code=2
        ;;
esac

# END Check_host
##############################

score 0 · Accepted Answer · answered Jun 22 '17 at 20:30

And here is what seems to have fixed the problem.

cat /etc/icinga/scripts/submit_check_result.sh

return_code=-1

case "$3" in
    OK)
        return_code=0
        ;;
    WARNING)
        return_code=1
        ;;
    CRITICAL)
        return_code=2
        ;;
    UNKNOWN)
        return_code=-1
        ;;
esac

# pipe the service check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server
# submit to master Icinga den-mon-prod

/usr/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" | /usr/sbin/send_nsca -H 111.14.219.31 -c /etc/nagios/send_nsca.cfg &

cat /etc/icinga/scripts/submit_host_check.sh

return_code=-1

case "$2" in
    UP)
        return_code=0
        ;;
    DOWN)
        return_code=1
        ;;
    UNREACHABLE)
        return_code=3
        ;;
esac

/usr/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" | /usr/sbin/send_nsca -H 111.14.219.31 -c /etc/nagios/send_nsca.cfg &
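For comparison, here is a minimal sketch of the host-state mapping. It assumes the documented Nagios/Icinga 1 passive host check codes (0 = UP, 1 = DOWN, 2 = UNREACHABLE) and the three-field host line that send_nsca accepts (host name, return code, plugin output). The function names are hypothetical, the -1 fallback simply mirrors the default used in the scripts above, and this is not the script I am running.

```shell
# Map a host state string to a passive host check return code.
# Assumed convention: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
host_state_to_code() {
    case "$1" in
        UP)          echo 0 ;;
        DOWN)        echo 1 ;;
        UNREACHABLE) echo 2 ;;
        *)           echo -1 ;;   # unrecognized state, same default as above
    esac
}

# Build the tab-separated line for a host check result.
# $1 = host name, $2 = host state string, $3 = plugin output
format_host_result() {
    printf '%s\t%s\t%s\n' "$1" "$(host_state_to_code "$2")" "$3"
}
```

The output of `format_host_result` could then be piped into send_nsca in the same way as the printf lines above; printing it to the terminal first is a cheap way to check the fields before anything reaches the master.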
