
I have configured a Nagios server and added 30+ physical servers and 25+ VMs. All the configuration is complete, and the Nagios server is monitoring the servers, VMs, and services.

But when I reboot a VM, the Nagios server does not detect that the VM is unavailable; it keeps showing it as online.

Has anyone faced this issue? Can anyone help with this?

MOBIN TM
  • Show the configs, particularly the host check. – tater Oct 31 '20 at 06:08
    In my experience, VMs often reboot so fast that the reboot falls between two checks. If you configure your monitoring to trigger an alert only after multiple consecutive failures, the only way to report a reboot is to check the uptime. – Gerald Schneider Oct 31 '20 at 06:33
  • @tater
    ```
    define host {
        use                   issdc-server
        host_name             ansible
        alias                 Ansible Server
        address
        hostgroups            Servers
    }
    define service {
        use                   generic-service
        host_name             ansible
        service_description   CPU Load
        check_command         check_nrpe!check_load
    }
    define service {
        use                   generic-service
        host_name             ansible
        service_description   Memory Usage
        check_command         check_nrpe!check_mem
    }
    ```
    – MOBIN TM Oct 31 '20 at 09:12
  • @GeraldSchneider Exactly, the VMs reboot very fast. How can a reboot be checked through the uptime? I could also change the alert to trigger after a single failure. Would that work in this case? – MOBIN TM Oct 31 '20 at 09:17

1 Answer


As others have noted in the comments, Nagios does not detect that the servers are unavailable while they reboot because the reboot takes so little time that it falls between two checks.
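To make the timing argument concrete, here is a rough estimate. It assumes the host is checked every t_check seconds, is unreachable for t_down seconds during the reboot, the reboot starts at a random moment relative to the check schedule, and a single failed check counts as detection. The 40-second outage and 5-minute interval are illustrative values only, not measurements from this setup:

$$
P(\text{detected}) \approx \min\left(1, \frac{t_\text{down}}{t_\text{check}}\right) = \frac{40\ \text{s}}{300\ \text{s}} \approx 0.13
$$

Even when a check does land inside the outage, the host only enters a soft DOWN state. If it is back up before max_check_attempts consecutive checks have failed, it returns to UP without a notification being sent.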

To check whether a server has been rebooted, you can write your own plugin: on each run, save the server's uptime to a temporary file and compare the current uptime with the saved one. If the current uptime is lower than the saved one, the server has rebooted since the previous check, and the plugin returns a CRITICAL status.
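A minimal sketch of such a plugin as a bash function follows. It assumes the check runs on the monitored host itself (for example through NRPE, which the question's config already uses) and that the host is Linux, where the first field of /proc/uptime is the number of seconds since boot. The function name check_reboot, the default state file /var/tmp/check_reboot.last, and the UPTIME_FILE override (handy for testing) are hypothetical choices, not part of any existing plugin. The state file sits under /var/tmp rather than /tmp because /tmp may be cleared at boot, which would erase the saved value exactly when it is needed.

```bash
#!/bin/bash
# Sketch of a reboot-detection check (hypothetical helper, not an existing plugin).
# It compares the current uptime with the value saved by the previous run.
# Return codes follow the Nagios convention: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_reboot() {
  # State file path is an assumption; /var/tmp is normally kept across reboots.
  local state_file=${1:-/var/tmp/check_reboot.last}
  # UPTIME_FILE is a hypothetical override for testing; defaults to Linux's /proc/uptime.
  local uptime_file=${UPTIME_FILE:-/proc/uptime}
  local current previous=""

  # First field is seconds since boot, e.g. "12345.67"; keep only the integer part.
  if ! read -r current _ < "$uptime_file"; then
    echo "UNKNOWN - cannot read $uptime_file"
    return 3
  fi
  current=${current%%.*}
  if [[ ! "$current" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN - unexpected uptime value in $uptime_file"
    return 3
  fi

  # Uptime saved by the previous run, if there was one.
  if [[ -r "$state_file" ]]; then
    read -r previous < "$state_file"
  fi

  # Save the current uptime so the next run can compare against it.
  if ! echo "$current" > "$state_file"; then
    echo "UNKNOWN - cannot write $state_file"
    return 3
  fi

  # Uptime only goes down if the machine booted again since the last run.
  if [[ "$previous" =~ ^[0-9]+$ ]] && (( current < previous )); then
    echo "CRITICAL - Rebooted since last check (uptime ${current}s, previously ${previous}s)"
    return 2
  fi

  echo "OK - Uptime ${current}s"
  return 0
}
```

In a plugin file, you would end with `check_reboot "$1"; exit $?` so Nagios receives the status code. Because the CRITICAL result appears only on the first run after a reboot (that run saves the new, lower uptime, so the next one returns OK), the service would probably need max_check_attempts set to 1 so that this single result becomes a hard state and triggers a notification.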

You can also use the check-uptime plugin (https://exchange.nagios.org/directory/Plugins/System-Metrics/Uptime/check-uptime/details), which returns a CRITICAL status when the uptime is below a threshold, for example 5 minutes. That way, you receive a notification whenever a server's uptime drops below 5 minutes, which means it has just been rebooted.

If you need to set the threshold in seconds, you can use this script instead:

```bash
#!/bin/bash
# Nagios plugin: CRITICAL if the system booted less than CRIT_VALUE seconds ago.
CRIT_VALUE=$1

# A missing or non-numeric argument is a usage error; Nagios treats exit code 3 as UNKNOWN.
if [[ ! "$CRIT_VALUE" =~ ^[0-9]+$ ]]; then
  echo "UNKNOWN - No argument supplied or argument is not a number."
  echo "Usage: ./uptime.sh <critical value in seconds>"
  echo "Example: ./uptime.sh 300"
  exit 3
fi

# uptime -s prints the boot time as "YYYY-MM-DD HH:MM:SS".
boot_time=$(uptime -s)
since=$(date -d "$boot_time" +%s)
now=$(date +%s)
seconds_uptime=$(( now - since ))

if (( seconds_uptime <= CRIT_VALUE )); then
  echo "CRITICAL - System rebooted $(( seconds_uptime / 60 )) minutes ago."
  exit 2
fi

echo "OK - Up since $(date -d "$boot_time")"
exit 0
```
Jesús Ángel