Nagios notification intervals must be greater than or equal to the check interval, because this prevents Nagios from sending false-alarm notifications should a service recover between checks. I understand the reasoning behind that.
We have a number of checks that run every 30 minutes. This means that once a check has failed and used up its retries, only one notification is sent each time the service is checked, i.e. once every 30 minutes.
What I need is to keep pestering the duty admin's pager every two minutes after a check has gone HARD DOWN/CRITICAL. I can't do this, because the next notification only goes out on the next check, i.e. another 30 minutes later.
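A rough way to see why a short notification interval doesn't help here: in a simplified model (my assumption, ignoring retries and soft states), a notification can only go out when a check result arrives and at least the notification interval has passed since the previous one. The hypothetical function `notification_times` below lists when notifications would be sent during an outage.

```python
"""Simplified model of when Nagios can re-send a problem notification.

Assumption: a notification is only possible when a check result arrives,
and only if notification_interval minutes have passed since the last one.
Retries and SOFT states are ignored.
"""


def notification_times(check_interval, notification_interval, duration):
    """Minutes (after going HARD CRITICAL) at which notifications go out
    during an outage lasting `duration` minutes."""
    sent = []
    last_sent = None
    # Checks run at t = 0, check_interval, 2 * check_interval, ...
    for t in range(0, duration + 1, check_interval):
        if last_sent is None or t - last_sent >= notification_interval:
            sent.append(t)
            last_sent = t
    return sent


if __name__ == "__main__":
    # Hypothetical 90-minute outage with 30-minute checks and a
    # 2-minute notification interval: only one page per check.
    print(notification_times(30, 2, 90))       # [0, 30, 60, 90]
    # The same outage if checks ran every 2 minutes while CRITICAL.
    print(len(notification_times(2, 2, 90)))   # 46
```

In this model the check interval, not the notification interval, sets the paging rate, which is why lowering the check interval while the problem is active is what matters.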
Our old monitoring system had a feature that set a new, lower check interval as soon as a check went HARD DOWN/CRITICAL. This meant we could keep rechecking (and sending alerts) every two minutes until a human acknowledged the alert or the check returned to UP, after which the check interval reverted to 30 minutes.
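Put as a rule, the old system's behaviour amounts to something like the following. This is only an illustration of the policy; `CheckStatus`, `next_check_interval` and the interval constants are hypothetical names, not anything from Nagios.

```python
from dataclasses import dataclass

NORMAL_INTERVAL_MIN = 30     # hypothetical: usual check interval
ESCALATED_INTERVAL_MIN = 2   # hypothetical: interval while the problem is live


@dataclass
class CheckStatus:
    state: str          # e.g. "OK", "UP", "DOWN", "CRITICAL"
    state_type: str     # "SOFT" or "HARD"
    acknowledged: bool  # has a human acknowledged the problem?


def next_check_interval(status: CheckStatus) -> int:
    """Check interval in minutes under the old system's policy:
    recheck quickly while HARD DOWN/CRITICAL and unacknowledged,
    otherwise fall back to the normal interval."""
    if (status.state in ("DOWN", "CRITICAL")
            and status.state_type == "HARD"
            and not status.acknowledged):
        return ESCALATED_INTERVAL_MIN
    return NORMAL_INTERVAL_MIN
```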
Is there a way to do this in Nagios?
I've been thinking about writing an event handler that, after a check has gone HARD DOWN/CRITICAL, reschedules the check for two minutes in the future by sending a command directly to Nagios (through its external command file).
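A minimal sketch of what such an event handler might look like, assuming it is wired up through a command definition that passes `$SERVICESTATE$ $SERVICESTATETYPE$ $HOSTNAME$ "$SERVICEDESC$"` as arguments. The script name and the command file path are assumptions (check `command_file` in your nagios.cfg); it submits the `SCHEDULE_SVC_CHECK` external command, and a host version would use `SCHEDULE_HOST_CHECK` instead.

```python
#!/usr/bin/env python
"""Sketch of a Nagios event handler that schedules a recheck two minutes
after a service goes HARD CRITICAL.

Assumed (hypothetical) command definition:
    command_line  /path/to/reschedule_on_critical.py $SERVICESTATE$ $SERVICESTATETYPE$ $HOSTNAME$ "$SERVICEDESC$"
"""
import sys
import time

# Assumption: default command_file for a source install of Nagios Core;
# use the command_file value from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
RECHECK_DELAY_SEC = 120  # two minutes


def should_reschedule(state, state_type):
    """Only act on HARD CRITICAL results."""
    return state == "CRITICAL" and state_type == "HARD"


def build_command(host, service, now, delay=RECHECK_DELAY_SEC):
    """Format a SCHEDULE_SVC_CHECK external command line."""
    return "[%d] SCHEDULE_SVC_CHECK;%s;%s;%d\n" % (now, host, service, now + delay)


def main(argv):
    """Submit the reschedule command if needed; return True if one was sent."""
    if len(argv) != 5:
        sys.stderr.write("usage: %s STATE STATETYPE HOST SERVICE\n" % argv[0])
        return False
    state, state_type, host, service = argv[1:5]
    if not should_reschedule(state, state_type):
        return False
    now = int(time.time())
    # The command file is a named pipe read by Nagios; each line written
    # to it is one external command.
    with open(COMMAND_FILE, "w") as cmd:
        cmd.write(build_command(host, service, now))
    return True


if __name__ == "__main__":
    main(sys.argv)
```

One caveat, as I read the Nagios documentation: event handlers run during SOFT problem states, on the initial HARD problem state and on recovery, not on every check while a service stays HARD CRITICAL. So on its own this would schedule only one early recheck per outage, and it does not look at acknowledgements.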
Has anyone else had to do something similar?
I'm running Nagios Core 3.2.3.