We have an Icinga server set up to run check_megaraid_sas. It has been working beautifully for the last ~7 months.
During this time several "Unable to read output" messages have shown up for a variety of reasons, and in all of those cases the status was UNKNOWN, which triggered our alert system.
Recently the drives went from warning, to critical, to OK:
- 15:22:03 RAID-Health;WARNING; ...
- 18:42:03 RAID-Health;CRITICAL; ...
- 19:04:03 RAID-Health;OK;NRPE: Unable to read output
At the time of this issue, megaraidsas-status returned the following:
-- Arrays informations --
-- ID | Type | Size | Status
-- Disks informations
-- ID | Model | Status | Warnings
However, I would have expected the script to return "OK: Drives 0" (as some of the user comments suggest; that would still be an error, but one needing a different fix). Since NRPE returned "Unable to read output" and marked it as "OK", I think this is a problem with NRPE rather than with the script.
Is there some way to convince NRPE that when it receives no data in response to a check, the check has failed? Or does anyone have other ideas about what may have happened?
The server that was being checked has been rebooted, and I'm not sure if the situation will manifest again to test it.
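To make the question concrete, here is a minimal sketch of the behaviour I am after, written as a wrapper that NRPE would call instead of the plugin itself. It assumes the plugin exited with status 0 while printing nothing, which I have not confirmed. The names run_check, main and check_nonempty.py are made up for illustration and are not part of NRPE or check_megaraid_sas.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: report UNKNOWN when a plugin prints nothing."""
import subprocess
import sys

UNKNOWN = 3  # standard Nagios/Icinga plugin exit code for UNKNOWN


def run_check(command, timeout=30):
    """Run a plugin command (list of args); return (exit_code, output_text)."""
    try:
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # capture error text too
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Plugin missing, not executable, or hung past the timeout.
        return UNKNOWN, "UNKNOWN: could not run check: {}".format(exc)

    output = proc.stdout.strip()
    if not output:
        # The case from this post: no output at all, whatever the exit code.
        return UNKNOWN, "UNKNOWN: check printed no output (exit code {})".format(
            proc.returncode
        )
    if proc.returncode not in (0, 1, 2, 3):
        # Exit codes outside the plugin range are treated as UNKNOWN.
        return UNKNOWN, output
    return proc.returncode, output


def main(argv):
    """Entry point: argv is the plugin path followed by its arguments."""
    if not argv:
        print("UNKNOWN: usage: check_nonempty.py <plugin> [args...]")
        return UNKNOWN
    code, text = run_check(argv)
    print(text)
    return code

# To use this as a plugin, end the file with:
#     raise SystemExit(main(sys.argv[1:]))
```

The NRPE command definition would then point at the wrapper, for example command[check_raid]=/path/to/check_nonempty.py /path/to/check_megaraid_sas (both paths are placeholders). This only works around the symptom; it does not explain why the plugin printed nothing in the first place.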