
There is an icinga server set up to run check_megaraid_sas. It has been working beautifully for the last ~7 months.

During this time several "Unable to read output" messages have shown up for a variety of reasons, and in all of those cases the status was UNKNOWN, which triggered our alert system.

Recently the drives went from warning, to critical, to OK:

  • 15:22:03 RAID-Health;WARNING; ...
  • 18:42:03 RAID-Health;CRITICAL; ...
  • 19:04:03 RAID-Health;OK;NRPE: Unable to read output

At the time of this issue, megaraidsas-status returned the following:

-- Arrays informations --
-- ID | Type | Size | Status

-- Disks informations
-- ID | Model | Status | Warnings

However, I would have expected the script to return "OK: Drives 0" (as some of the user comments suggest; that would still be wrong, but it would call for a different fix). Since NRPE returned "Unable to read output" and marked the check as "OK", I suspect the problem lies with NRPE rather than with the script.

Is there some way to convince NRPE that when it receives no data in response to a check, the check has failed? Or does anyone have other ideas about what may have happened?

The server that was being checked has been rebooted, and I'm not sure if the situation will manifest again to test it.

isaaclw
  • Can you run this script on your monitored server and show the output? I suppose the problem is with script output. – DukeLion May 21 '12 at 15:24

1 Answer

This usually means that the check returned output (e.g. error messages interleaved with the normal output) that violates the format Nagios expects from a check. Use su - to become the user the monitoring plugin runs as on the remote server, run the check by hand, and inspect the output. If it looks OK, pipe it through hexdump -C, since unexpected control characters can confuse NRPE.
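If you would rather not depend on how NRPE treats empty or malformed output, one option is to put a small wrapper between NRPE and the plugin. The following is only a sketch under assumptions: the names evaluate and run_check are hypothetical, the 30-second timeout is arbitrary, and it relies only on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It forces UNKNOWN whenever the plugin prints nothing, exits with an unexpected code, or emits control characters on its first line.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(returncode, stdout):
    """Sanity-check a plugin result and return (exit_code, message)."""
    stripped = stdout.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if not first_line:
        # Empty output is never trusted, whatever the exit code was.
        return UNKNOWN, "UNKNOWN: plugin produced no output"
    if returncode not in (OK, WARNING, CRITICAL, UNKNOWN):
        return UNKNOWN, f"UNKNOWN: unexpected exit code {returncode}: {first_line}"
    if any(ord(ch) < 32 and ch != "\t" for ch in first_line):
        return UNKNOWN, "UNKNOWN: control characters in plugin output"
    return returncode, first_line


def run_check(command, timeout=30):
    """Run the plugin (a list of arguments) and validate its result."""
    try:
        proc = subprocess.run(
            command,
            capture_output=True,
            text=True,
            errors="replace",  # undecodable bytes must not crash the wrapper
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN: could not run plugin: {exc}"
    return evaluate(proc.returncode, proc.stdout)
```

A wrapper script built on this would call run_check(sys.argv[1:]), print the returned message, and exit with the returned code; the NRPE command definition would then point at the wrapper instead of at the plugin itself.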

rackandboneman