5

I've been looking for the best Nagios plugin for SMART monitoring. There are a few, but they only monitor temperature, whereas smartctl exposes much more data. Do you know of a better plugin that reports all of the SMART data?

Brent Pabst
Rafał Kamiński

2 Answers

11

The check_ide_smart plugin is part of the standard Nagios plugins group. Despite the "ide" part of the name, it uses smartctl to check any drive that smartctl supports.

It can return Nagios-suitable output, e.g.:

$ ./check_ide_smart -n -d /dev/sda
OK - Operational (17/17 tests passed)

Or the full SMART status:

$ ./check_ide_smart -d /dev/sda
Id=  1, Status=11 {PreFailure , OnLine }, Value=100, Threshold= 16, Passed
Id=  2, Status= 5 {PreFailure , OffLine}, Value=100, Threshold= 50, Passed
Id=  3, Status= 7 {PreFailure , OnLine }, Value=120, Threshold= 24, Passed
Id=  4, Status=18 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=  5, Status=51 {PreFailure , OnLine }, Value=100, Threshold=  5, Passed
Id=  7, Status=11 {PreFailure , OnLine }, Value=100, Threshold= 67, Passed
Id=  8, Status= 5 {PreFailure , OffLine}, Value=100, Threshold= 20, Passed
Id=  9, Status=18 {Advisory    , OnLine }, Value= 96, Threshold=  0, Passed
Id= 10, Status=19 {PreFailure , OnLine }, Value=100, Threshold= 60, Passed
Id= 12, Status=50 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=192, Status=50 {Advisory    , OnLine }, Value= 99, Threshold= 50, Passed
Id=193, Status=18 {Advisory    , OnLine }, Value= 99, Threshold= 50, Passed
Id=194, Status= 2 {Advisory    , OnLine }, Value=144, Threshold=  0, Passed
Id=196, Status=50 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=197, Status=34 {Advisory    , OnLine }, Value=100, Threshold=  0, Passed
Id=198, Status= 8 {Advisory    , OffLine}, Value=100, Threshold=  0, Passed
Id=199, Status=10 {Advisory    , OnLine }, Value=200, Threshold=  0, Passed
OffLineStatus=0 {NeverStarted}, AutoOffLine=No, OffLineTimeout=30 minutes
OffLineCapability=91 {Immediate Auto SuspendOnCmd}
SmartRevision=16, CheckSum=23, SmartCapability=3 {SaveOnStandBy AutoSave}
Keith
  • Beware `check_ide_smart` might consider disk as healthy even when overall health check fails: `smartctl -H /dev/sda` returns `SMART overall-health self-assessment test result: FAILED!` – Tombart Oct 12 '17 at 10:46
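One way to cover the case raised in this comment is a small wrapper that turns the `smartctl -H` verdict into a Nagios status. The following is only a rough sketch, not an existing plugin: the function names `health_to_nagios` and `check_device` are made up for illustration, and it assumes the usual `PASSED` / `FAILED!` wording for ATA drives and `SMART Health Status: OK` for SCSI drives.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def health_to_nagios(smartctl_output):
    """Map the text printed by `smartctl -H` to an (exit_code, message) pair."""
    for line in smartctl_output.splitlines():
        # ATA drives print "... test result: PASSED" or "... test result: FAILED!".
        if "overall-health self-assessment test result:" in line:
            verdict = line.split(":", 1)[1].strip()
            if verdict == "PASSED":
                return OK, "OK - SMART overall health PASSED"
            return CRITICAL, f"CRITICAL - SMART overall health {verdict}"
        # SCSI/SAS drives print "SMART Health Status: OK" when healthy.
        if line.startswith("SMART Health Status:"):
            verdict = line.split(":", 1)[1].strip()
            if verdict == "OK":
                return OK, "OK - SMART health status OK"
            return CRITICAL, f"CRITICAL - SMART health status {verdict}"
    # No verdict line at all: the drive may not support SMART, or smartctl failed.
    return UNKNOWN, "UNKNOWN - no health verdict in smartctl output"


def check_device(device):
    """Run `smartctl -H` on a device (usually needs root) and classify the result."""
    try:
        proc = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return UNKNOWN, "UNKNOWN - smartctl not found"
    return health_to_nagios(proc.stdout)
```

A plugin script would call `check_device("/dev/sda")`, print the message, and exit with the returned code. The Nagios user normally needs sudo rights for smartctl for this to work.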
2

I've used the check_ide_smart plugin; however, I eventually discovered that it did not notify me about errors in the SMART log on the disk.

The bug is apparently still open after 5 years?

#473 check_ide_smart ignores SMART errors ! http://sourceforge.net/p/nagiosplug/bugs/473/

I am now enabling a more detailed smartd daemon configuration on each system. Nagios will then notify me if that process stops. I may also add a cron job that checks for the process and restarts it if it is not running.
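The "notify me if that process stops" part could look roughly like the sketch below. This is an illustration under assumptions, not what I actually run: `smartd_state` and `list_process_names` are hypothetical names, and the process list is taken from `ps -eo comm=` on Linux. The standard check_procs plugin should be able to do the same job.

```python
import subprocess


def smartd_state(process_names):
    """Return a Nagios (exit_code, message) pair given the running process names."""
    running = any(name.strip() == "smartd" for name in process_names)
    if running:
        return 0, "OK - smartd is running"
    return 2, "CRITICAL - smartd is not running"


def list_process_names():
    """List the command names of all processes, one per entry (Linux `ps`)."""
    proc = subprocess.run(
        ["ps", "-eo", "comm="],
        capture_output=True, text=True, check=False,
    )
    return proc.stdout.splitlines()
```

A check script would print the message from `smartd_state(list_process_names())` and exit with the returned code.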

From the smartd.conf:

First (primary) ATA/IDE hard disk. Monitor all attributes, enable automatic online data collection and automatic attribute autosave, start a short self-test every day between 2 and 3 am, and a long self-test on Saturdays between 3 and 4 am. Report raw temperature changes >= 5 Celsius.


smartd.conf

DEVICESCAN -H -m root -a -o on -S on -s (S/../.././02|L/../../6/03) -W 5
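To make the `-s` schedule less cryptic: as I understand the smartd.conf man page, smartd builds a string of the form `T/MM/DD/d/HH` (test type, month, day, day of week with 1 = Monday, hour) and runs a test of type `T` when the regular expression matches it. The sketch below imitates that in Python to show which tests would be due at a given time. `due_tests` is a made-up helper, and using a full match with Python's `re` instead of smartd's own regex engine is an assumption.

```python
import re
from datetime import datetime

# The -s argument from the DEVICESCAN line above.
SCHEDULE = r"(S/../.././02|L/../../6/03)"


def due_tests(when, schedule=SCHEDULE):
    """Return the self-test types (L, S, C, O) whose schedule string matches."""
    pattern = re.compile(schedule)
    due = []
    for test_type in "LSCO":  # long, short, conveyance, offline
        # isoweekday() gives 1 for Monday through 7 for Sunday.
        stamp = f"{test_type}/{when:%m/%d}/{when.isoweekday()}/{when:%H}"
        if pattern.fullmatch(stamp):
            due.append(test_type)
    return due


# Wednesday 02:30 -> short test; Saturday 03:10 -> long test.
print(due_tests(datetime(2024, 1, 3, 2, 30)))
print(due_tests(datetime(2024, 1, 6, 3, 10)))
```

So `S/../.././02` means "short test, any month, any day, any weekday, hour 02", and `L/../../6/03` means "long test, weekday 6 (Saturday), hour 03".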

  • [1] TBH I've found check_ide_smart not airworthy enough. As far as I understand, it only checks whether attribute values are lower than their Threshold (compare with the source code). The errors you've pasted: Error 7 occurred at disk power-on lifetime: 9006 hours (375 days + 6 hours) When the command that caused the error occurred, the device was active or idle. – Mateusz Pacek Dec 28 '15 at 09:13
  • [2] indicate communication problems between the disk and the 'motherboard', I believe. So check_ide_smart works properly, but it doesn't check any errors other than the attributes returned by smartctl. – Mateusz Pacek Dec 28 '15 at 09:16
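Since the gap described in these comments is the error log rather than the attributes, a separate check could read the output of `smartctl -l error`. The following is a hedged sketch only: `error_log_to_nagios` is a hypothetical name, it assumes the ATA wording `No Errors Logged` and `ATA Error Count: N`, and treating any logged error as a WARNING rather than CRITICAL is my own choice.

```python
import re


def error_log_to_nagios(smartctl_output):
    """Classify the text printed by `smartctl -l error` as a Nagios status."""
    if "No Errors Logged" in smartctl_output:
        return 0, "OK - no errors in SMART error log"
    match = re.search(r"ATA Error Count:\s*(\d+)", smartctl_output)
    if match:
        count = int(match.group(1))
        if count == 0:
            return 0, "OK - no errors in SMART error log"
        # Logged errors may be cabling or controller issues, so only warn.
        return 1, f"WARNING - {count} errors in SMART error log"
    # Neither marker found: the log may be unsupported on this device.
    return 3, "UNKNOWN - could not read SMART error log"
```

Feeding it the log that contains "Error 7 occurred at ..." would return a WARNING with a count of 7, assuming the `ATA Error Count` line is present in that output.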