We are monitoring the disks on our servers using Smartmontools and Nagios, with check_smartmon or another Nagios plugin. It appears to work, in that no errors are reported. But how do I know that it is truly working?

It would be great to simulate an error on the disk and observe the error through the entire Nagios pipeline. From the Linux or FreeBSD command line, is there a way to trigger a SMART fault on a disk drive or array without damaging the disk?

I found an old discussion on the smartmontools-support mailing list, but it's not clear that this functionality was ever added.

Stefan Lasiewski
  • Updated my question. We are using Nagios for centralized monitoring. Will `-M test` create an error condition which will be detected by the smartmon utilities? – Stefan Lasiewski Feb 09 '13 at 00:06
  • It looks like `check_smartmon` just executes `smartctl` and parses the results, so I don't see any good way to test it like you seem to want. You might want to have smartd set up and emailing you anyway. – Michael Hampton Feb 09 '13 at 00:12
  • I see what you're saying. If I could manually trigger an error condition then any admin could test the SMART monitoring from a variety of angles. – Stefan Lasiewski Feb 09 '13 at 00:41
  • And while there are two votes to close, note that this is not a dupe. This question is similar to [Is smartd properly configured to send alerts by email?](http://serverfault.com/questions/426761/is-smartd-properly-configured-to-send-alerts-by-email), but that answer only appears to test the email functionality, not the SMART monitoring itself. – Stefan Lasiewski Feb 09 '13 at 00:50
  • I suspect your best option is to find a drive that is showing some SMART errors already, and keep it round for future testing. It's quite likely that an older desktop drive will have some number of reallocated sectors, which should be enough to show something with check_smartmon. – Daniel Lawson Feb 09 '13 at 00:53
  • I suspect you're right. I am just looking for a way to do this quickly on the commandline. – Stefan Lasiewski Feb 09 '13 at 06:27
  • Why don't you make a fake `smartctl` that produces whatever output you want? – sendmoreinfo Feb 09 '13 at 15:20
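
The fake `smartctl` idea from the last comment could look roughly like the sketch below. It is only an illustration, not something from the thread: the helper names (`write_fake_smartctl`, `run_fake`) are made up, the printed text imitates the health line that `smartctl -H` prints for a failing drive, and exit status 8 corresponds to the "DISK FAILING" bit in smartctl's documented return values. A real plugin may parse other fields, so the output would probably need adapting.

```python
import os
import stat
import subprocess
import sys
import tempfile

# Body of the stand-in script (hypothetical, for testing only).
# Exit status 8 sets bit 3, which real smartctl uses to signal that the
# SMART status check returned "DISK FAILING".
FAKE_SMARTCTL_SOURCE = '''\
import sys
device = sys.argv[-1] if len(sys.argv) > 1 else "(none)"
print("=== START OF READ SMART DATA SECTION ===")
print("SMART overall-health self-assessment test result: FAILED!")
print("simulated report for " + device + "; no disk was queried")
sys.exit(8)
'''


def write_fake_smartctl(directory: str) -> str:
    """Create an executable stand-in named 'smartctl' and return its path."""
    path = os.path.join(directory, "smartctl")
    with open(path, "w") as handle:
        handle.write("#!" + sys.executable + "\n" + FAKE_SMARTCTL_SOURCE)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def run_fake(path: str, *args: str) -> subprocess.CompletedProcess:
    """Run the stand-in with the current interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, path, *args], capture_output=True, text=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_fake(write_fake_smartctl(tmp), "-H", "/dev/sda")
        print(result.stdout, end="")
        print("exit status:", result.returncode)
```

To exercise the Nagios side, the stand-in would have to be found instead of the real binary on a test host, for example by putting its directory first in the plugin's `PATH` or by pointing the plugin at it, if the plugin allows that. This tests the plugin and the alerting chain, not the drive or smartctl itself.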

2 Answers

If the drive firmware supports it, `hdparm` can be used to manually corrupt some sectors via its `--make-bad-sector` option. Note that this will really corrupt a sector, which means that:

  • on a subsequent read, the sector will be "discovered" as unreadable, with a corresponding increase in SMART attribute 197 - Current Pending Sector
  • on a subsequent write, the sector will be remapped using a spare sector, with a corresponding increase in SMART attribute 5 - Reallocated Sector Count

Please note that hdparm distinguishes between "normal" and "flagged" corruption: in the former, any read will time out as if the sector were genuinely bad; in the latter, any read will be aborted immediately.

Be sure to understand that, with the method above, you are really corrupting sectors, with the associated reallocation events; i.e., you are in effect "damaging" your drive.

Finally, to recover a sector before it is reallocated, you can use the `--repair-sector` option.
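
As a rough sketch of the sequence described above, the following Python helper only builds and prints the hdparm command lines; it does not run them. The flags are the ones documented in the hdparm man page as I understand it (`--make-bad-sector`, `--read-sector`, `--repair-sector`, the `--yes-i-know-what-i-am-doing` confirmation, and an `f` prefix on the sector number for the "flagged" variant). The device `/dev/sdX`, the sector number, and the function names are placeholders, not values from this answer.

```python
import shlex

# hdparm refuses the dangerous sector options without this confirmation flag.
CONFIRM = "--yes-i-know-what-i-am-doing"


def make_bad_sector_cmd(device: str, sector: int, flagged: bool = False) -> list:
    """Build the hdparm call that corrupts one sector.

    With flagged=True the sector number gets an 'f' prefix, which requests
    the "flagged" variant whose reads are aborted immediately.
    """
    target = f"f{sector}" if flagged else str(sector)
    return ["hdparm", CONFIRM, "--make-bad-sector", target, device]


def read_sector_cmd(device: str, sector: int) -> list:
    """Build the hdparm call that reads the sector, so the error is noticed."""
    return ["hdparm", "--read-sector", str(sector), device]


def repair_sector_cmd(device: str, sector: int) -> list:
    """Build the hdparm call that rewrites (and so recovers) the sector."""
    return ["hdparm", CONFIRM, "--repair-sector", str(sector), device]


def build_plan(device: str, sector: int, flagged: bool = False) -> list:
    """Return the three steps in order: corrupt, read back, repair."""
    return [
        make_bad_sector_cmd(device, sector, flagged),
        read_sector_cmd(device, sector),
        repair_sector_cmd(device, sector),
    ]


if __name__ == "__main__":
    # Placeholder device and sector: print the plan, never execute it here.
    for cmd in build_plan("/dev/sdX", 123456, flagged=True):
        print(shlex.join(cmd))
```

Printing instead of executing is deliberate: these commands overwrite data in the chosen sector, so they belong on a scratch drive and should be reviewed by hand first.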

Back to smartmontools: you can use an old drive to simulate such errors, which gives smartd a chance to alert you and lets you check the effectiveness of your smartctl configuration.
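
To confirm that the counters actually moved on the test drive, something like the sketch below could compare the raw values of attributes 5 and 197 before and after the test. It assumes the usual ten-column attribute table printed by `smartctl -A`; the `SAMPLE` text is made-up illustrative data, and the function names are hypothetical.

```python
# SMART attribute IDs discussed in this answer.
WATCHED = {5: "Reallocated Sector Count", 197: "Current Pending Sector"}

# Made-up sample in the layout of `smartctl -A` output (not real drive data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""


def raw_values(text: str) -> dict:
    """Extract the raw values of the watched attributes from the table."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in WATCHED and fields[9].isdigit():
                values[attr_id] = int(fields[9])
    return values


def increased(before: dict, after: dict) -> dict:
    """Return the watched attributes whose raw value grew, with the delta."""
    return {
        attr_id: after[attr_id] - before.get(attr_id, 0)
        for attr_id in after
        if after[attr_id] > before.get(attr_id, 0)
    }


if __name__ == "__main__":
    baseline = {5: 0, 197: 0}  # assumed values recorded before the test
    for attr_id, delta in increased(baseline, raw_values(SAMPLE)).items():
        print(f"{attr_id} ({WATCHED[attr_id]}): +{delta}")
```

On a real host the text would come from running `smartctl -A` against the test drive rather than from `SAMPLE`.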

shodanshok

You cannot simulate damage on a hard drive, without causing any real damage, just to check what Smartmontools can do. Smartmontools is a really good tool: it works, and it is safe and reliable. Could you get an HDD that already has bad sectors instead?

Luka