
I want to run the smartctl self tests to check the health of the drives in my RAID array (PERC 5/i). The array is on sda and comprises six drives. I can check the status using

 sudo smartctl /dev/sda -d megaraid,0 -a

And I see that SMART is available and enabled on all the drives. I have tried to run self tests using

 sudo smartctl /dev/sda -d megaraid,0 -t short

and

 sudo smartctl /dev/sda -d megaraid,0 -t long

I have also tried it on all of the drives 0-5. No matter what I try, when I run:

 sudo smartctl /dev/sda -d megaraid,0 -l selftest

I always get the same result, which reports that no self-test has ever been run:

 /dev/sda [megaraid_disk_00] [SAT]: Device open changed type from 'megaraid' to 'sat'
 === START OF READ SMART DATA SECTION ===
 SMART Self-test log structure revision number 1
 No self-tests have been logged.  [To run self-tests, use: smartctl -t]
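
For reference, the per-drive attempts described above amount to a loop like the one below. This is only a minimal sketch: `/dev/sda` and drive IDs 0-5 come from my setup, while the `run_selftest` helper and the `DRY_RUN` switch are illustrative names I made up. With `DRY_RUN=1` (the default here) it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: start a SMART self-test on each physical drive behind the controller.
# DEVICE and drive IDs 0-5 are taken from the question; run_selftest and
# DRY_RUN are hypothetical names used for illustration only.
DEVICE=/dev/sda
DRY_RUN=${DRY_RUN:-1}

run_selftest() {
    # $1 = test type (short or long), $2 = megaraid drive ID
    if [ "$DRY_RUN" = 1 ]; then
        # Only show what would be executed.
        echo "smartctl -d megaraid,$2 -t $1 $DEVICE"
    else
        sudo smartctl -d "megaraid,$2" -t "$1" "$DEVICE"
    fi
}

for id in 0 1 2 3 4 5; do
    run_selftest short "$id"
done
```

Setting `DRY_RUN=0` runs the real commands; the test needs time to finish before `-l selftest` can be expected to show an entry for it.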

From what I have read, I should have no problem running the short and long self-tests on the array while it is mounted. Does anyone have experience running these tests on a PERC 5/i RAID array and can lend some insight into what is causing the problem?

(smartmontools release 5.40 dated 2009-12-09 at 21:00:32 UTC)

mdpc
canzar

2 Answers


This is a Dell PERC 5/i hardware array controller. Let it do its thing. If you don't have red or amber lights on the disks, why are you concerned with running your own S.M.A.R.T. tests?

The array controller uses S.M.A.R.T. in addition to other features and tests to determine drive health. Running your own analysis is unnecessary.

ewwhite
  • I want to run the long suite of SMART tests to diagnose a problem with the array: write speeds periodically drop to near zero, the average wait time goes up substantially, and the queue fills up. I suspect the problem is with one of the drives, but there hasn't been a failure. – canzar Dec 18 '12 at 14:09
  • Could be a *failing* drive. But do you have a [write cache](http://serverfault.com/questions/450242/what-is-the-memory-module-on-a-raid-card-needed-for/450253#450253) enabled on the array card? – ewwhite Dec 18 '12 at 14:17
  • I suspect it is a failing drive, which is why I wanted to run the SMART tests to see if I could figure out what drive it is. The problem still occurs with write cache enabled, but is less pronounced (only occurs under heavy read/write load). – canzar Dec 18 '12 at 14:36

This is an old thread, but let me say that hardware controllers, and patrol reads in particular, leave a lot to be desired. Apparently they are supposed to test the disk surface and correct problems, and they sometimes do that, but they never fix pending sectors, even though they could and should by using the redundant data. So when you have a hard drive with SMART errors and want to swap it out, you cannot truly know that the other drive (in RAID 1, for example) is fully readable, so a SMART long test would be desirable.

Yes, I agree that a consistency check might work, but that will degrade the array, and you will lose data that you could have saved with an optimal array that has undiscovered or known errors but still has 100% readable data. The point is that RAID firmware is buggy and its internal workings are over-hyped. They give a false sense of security, which is more dangerous than a system that you know for sure will fail at some point.
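
As a rough illustration of checking the other drives for pending sectors before a swap, the sketch below pulls the raw `Current_Pending_Sector` count out of `smartctl -A` output. It assumes SATA drives reporting the standard ATA attribute table, where the raw value is the tenth column; the `pending_sectors` helper name is made up for this example, and the device and drive IDs in the commented usage are the ones from the question.

```shell
#!/bin/sh
# Sketch: extract the raw Current_Pending_Sector count from `smartctl -A` output.
# Assumes the standard ATA attribute table layout, where RAW_VALUE is column 10.
# pending_sectors is a hypothetical helper name.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Possible usage for drive IDs 0-5 on /dev/sda (values taken from the question):
# for id in 0 1 2 3 4 5; do
#     printf 'drive %s: ' "$id"
#     sudo smartctl -d "megaraid,$id" -A /dev/sda | pending_sectors
# done
```

A non-zero count on the drive that is supposed to stay in the array is the case I would worry about before pulling its partner.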

glucz