Detect faulty drive in RAID 10 array

Question

I've been told that I can only verify that my hardware RAID array is working properly via KVM. However, I want my server to notify me automatically when there is a problem.

Is there a command I can run over SSH (it will be called via system() in PHP) that can detect that a drive is having problems? I don't need to identify which drive.

I have thought of one approach, but I don't know if it will work in practice. If I run a PHP script that calls fopen('/dev/[filesystem]', 'r') and reads 1 byte every x GB, a read that lands on a part of the device that's having problems should return an error. Am I correct in thinking this would work?

I use the XFS filesystem. I have heard of xfs_check, but it says it needs to be run in read-only mode, which is inconvenient.

I use a 3ware RAID controller.

You should read your RAID controller's command line tools documentation to figure out how to retrieve an array's status, how to initiate a [scrub](http://en.wikipedia.org/wiki/Data_scrubbing), and how to get at its results. — the-wabbit, Oct 29 '14 at 23:33
Please provide the information about the server hardware make and model, the RAID controller and number and type of drives. — ewwhite, Oct 30 '14 at 00:01

score 3 · Accepted Answer · answered Oct 30 '14 at 06:12

Install the 3Ware tools (tw_cli) on your machine.

After you have installed them, get the ID of the controller (I've never understood the system behind the numbering; for all I know it might be random):

$ tw_cli show

Ctl   Model        (V)Ports  Drives   Units   NotOpt  RRate   VRate  BBU
------------------------------------------------------------------------
c0    9550SXU-4LP  4         2        1       0       1       1      -

You can then query the array status with

$ tw_cli /c0 show

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       74.4951   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     NOT-PRESENT      -      -           -             -
p1     NOT-PRESENT      -      -           -             -
p2     OK               u0     74.53 GB    156301488     9QZ07NP2
p3     OK               u0     74.53 GB    156301488     9QZ08DS2

Obviously, this will look different on your machine. These examples were lifted from here.
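Since the question is about checking this from a script, here is a minimal Python sketch of how the unit and port statuses could be pulled out of that output. The function name parse_status and the two regular expressions are my own, and they assume the column layout shown above (status in the third column for units, second column for ports); other tw_cli or firmware versions may print the tables differently.

```python
import re

# Assumed row layouts, taken from the sample output above:
#   units: "u0    RAID-1    OK   ..."   -> status is the third column
#   ports: "p2     OK       u0   ..."   -> status is the second column
UNIT_ROW = re.compile(r"^(u\d+)\s+\S+\s+(\S+)")
PORT_ROW = re.compile(r"^(p\d+)\s+(\S+)")


def parse_status(output):
    """Return {name: status} for every unit and port row found in the text."""
    statuses = {}
    for line in output.splitlines():
        line = line.strip()
        match = UNIT_ROW.match(line) or PORT_ROW.match(line)
        if match:
            statuses[match.group(1)] = match.group(2)
    return statuses


# The sample output from above, used here instead of calling tw_cli.
SAMPLE = """\
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-1    OK             -       -       -       74.4951   ON     OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     NOT-PRESENT      -      -           -             -
p1     NOT-PRESENT      -      -           -             -
p2     OK               u0     74.53 GB    156301488     9QZ07NP2
p3     OK               u0     74.53 GB    156301488     9QZ08DS2
"""

if __name__ == "__main__":
    for name, status in parse_status(SAMPLE).items():
        print(name, status)
```

Header and separator lines are skipped because they do not start with a lowercase u or p followed by a number.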

To actively verify (scrub) your drives, use

$ tw_cli /c0/u0 start verify

For automatic notifications, you should set up a monitoring system, e.g. Nagios or Icinga, and use a plugin that checks the health of the array with the help of tw_cli. These plugins work nicely without Nagios/Icinga as well and can easily be used in a minimal monitoring system in the form of a cron job that sends a mail if the plugin doesn't return 0.
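If you would rather not install a plugin, the cron-job idea can be sketched in a few lines of Python. This is only an illustration, not one of the existing plugins: the names HEALTHY, find_problems and main are made up, the controller is assumed to be /c0, and the set of statuses treated as healthy is my assumption, so adjust it to your own policy. The exit codes follow the Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```python
import subprocess

# Assumption: statuses treated as healthy. NOT-PRESENT is an empty port and
# VERIFYING is a scrub in progress; anything else is reported as a problem.
HEALTHY = {"OK", "VERIFYING", "NOT-PRESENT"}


def find_problems(output):
    """Return 'name: status' strings for units/ports that are not healthy."""
    problems = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines and separator lines
        name = fields[0]
        if len(name) < 2 or not name[1:].isdigit():
            continue  # header rows such as "Unit" or "Port"
        if name[0] == "u":
            status = fields[2]  # unit rows: Unit, UnitType, Status
        elif name[0] == "p":
            status = fields[1]  # port rows: Port, Status
        else:
            continue
        if status not in HEALTHY:
            problems.append(f"{name}: {status}")
    return problems


def main(controller="/c0"):
    """Run tw_cli once and return a Nagios-style exit code."""
    try:
        result = subprocess.run(
            ["tw_cli", controller, "show"], capture_output=True, text=True
        )
    except FileNotFoundError:
        print("RAID UNKNOWN: tw_cli not found")
        return 3
    problems = find_problems(result.stdout)
    if result.returncode != 0 or problems:
        print("RAID CRITICAL:", ", ".join(problems) or "tw_cli failed")
        return 2
    return 0
```

Saved as a script that ends with `raise SystemExit(main())`, this could be run from cron. Because it prints nothing while the array is healthy and cron mails any output a job produces, you would only receive a mail when something is wrong.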

Thank you. I'm getting status 'DEGRADED' on 2 of the drives. I have found a tutorial to rebuild the array, but I have a question: do I _need_ to rebuild every time I get this error, and when I rebuild, no data currently on the drive will be lost, right? I also have a status saying 'VERIFYING', and I don't understand why that status is showing. I also had an "ECC-ERROR" status, but I ran a 'rescan' command and that error has gone now. It's also worth noting that I get IOErrors when trying to download certain files. I assume this has something to do with the 'DEGRADED' status; can these be fixed? — user3786834, Oct 30 '14 at 21:50
Sorry for all the questions, I'm very new to RAID. During a rebuild, are users unable to upload to or download from the entire storage, or only unable to access files located on the drive currently being rebuilt, or will it not interrupt anything (it can rebuild while letting users continue to grab files from the drive)? — user3786834, Oct 30 '14 at 22:11
