
I have configured RAID5 on an ASUS server and installed VMware ESXi 4.1 on it.

The attached photo shows Drive 0 in a Failed state, while all of the hard disk drives (HDDs) report a Normal status.

  1. Has the RAID been corrupted?
  2. How could this be fixed without losing the data?

RAID config+status screenshot

    have you tried...y'know, replacing the disk? – tombull89 Sep 08 '14 at 16:46
  • Is your RAID configured with a hot spare? If there is a hot spare, the RAID would automatically recover, and you can pull the failed disk out and replace it once the RAID has recovered. If there was no hot spare, the safest thing you can do for your data is to add a hot spare in an unused bay. If you filled up all the bays with drives and configured all of them as a RAID 5 without any hot spares, then you are now only a minor human error away from major trouble. – kasperd Sep 09 '14 at 21:46
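A note on the hot-spare suggestion above: on many controllers an unused drive can be flagged as a global hot spare from the command line. As a rough sketch, assuming an LSI-based controller with the MegaCli utility available (the binary path, enclosure:slot address and adapter index below are placeholders, not values from the question):

```python
# Rough sketch: mark an unused drive as a global hot spare so a degraded RAID 5
# can rebuild onto it. Assumes an LSI-based controller and the MegaCli utility;
# the binary path, enclosure:slot and adapter index are placeholders.
import subprocess

MEGACLI = "MegaCli"       # path to the MegaCli binary (assumption)
SPARE_DRIVE = "252:4"     # enclosure:slot of the unused drive (placeholder)
ADAPTER = "-a0"           # first adapter (placeholder)

# Show the logical drive state (Optimal / Degraded) before changing anything.
subprocess.run([MEGACLI, "-LDInfo", "-Lall", ADAPTER], check=True)

# Flag the unused drive as a global hot spare; the controller should then start
# rebuilding the degraded array onto it on its own.
subprocess.run([MEGACLI, "-PDHSP", "-Set", "-PhysDrv", f"[{SPARE_DRIVE}]", ADAPTER],
               check=True)
```

Once a hot spare exists, the controller should start rebuilding the degraded array onto it automatically.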

2 Answers


You cannot. As simple as that. Go back and restore a backup. When things like this happen and you are prepared, the results are not really that bad - as you can see from what happened to me recently, a RAID 6 blowing up in 5 minutes.

But your RAID has not failed.

It is degraded. Now, go and read the documentation for your RAID controller and learn how to deal with it. It will involve fixing the failed disc (replacing it most likely).

You should do so FAST, because the next disc to fail takes out all the data. So make a backup, as you should anyway.

RAID 5 protects against 1 failing disc (BTW, you now have terrible performance because of that - a technical limitation of RAID). Nothing is corrupt, though, and once you fix the disc (automatically or manually, depending on the controller) things will turn back to green.
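To see why a single failed disc leaves the data intact rather than corrupted: RAID 5 keeps one XOR parity block per stripe, so the block on the missing disc can be recomputed from the surviving discs. A self-contained toy sketch in Python (made-up data, not tied to any particular controller):

```python
# Toy illustration of RAID 5 single-disc recovery: the parity block is the XOR of the
# data blocks in a stripe, so any one missing block can be rebuilt from the others.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equally sized byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# One stripe spread over three data discs (made-up data).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)                 # what the controller writes to the parity disc

# Simulate disc 1 failing: rebuild its block from the survivors plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])

assert rebuilt == data[1]                 # the "lost" block comes back bit for bit
print("rebuilt block:", rebuilt)          # -> rebuilt block: b'BBBB'
```

The same arithmetic also shows why a second failure is fatal: with two blocks missing from a stripe, the XOR of the survivors no longer determines either of them.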

TomTom

Your RAID array isn't failed, it's degraded. That means only one disk is out. In my experience, this doesn't always mean a disk is dead.

I'd follow this simple action plan:

  1. Make a backup and test that it can actually be restored
  2. Reseat the disk. If it's not hot-swappable, turn the server off, remove the connector and put it back. Then turn the server on and check the status of the RAID array; it might go to "rebuilding" or "online" (or you may have to kick off the rebuild manually by setting the disk as a hot spare) - see the status-check sketch after this list
  3. If reseating didn't help, try taking the disk out completely, turning the server on without it, then turning the server off, connecting the disk and trying again
  4. If the disk remains failed, try attaching another disk to this one's connector/slot; you might have a faulty connector/slot rather than a faulty disk, which means you need a new controller/backplane/cable - whatever the server has in it
  5. The disk's firmware might be at fault. Update the controller firmware and the disk firmware, and see if that brings the disk back online.
  6. If none of these work, you definitely need to replace the disk.
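For step 2, a rough sketch of what checking the array and rebuild status could look like, assuming an LSI-based controller with the MegaCli utility reachable from a management shell (the binary name, drive address and adapter index are placeholders, not values from the question):

```python
# Rough status-check sketch for step 2: print the logical drive state and, if a drive
# is rebuilding, its rebuild progress. Assumes an LSI-based controller with the
# MegaCli utility; the binary name, drive address and adapter index are placeholders.
import subprocess

MEGACLI = "MegaCli"        # path to the MegaCli binary (assumption)
DRIVE = "252:0"            # enclosure:slot of the reseated/replaced drive (placeholder)
ADAPTER = "-a0"            # first adapter (placeholder)

def megacli(*args):
    """Run a MegaCli command against the first adapter and return its text output."""
    result = subprocess.run([MEGACLI, *args, ADAPTER],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Logical drive state: look for "Optimal" vs "Degraded".
print(megacli("-LDInfo", "-Lall"))

# Physical drive states: look for "Online", "Rebuild" or "Failed" per slot.
print(megacli("-PDList"))

# Rebuild progress for the drive that was reseated or replaced.
print(megacli("-PDRbld", "-ShowProg", "-PhysDrv", f"[{DRIVE}]"))
```

Once the rebuild finishes, the logical drive state reported by the first command should move from Degraded back to Optimal.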
dyasny
  • Interesting. My approach is "replace disc, then test it on another setup, because I do not trust it until I have fully tested it". – TomTom Sep 08 '14 at 18:04
  • This is from a few years of doing pro server hardware support, with tens of thousands of machines passing through my hands. SCSI timeouts causing disks to fly out of the array are more often caused by buggy firmware than by actual hardware faults. – dyasny Sep 08 '14 at 18:40
  • And that goes both ways. Seen it on SAS discs; Velociraptors had defective firmware for a long time that made them unresponsive for a while every 49 days of uptime. Simple solution - reset servers monthly. Forget that and boom. – TomTom Sep 09 '14 at 04:47