Replacing a predicted to fail SAS drive in a Dell MD1220

Question

I have a 24-disk enclosure using RAID 5 with 1 hot spare. Here is what it looks like in Dell's OpenManage software.

As you can see, one of the disks is predicted to fail. I have a replacement disk with the exact same specifications. Is it as simple as removing the 'bad' disk and replacing it? Will the hot spare take over, or do I have to reconfigure anything?

I'm not sure why I said R5, it's R6. And yes it's a single array but backed up daily. What is the problem? — jwillis0720, Oct 28 '16 at 11:29
R5 is dangerous, has been for years, http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/ — Chopper3, Oct 28 '16 at 20:26

score 1 · Answer 1 · answered Oct 06 '16 at 22:33

Remove bad disk.
Replace bad disk with new disk.
Monitor the rebuild process.

See page 31 of the manual.

ewwhite

If drive is spun up and I need to "prepare for removal from controller software", do I have to reboot the system to access the PERC controller? – jwillis0720 Oct 07 '16 at 01:08
No need to reboot into the PERC BIOS - all the needed steps can be performed with the use of OpenManage Server Administrator. – JimNim Oct 10 '16 at 16:50

score 1 · Answer 2 · answered Oct 10 '16 at 16:40

Simply removing the bad disk and inserting the replacement will get the job done, but it's not the safest method. You're using RAID5, so your data is already at high risk of corruption or loss as it is.

Check out the "Replacing A Physical Disk Receiving SMART Alerts" section of the Server Administrator Storage Management User's Guide for the recommended procedure.

I would strongly recommend that you perform a consistency check before replacement - this helps reduce the risk of encountering data corruption during the drive replacement process, especially in the event of a rebuild. The guide I linked notes that "failure to perform a check consistency can result in data loss." After a consistency check, you could move forward with the steps listed in that section of the document (the same steps that ewwhite suggested).
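As a rough illustration of the pre-replacement check, here is a minimal Python sketch that only builds the OpenManage command lines and prints them without running anything. The controller and virtual disk IDs (0 and 0) are hypothetical, the helper name `omsa_precheck_commands` is made up for this example, and the `omreport`/`omconfig` syntax is written from memory of the OMSA CLI, so verify it against the CLI guide for your OMSA version before using it.

```python
import shlex


def omsa_precheck_commands(controller: int, vdisk: int) -> list[list[str]]:
    """Build (but do not run) OMSA CLI commands for a pre-replacement check.

    The command syntax is an assumption to be checked against the OMSA CLI guide.
    """
    return [
        # List the physical disks on the controller, including their status.
        ["omreport", "storage", "pdisk", f"controller={controller}"],
        # Start a consistency check on the virtual disk before touching any drive.
        ["omconfig", "storage", "vdisk", "action=checkconsistency",
         f"controller={controller}", f"vdisk={vdisk}"],
    ]


# Hypothetical IDs: read the real ones from the omreport output on your system.
commands = omsa_precheck_commands(controller=0, vdisk=0)
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on the server itself you would review each line and run it by hand.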

A potentially safer method than manually failing the problematic drive (which forces your RAID5 into a degraded state) is the procedure in the "Virtual Disk Task: Replace Member Disk" section. It essentially mirrors data from the problem drive over to a spare without putting the array in a degraded state during the process. The benefit is that if a different drive fails during the copy, you do not lose access to your data. This method also improves your odds of avoiding double-fault conditions, which result from bad blocks and lead to corruption (punctured stripes).
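To make the difference concrete, here is a toy Python sketch of a single-parity stripe (the RAID 5 case assumed in this answer). It is not how a PERC controller works internally; the block contents and the helper name `xor_blocks` are made up for illustration. It shows that a degraded array can rebuild one missing member, that a second loss during that rebuild is unrecoverable, and that copying a still-readable member to a spare leaves the stripe able to survive another failure.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity, RAID 5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus their parity block (toy values).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Pull-and-replace: member 1 is gone, so it is rebuilt from the survivors.
survivors = [blk for i, blk in enumerate(stripe) if i != 1]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]

# If member 2 also fails during that rebuild, the remaining blocks only give
# the XOR of the two missing members, which identifies neither of them.
two_lost = [blk for i, blk in enumerate(stripe) if i not in (1, 2)]
ambiguous = xor_blocks(two_lost)
assert ambiguous not in (data[1], data[2])

# Replace-member style: member 1 is copied to the spare while still readable,
# so the stripe stays complete. Losing member 2 mid-copy is still recoverable.
spare = bytes(stripe[1])
recovered = xor_blocks([blk for i, blk in enumerate(stripe) if i != 2])
assert spare == data[1] and recovered == data[2]
```

With RAID 6 the array tolerates a second failure, but the same reasoning applies: keeping the failing drive in place during the copy preserves one more level of redundancy.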
