I've inherited administration of a server with a RAID 5 array. We have a damaged database on the array that is just over half the allocated size, which makes recovery impossible in the space available.
I recently changed the spare disk in bay 25 to be part of the array (which would give adequate space for DB recovery), and the rebuild looked to start OK.
But then a faulty disk in bay 22 was reported. This has been replaced and now I'm stuck with the server showing the array config status "RAID5, Ready for Rebuild".
Can anyone help?

=> ctrl slot=1 show config

Smart Array P600 in Slot 1    (sn: P92B3AF9SXL040)

array A (SAS, Unused Space: 297996 MB)

  logicaldrive 1 (6.3 TB, RAID 5, Ready for Rebuild)

  physicaldrive 1E:1:1 (port 1E:box 1:bay 1, SAS, 300 GB, OK)
  physicaldrive 1E:1:2 (port 1E:box 1:bay 2, SAS, 300 GB, OK)
  physicaldrive 1E:1:3 (port 1E:box 1:bay 3, SAS, 300 GB, OK)
  physicaldrive 1E:1:4 (port 1E:box 1:bay 4, SAS, 300 GB, OK)
  physicaldrive 1E:1:5 (port 1E:box 1:bay 5, SAS, 300 GB, OK)
  physicaldrive 1E:1:6 (port 1E:box 1:bay 6, SAS, 300 GB, OK)
  physicaldrive 1E:1:7 (port 1E:box 1:bay 7, SAS, 300 GB, OK)
  physicaldrive 1E:1:8 (port 1E:box 1:bay 8, SAS, 300 GB, OK)
  physicaldrive 1E:1:9 (port 1E:box 1:bay 9, SAS, 300 GB, OK)
  physicaldrive 1E:1:10 (port 1E:box 1:bay 10, SAS, 300 GB, OK)
  physicaldrive 1E:1:11 (port 1E:box 1:bay 11, SAS, 300 GB, OK)
  physicaldrive 1E:1:12 (port 1E:box 1:bay 12, SAS, 300 GB, OK)
  physicaldrive 1E:1:13 (port 1E:box 1:bay 13, SAS, 300 GB, OK)
  physicaldrive 1E:1:14 (port 1E:box 1:bay 14, SAS, 300 GB, OK)
  physicaldrive 1E:1:15 (port 1E:box 1:bay 15, SAS, 300 GB, OK)
  physicaldrive 1E:1:16 (port 1E:box 1:bay 16, SAS, 300 GB, OK)
  physicaldrive 1E:1:17 (port 1E:box 1:bay 17, SAS, 300 GB, OK)
  physicaldrive 1E:1:18 (port 1E:box 1:bay 18, SAS, 300 GB, OK)
  physicaldrive 1E:1:19 (port 1E:box 1:bay 19, SAS, 300 GB, OK)
  physicaldrive 1E:1:20 (port 1E:box 1:bay 20, SAS, 300 GB, OK)
  physicaldrive 1E:1:21 (port 1E:box 1:bay 21, SAS, 300 GB, OK)
  physicaldrive 1E:1:22 (port 1E:box 1:bay 22, SAS, 300 GB, OK)
  physicaldrive 1E:1:23 (port 1E:box 1:bay 23, SAS, 300 GB, OK)
  physicaldrive 1E:1:24 (port 1E:box 1:bay 24, SAS, 300 GB, OK)
  physicaldrive 1E:1:25 (port 1E:box 1:bay 25, SAS, 300 GB, OK)
peterh
TrevorW

2 Answers

This is a bit crazy... A Smart Array P600 PCI-X RAID controller (circa 2005)?!? 25 disks? RAID 5? Is this an HP MSA70 enclosure? It's probably not the HP D2700, is it?


"Ready for Rebuild" is about the worst array status message you can receive on an HP ProLiant system. This indicates that the logical drive can't finish its rebuild because there's trouble reading from one or more partner or dependent drives. Usually this means that you have a failed disk and a failing disk. The read failure on the surviving disk is known as an Unrecoverable Read Error (URE).

Please see the following:

RAID 1 fault " Status Ready for Rebuild : Rebuild Percentage Complete 0%"

HP Proliant ML350 G5 SAS HDD

Force LUN in a HP Smart Array to rebuild

24 disks in RAID5 is stupid. That's not your fault. 25 disks is, though. It's too many drives for RAID5, even with the 10k RPM enterprise disks you have. Losing your spare in order to add 300GB of space was a bad move because of the I/O and time impact of expanding such a large disk group. It hits all the disks and would have taken a very long time. Too much risk and exposure involved.
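To put a rough number on that exposure, here is a back-of-the-envelope sketch in Python. It assumes independent bit errors at a fixed rate, which real disks don't obey (failures in an old enclosure tend to be correlated), and the URE rates in the loop are typical spec-sheet figures I'm assuming, not values measured on these drives. The function name is just for illustration.

```python
import math

def p_rebuild_hits_ure(surviving_disks, disk_bytes, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading every
    surviving disk in full, assuming independent bit errors (a simplification)."""
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

DISK_BYTES = 300e9  # 300 GB drives, as in the question

# Assumed spec-sheet URE rates (errors per bit read), not measured values.
for rate in (1e-16, 1e-15, 1e-14):
    # A 25-disk RAID 5 rebuild has to read all 24 surviving disks.
    p = p_rebuild_hits_ure(24, DISK_BYTES, rate)
    print(f"URE rate {rate:.0e}/bit: {p:.1%} chance the rebuild hits a URE")
```

The point is only that the risk scales with the total amount of data a single rebuild has to read, so every disk added to one RAID 5 group makes each rebuild more likely to trip over a bad sector. Aging drives will likely do worse than their spec sheet.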

There is a slight chance that you are running into a controller firmware issue or configuration limitation. The last release of firmware for that controller was in 2009. Old gear plus a really abnormal configuration like yours are edge cases that require some work to fix. This could also be a problem with the enclosure.

  • Do you have good backups?
  • Are you in a position to bring firmware of all components up-to-date?
  • Can you power cycle everything here and watch the system POST messages closely to read the RAID controller output?
  • You may be able to jump-start the rebuild process, assuming there are no real READ errors on the drives.

So it's counter-intuitive, but a power-off, wait and power-on may be your best bet.
It could also be your worst bet, so hopefully you have backups. :(
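If you want to compare the state before and after the power cycle, a small script can pull out anything that isn't "OK" from saved `show config` output. This is only a sketch: it assumes the status is always the last comma-separated field inside the parentheses, which matches the output in the question but which I haven't checked against other firmware or CLI versions. The function names are made up for this example.

```python
import re

# Matches lines such as:
#   physicaldrive 1E:1:22 (port 1E:box 1:bay 22, SAS, 300 GB, OK)
#   logicaldrive 1 (6.3 TB, RAID 5, Ready for Rebuild)
DEVICE_LINE = re.compile(r"^\s*(physicaldrive|logicaldrive)\s+(\S+)\s+\((.*)\)\s*$")

def device_statuses(show_config_text):
    """Return (kind, id, status) tuples for every drive line in the text.
    Assumption: the status is the last comma-separated field."""
    result = []
    for line in show_config_text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            kind, dev_id, fields = m.groups()
            result.append((kind, dev_id, fields.split(",")[-1].strip()))
    return result

def not_ok(show_config_text):
    """Keep only the logical and physical drives whose status is not OK."""
    return [d for d in device_statuses(show_config_text) if d[2] != "OK"]

sample = """\
array A (SAS, Unused Space: 297996 MB)

  logicaldrive 1 (6.3 TB, RAID 5, Ready for Rebuild)

  physicaldrive 1E:1:22 (port 1E:box 1:bay 22, SAS, 300 GB, OK)
  physicaldrive 1E:1:25 (port 1E:box 1:bay 25, SAS, 300 GB, OK)
"""
print(not_ok(sample))  # [('logicaldrive', '1', 'Ready for Rebuild')]
```

In the output from the question every physical drive reports OK and only the logical drive is flagged, which is part of why a stuck rebuild (rather than a second dead disk) is worth ruling out.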

ewwhite
  • I totally agree that this is too many disks for a RAID5 configuration. RAID5+5 or RAID6 with a hot spare would have been more appropriate with this number of disks. – kasperd Oct 23 '14 at 17:16
An old post, I know, but this might be helpful for others. My P410i does this most of the time when I replace a disk. The new disk initializes, and then the status says "Ready for Rebuild", but nothing happens. When it does this, I unplug the power to the disk I just replaced (the one that wouldn't rebuild), wait 10-15 seconds, and plug it back in; then the rebuild starts. I'm running a RAID 50 with 8 disks on it, I think for 6 years and counting, and have had 3 faulty disks over that time. It has rebuilt 6 times, though, because each time I put in a temporary disk until I got the right spare.