
I have a RAID 5 array that has started misbehaving lately. During POST it shows this error:

Slot 4  HP Smart Array P600 Controller       (256MB, v2.04)   1 Logical Drive
1789-Slot 4 Drive Array Disk Drive(s) Not Responding
    Check cables or replace the following drive(s):
         Port 2I: Box  1: Bay 2
  Select "F1" to continue - all logical drive(s) will remain disabled
  Select "F2" to fail drive(s) that are not responding - Interim Recovery
              mode will be enabled if configured for fault tolerance

Then it boots (after a 45-second delay) and starts rebuilding:

# hpacucli ctrl all show config
Smart Array P600 in Slot 4                (sn: P92B3AF9SXR018)

   array A (SAS, Unused Space: 0  MB)


      logicaldrive 1 (410.1 GB, RAID 5, Recovering, 62% complete)

      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 146 GB, Rebuilding)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 146 GB, OK)

(This takes about 40 minutes.)
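To watch the rebuild without rereading the whole listing each time, the `show config` output can be parsed for drive states. This is only a minimal sketch: it assumes the output keeps the exact layout pasted above (status as the last comma-separated field), and `parse_config` is a hypothetical helper name, not part of any HP tooling.

```python
import re

# Sample text copied from the `hpacucli ctrl all show config` output above.
SAMPLE = """\
Smart Array P600 in Slot 4                (sn: P92B3AF9SXR018)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (410.1 GB, RAID 5, Recovering, 62% complete)

      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 146 GB, Rebuilding)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 146 GB, OK)
"""

PD_RE = re.compile(r"^\s*physicaldrive\s+(\S+)\s+\((.*)\)\s*$")
LD_RE = re.compile(r"^\s*logicaldrive\s+(\d+)\s+\((.*)\)\s*$")


def parse_config(text):
    """Return ({physical drive id: status}, {logical drive number: status})."""
    physical = {}
    logical = {}
    for line in text.splitlines():
        m = PD_RE.match(line)
        if m:
            # Assumption: the status is the last comma-separated field.
            physical[m.group(1)] = m.group(2).split(", ")[-1]
            continue
        m = LD_RE.match(line)
        if m:
            # Assumption: size and RAID level come first, status fields follow.
            logical[int(m.group(1))] = ", ".join(m.group(2).split(", ")[2:])
    return physical, logical


physical, logical = parse_config(SAMPLE)
not_ok = [drive for drive, status in physical.items() if status != "OK"]
print("logical drives:", logical)
print("drives not OK:", not_ok)
```

In practice the text would come from running the command itself rather than from the embedded sample.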

We've tried replacing the disk in bay 2, with no perceptible difference.

We've also inserted a disk into bay 5 and added it to the array as a spare, but the controller never used it, so I removed it from the spares again.

Question: Is there a way to convince the RAID controller to drop drive 2I:1:2 from the array and use 2I:1:5 in its place? I've tried

hpacucli ctrl slot=4 array A modify drives=1:1,1:3,1:4,1:5

but this fails with Error: Cannot create array. Cannot add physical drive 1:1.

(Update: smartctl -a -d cciss,$NR /dev/cciss/c0d0 reports that all five disks pass their SMART health self-assessment.)
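For reference, the per-disk check can be looped roughly as follows. This is a sketch under assumptions: that `$NR` runs from 0 to 4 for the five disks, and that the health verdict appears on a line containing the word "health" in the smartctl output. `smartctl_command`, `health_line` and `check_all` are hypothetical helper names.

```python
import subprocess

DEVICE = "/dev/cciss/c0d0"  # device node from the smartctl command above


def smartctl_command(index, device=DEVICE):
    """Build the argv for one disk behind the cciss controller."""
    return ["smartctl", "-a", "-d", f"cciss,{index}", device]


def health_line(output):
    """Return the first line mentioning 'health', or None if there is none.

    Assumption: the overall health verdict is printed on such a line.
    """
    for line in output.splitlines():
        if "health" in line.lower():
            return line.strip()
    return None


def check_all(count=5):
    """Run smartctl for disk indexes 0..count-1 (assumed numbering)."""
    results = {}
    for index in range(count):
        proc = subprocess.run(
            smartctl_command(index), capture_output=True, text=True
        )
        results[index] = health_line(proc.stdout)
    return results
```

Calling `check_all()` on the server (as root) returns one health line per disk index, or `None` where smartctl printed no such line.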

Marius Gedminas
  • I'm voting to close this question as off-topic because the OP is describing a 10-12 year-old server. This is beyond any level of reasonable support. – ewwhite Nov 10 '17 at 17:02

1 Answer


You're supposed to press F2 when prompted.

See: HP ProLiant disk failure, proceed or do not proceed

I'll add that the HP Smart Array P600 is a very old PCI-X controller, dating back to 2004/2005. So, are you dealing with an 11+ year-old server, controller and disks?

ewwhite
  • The POST message indicates `F2` is the default action that will take place after that 45-second timeout, which is fine with me. Yes, this is a 10-year-old server. The controller was recently replaced (twice). The original disk is 10 years old, the new one is a bit younger. – Marius Gedminas Nov 09 '17 at 15:14
  • The servers are in a data center in a different country, which makes it a bit cumbersome to look at LEDs or wiggle wires. The server is supposed to be decommissioned after a few months, which is why I'm not too keen on replacing it wholesale. – Marius Gedminas Nov 09 '17 at 15:16
  • @MariusGedminas I don't understand what you're trying to do. – ewwhite Nov 09 '17 at 19:37
  • I would like to avoid needless RAID rebuilds on every power cycle. I'm trying to convince the RAID controller to use a disk in a different bay to achieve that. – Marius Gedminas Nov 10 '17 at 08:08
  • @MariusGedminas That's not how this works. Does the RAID rebuild ever complete? Have you checked backplane connections? Actually, maybe this shouldn't matter because the server is too old to really try to troubleshoot... But you're providing incomplete information. – ewwhite Nov 10 '17 at 13:05
  • And why is the system rebooting so often? – ewwhite Nov 10 '17 at 13:06
  • The RAID rebuild completes in 40 minutes. We've asked a data centre tech to check the connections, which did not help. The system is rebooting often because we keep rebooting it after making changes (such as swapping disks etc.) to see whether they helped or not (so far nothing has helped). – Marius Gedminas Nov 10 '17 at 15:24
  • What more information should I be supplying? – Marius Gedminas Nov 10 '17 at 15:26