
I have an Adaptec ASR8405 hardware RAID controller with a 15-disk RAID 6 array. One of the disks failed, and after I replaced it the controller did not pick up the new disk and did not start a rebuild. Instead, the logical device went into the Failed state (see below):

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical Device number 0
   Logical Device name                      : LogicalDrv 0
   Block Size of member drives              : 512 Bytes
   RAID level                               : 6 Reed-Solomon
   Unique Identifier                        : A0E20532
   Status of Logical Device                 : Failed
   Additional details                       : Initialized with Build/Clear
   Size                                     : 74347510 MB
   Parity space                             : 11438080 MB
   Stripe-unit size                         : 256 KB
   Interface Type                           : Serial ATA
   Device Type                              : HDD
   Read-cache setting                       : Enabled
   Read-cache status                        : On
   Write-cache setting                      : Enabled
   Write-cache status                       : Off
   Partitioned                              : No
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : No
   Power settings                           : Disabled
   --------------------------------------------------------
   Logical Device segment information
   --------------------------------------------------------
   Segment 0                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:0) K1JG4N8D
   Segment 1                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:1) K1JGHL7D
   Segment 2                                : Missing
   Segment 3                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:3) K1JGE6ZD
   Segment 4                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:4) K1JEWTND
   Segment 5                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:5) K1JENR3D
   Segment 6                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:6) K1JG2U0D
   Segment 7                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:7) K1JG66ED
   Segment 8                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:8) K1JGHJ6D
   Segment 9                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:9) K1JGELLD
   Segment 10                               : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:10) K1JG5XYD
   Segment 11                               : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:11) K1JGSTJD
   Segment 12                               : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:12) K1JG339D
   Segment 13                               : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:13) K1JG16KD
   Segment 14                               : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:14) K1JEX09D
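
The segment list above can also be checked mechanically. What follows is only a minimal sketch, assuming the listing has been saved as text: it finds the segments that are not reported as Present and compares the count with the two missing members that RAID 6 can tolerate. The names `SAMPLE`, `missing_segments` and `within_raid6_tolerance` are made up for this illustration and are not part of arcconf.

```python
import re

# Three lines copied from the segment listing above (abbreviated sample).
SAMPLE = """\
   Segment 1                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:1) K1JGHL7D
   Segment 2                                : Missing
   Segment 3                                : Present (5723166MB, SATA, HDD, Enclosure:0, Slot:3) K1JGE6ZD
"""

# Matches "Segment <n> : <State>" and captures the number and the first word of the state.
SEGMENT_RE = re.compile(r"^\s*Segment\s+(\d+)\s*:\s*(\w+)", re.MULTILINE)

# RAID 6 keeps two parity blocks per stripe, so it can lose up to two members.
RAID6_MAX_MISSING = 2


def missing_segments(text):
    """Return the numbers of all segments whose state is not 'Present'."""
    return [int(num) for num, state in SEGMENT_RE.findall(text) if state != "Present"]


def within_raid6_tolerance(text):
    """True if the number of non-present segments does not exceed RAID 6 redundancy."""
    return len(missing_segments(text)) <= RAID6_MAX_MISSING


if __name__ == "__main__":
    print("Missing segments:", missing_segments(SAMPLE))
    print("Within RAID 6 tolerance:", within_raid6_tolerance(SAMPLE))
```

With only one segment missing, the array is still within what RAID 6 can tolerate, which is why the Failed state is unexpected here.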

As you can see, the disk in Segment 2 of the logical device is reported as Missing; however, it shows up in the Ready state when I list the physical devices:

  Device #2
     Device is a Hard drive
     State                              : Ready
     Block Size                         : 512 Bytes
     Supported                          : Yes
     Programmed Max Speed               : SATA 6.0 Gb/s
     Transfer Speed                     : SATA 12.0 Gb/s
     Reported Channel,Device(T:L)       : 0,6(6:0)
     Reported Location                  : Enclosure 0, Slot 2(Connector 0)
     Reported ESD(T:L)                  : 2,0(0:0)
     Vendor                             : ATA
     Model                              : HGST HUS726060AL
     Firmware                           : T907
     Serial number                      : K1GVY99D
     World-wide name                    : 5000CCA255CC3FA3
     Reserved Size                      : 4225560 KB
     Used Size                          : 0 MB
     Unused Size                        : 5719040 MB
     Total Size                         : 5723166 MB
     Write Cache                        : Enabled (write-back)
     FRU                                : None
     S.M.A.R.T.                         : No
     S.M.A.R.T. warnings                : 0
     Power State                        : Full rpm
     Supported Power States             : Full rpm,Powered off,Reduced rpm
     SSD                                : No
     Temperature                        : 42 C/ 107 F
     NCQ status                         : Enabled
  ----------------------------------------------------------------
  Device Phy Information
  ----------------------------------------------------------------
     Phy #0
        PHY Identifier                  : 0
        SAS Address                     : 50000D1701875C02
        Attached PHY Identifier         : 2
        Attached SAS Address            : 50000D1701875C3F
  ----------------------------------------------------------------
  Runtime Error Counters
  ----------------------------------------------------------------
     Hardware Error Count               : 0
     Medium Error Count                 : 0
     Parity Error Count                 : 0
     Link Failure Count                 : 0
     Aborted Command Count              : 0
     SMART Warning Count                : 0

  • Question 1: How do I make the logical device recognize the disk? I've tried a rescan on the LD, and clear, verify and initialize on the disk itself, but nothing helps.

  • Question 2: Is there any chance to fix this and recover the data? I have a backup, but there is over 40 TB of data and restoring it from backup will not be fun.

  • Question 3: Is there any chance that it will fix itself if I change the LD state to OPTIMAL?

  • Question 4: Any other ideas on how to fix it?

Many thanks in advance for any hints!

wwn

1 Answer


I've managed to fix it with:

arcconf SETSTATE 1 LOGICALDRIVE 0 OPTIMAL ADVANCED nocheck noprompt

Immediately after I changed the state of the logical drive, the array started to rebuild automatically. Once the rebuild completed, a Verify with Fix started (again automatically). After the Verify finished, everything was OK again (no data loss).
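
For anyone who wants to script this, here is a minimal sketch that only assembles the same command line and prints it, without running anything. It assumes that the `1` in my command is the controller number and the `0` is the logical drive number; `build_setstate_command` is a made-up helper name, not something shipped with arcconf. Forcing the state bypasses the controller's own checks, so review the printed command and make sure the backup is good before running it by hand.

```python
import shlex


def build_setstate_command(controller=1, logical_drive=0):
    """Assemble the arcconf SETSTATE command from the answer as an argument list.

    Assumption: `controller` and `logical_drive` correspond to the `1` and `0`
    in the original command. The remaining tokens are copied verbatim.
    """
    if controller < 0 or logical_drive < 0:
        raise ValueError("controller and logical_drive must be non-negative")
    return [
        "arcconf", "SETSTATE", str(controller),
        "LOGICALDRIVE", str(logical_drive),
        "OPTIMAL", "ADVANCED", "nocheck", "noprompt",
    ]


if __name__ == "__main__":
    # Dry run only: print the command so it can be reviewed before use.
    print(shlex.join(build_setstate_command()))
```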

wwn