I have a Dell PowerEdge 2950 with four drives and this RAID controller (a Dell PERC 6/i):
lspci -nn
RAID bus controller [0104]: LSI Logic / Symbios Logic MegaRAID SAS 1078 [1000:0060] (rev 04)
Physical drives 2 and 3 (labeled a0e32s2 and a0e32s3) make up virtual drive a0d2, which is OFFLINE because physical disk 2 went into the "ready" state instead of "online".
megaraidsas-status
-- Arrays informations --
-- ID | Type | Size | Status
a0d0 | RAID 0 | 100GiB | optimal
a0d1 | RAID 0 | 2693GiB | optimal
a0d2 | RAID 0 | 3725GiB | OFFLINE

-- Disks informations
-- ID | Model | Status | Warnings
a0e32s0 | ATA ST31500341AS 1397GiB | online | errs: media:0 other:1
a0e32s1 | ATA ST31500341AS 1397GiB | online | errs: media:0 other:1
a0e32s2 | ATA Hitachi HDS72302 1863GiB | ready | errs: media:0 other:1
a0e32s3 | ATA Hitachi HDS72302 1863GiB | online | errs: media:0 other:1
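To catch the drop-out sooner, something like the following Python sketch could flag any disk that is not "online". It only assumes the pipe-separated layout of the megaraidsas-status output shown here; `parse_disks` and `not_online` are hypothetical helper names of my own, not part of any tool, and the sample text is pasted in rather than read from the command.

```python
# Sample copied from the megaraidsas-status output in this post.
SAMPLE_STATUS = """\
-- Arrays informations --
-- ID | Type | Size | Status
a0d0 | RAID 0 | 100GiB | optimal
a0d1 | RAID 0 | 2693GiB | optimal
a0d2 | RAID 0 | 3725GiB | OFFLINE

-- Disks informations
-- ID | Model | Status | Warnings
a0e32s0 | ATA ST31500341AS 1397GiB | online | errs: media:0 other:1
a0e32s1 | ATA ST31500341AS 1397GiB | online | errs: media:0 other:1
a0e32s2 | ATA Hitachi HDS72302 1863GiB | ready | errs: media:0 other:1
a0e32s3 | ATA Hitachi HDS72302 1863GiB | online | errs: media:0 other:1
"""


def parse_disks(status_text):
    """Return (disk_id, model, status, warnings) tuples from the disks table."""
    disks = []
    in_disks = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("-- Disks informations"):
            in_disks = True  # rows after this header belong to the disks table
            continue
        # Skip everything before the disks table, blank lines, and "--" headers.
        if not in_disks or not line or line.startswith("--"):
            continue
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 4:
            disks.append(tuple(fields))
    return disks


def not_online(disks):
    """Return the IDs of disks whose status is anything other than 'online'."""
    return [d[0] for d in disks if d[2].lower() != "online"]


if __name__ == "__main__":
    for disk_id in not_online(parse_disks(SAMPLE_STATUS)):
        print("disk not online:", disk_id)
```

On the sample above this reports only a0e32s2, the disk in the "ready" state.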
The virtual drive is in an inconsistent state:
megacli -LDGetProp Consistency -L2 -a0
Virtual Drive:02(Target ID 02): Inconsistent.
When this happens, I can "import the foreign configuration", but the virtual disk remains inconsistent. I can delete the virtual drive, recreate the RAID-0 with a full initialization, and get the virtual drive back to a consistent state, but eventually the same thing happens again, always to the same physical drive.
How can I stop the drive from dropping out and coming back as a foreign configuration? I've already replaced physical drive 2. And how do I find out what the "other" error (other:1) in the status output above is?
The RAID controller firmware and the BIOS are both at the latest versions. I've run this under both Debian Squeeze and Debian Wheezy (with both the standard and the latest backports kernels).
megacli -AdpEventLog -GetEvents -f events.log -aALL && less events.log
Command timeout on PD 02(e0x20/s2) Path 1221000002000000, CDB: 2a 00 07 60 5b 68 00 00 08 00
Removed: PD 02(e0x20/s2)
Removed: PD 02(e0x20/s2) Info: enclPd=20, scsiType=0, portMap=02, sasAddr=1221000002000000,0000000000000000
State change on PD 02(e0x20/s2) from ONLINE(18) to FAILED(11)
State change on VD 02/2 from OPTIMAL(3) to OFFLINE(0)
Controller cache pinned for missing or offline VD 02/2
VD 02/2 is now OFFLINE
State change on PD 02(e0x20/s2) from FAILED(11) to UNCONFIGURED_BAD(1)
Enclosure PD 20(c None/p0) element (SES code 0x17) status changed
Policy change on VD 02/2 to [ID=02,dcp=0d,ccp=0c,ap=0,dc=0,dbgi=0] from +[ID=02,dcp=0d,ccp=0d,ap=0,dc=0,dbgi=0]
Inserted: PD 02(e0x20/s2)
Inserted: PD 02(e0x20/s2) Info: enclPd=20, scsiType=0, portMap=02, sasAddr=1221000002000000,0000000000000000
PD 02(e0x20/s2) is not a certified drive
State change on PD 02(e0x20/s2) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)
Enclosure PD 20(c None/p0) element (SES code 0x17) status changed
Foreign Configuration Detected
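To make the sequence easier to follow, here is a rough Python sketch that pulls just the physical-drive state changes out of the event text. It assumes the wording "State change on PD ... from X(n) to Y(m)" exactly as it appears in my log excerpt; `pd_transitions` is a hypothetical helper name, and the sample is a trimmed copy of the events in this post, not the full events.log.

```python
import re

# Trimmed copy of the events from this post (PD and VD state changes only).
SAMPLE_LOG = """\
Command timeout on PD 02(e0x20/s2) Path 1221000002000000, CDB: 2a 00 07 60 5b 68 00 00 08 00
State change on PD 02(e0x20/s2) from ONLINE(18) to FAILED(11)
State change on VD 02/2 from OPTIMAL(3) to OFFLINE(0)
State change on PD 02(e0x20/s2) from FAILED(11) to UNCONFIGURED_BAD(1)
State change on PD 02(e0x20/s2) from UNCONFIGURED_BAD(1) to UNCONFIGURED_GOOD(0)
Foreign Configuration Detected
"""

# Matches physical-drive (PD) state changes only; VD lines are ignored.
PD_CHANGE = re.compile(
    r"State change on PD ([0-9a-fA-F]+)\([^)]*\) "
    r"from ([A-Z_]+)\(\d+\) to ([A-Z_]+)\(\d+\)"
)


def pd_transitions(log_text):
    """Return (pd_id, old_state, new_state) tuples in the order they occur."""
    return [
        (m.group(1), m.group(2), m.group(3))
        for m in PD_CHANGE.finditer(log_text)
    ]


if __name__ == "__main__":
    for pd_id, old, new in pd_transitions(SAMPLE_LOG):
        print(f"PD {pd_id}: {old} -> {new}")
```

Because it uses `finditer` over the whole text, it does not depend on where the line breaks fall in the log. For the excerpt above it shows PD 02 going ONLINE to FAILED, FAILED to UNCONFIGURED_BAD, and then UNCONFIGURED_BAD to UNCONFIGURED_GOOD, which is the point where the foreign configuration is detected.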
Thanks for any help!