I've got a sad RAID array on a 3ware 9650SE-16ML card. What I can't tell is if I've just suffered a double-disk failure (bummer!) or if I'm reading this wrong. The relevant output of /c0 show all is:
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 DEGRADED u0 931.51 GB 1953525168 5QJ07MAH
p1 ECC-ERROR u0 931.51 GB 1953525168 5QJ0DCW9
p2 OK u0 931.51 GB 1953525168 5QJ0DW9C
p3 OK u0 931.51 GB 1953525168 5QJ0CKXJ
And the failure is (from show alarms):
Ctl Date Severity Alarm Message
------------------------------------------------------------------------------
c0 [Sun Nov 20 07:47:23 2011] INFO Rebuild started: unit=0
c0 [Sun Nov 20 08:20:12 2011] ERROR Drive ECC error reported: port=1, unit=0
c0 [Sun Nov 20 08:20:12 2011] ERROR Source drive error occurred: port=1, unit=0
c0 [Sun Nov 20 08:20:12 2011] ERROR Rebuild failed: unit=0
c0 [Sun Nov 20 08:20:12 2011] INFO Rebuild paused: unit=0
I think what happened is that p0 failed, and then p1 hit an ECC error during the rebuild (i.e., my data is gone). But... maybe not? The rebuild stalls at 97% and can't get past this error.
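A minimal Python sketch of that reading, assuming the port table keeps the whitespace-separated layout pasted above. The names `PORT_TABLE` and `unhealthy_ports` are hypothetical helpers for illustration, not part of tw_cli. It only groups the non-OK ports by unit. Whether two bad ports in one unit means lost data depends on the RAID level, which the output above doesn't show.

```python
import re

# Port rows copied from the `/c0 show all` output in this post.
PORT_TABLE = """\
p0 DEGRADED u0 931.51 GB 1953525168 5QJ07MAH
p1 ECC-ERROR u0 931.51 GB 1953525168 5QJ0DCW9
p2 OK u0 931.51 GB 1953525168 5QJ0DW9C
p3 OK u0 931.51 GB 1953525168 5QJ0CKXJ
"""

# Assumed row layout: port, status, unit, then size/blocks/serial.
PORT_LINE = re.compile(r"^(p\d+)\s+(\S+)\s+(u\d+)\s")


def unhealthy_ports(table):
    """Return {unit: [(port, status), ...]} for every port whose status is not OK."""
    result = {}
    for line in table.splitlines():
        match = PORT_LINE.match(line)
        if match is None:
            continue  # skip headers, separators, and blank lines
        port, status, unit = match.groups()
        if status != "OK":
            result.setdefault(unit, []).append((port, status))
    return result


for unit, ports in unhealthy_ports(PORT_TABLE).items():
    print(unit, ports)
```

On the table above this reports both p0 and p1 under u0, which matches the suspicion of two problem drives in the same unit.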
As far as I can tell, a previous admin turned off the periodic verify, which is what got us into this state. This isn't something most people should worry about with their 3Ware RAIDs!
Update
After beating on it for a couple of days, I did the IgnoreECC bit and it rebuilt, but my data is hosed. Bummer.