One of our SuperMicro servers, which has an onboard LSI 2308 RAID controller, has been having issues with its main RAID 10 array of four Seagate 600 Pro SSDs (in slots 0, 1, 2 and 3).
It started with a consistency check that resulted in a large number of the following errors:
Controller ID: 0 Consistency Check detected uncorrectable multiple medium errors: ( PD -:-:255 Location 0x2048421 VD 1)
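To keep track of these events across runs, the fields can be pulled out of each line with a small script. This is only a minimal sketch: it assumes every event line has exactly the format quoted above, the function name `parse_cc_error` is made up for illustration, and reading `Location` as a hexadecimal sector address is my assumption rather than something the controller documentation confirmed for me.

```python
import re

# Matches the event line format quoted above; other firmware versions may word it differently.
CC_ERROR = re.compile(
    r"Controller ID:\s*(?P<ctrl>\d+)\s+Consistency Check detected uncorrectable "
    r"multiple medium errors:\s*\(\s*PD\s+(?P<pd>\S+)\s+Location\s+(?P<loc>0x[0-9a-fA-F]+)"
    r"\s+VD\s+(?P<vd>\d+)\s*\)"
)

def parse_cc_error(line):
    """Return (controller, pd, location, vd), or None if the line does not match."""
    m = CC_ERROR.search(line)
    if m is None:
        return None
    # Location is assumed to be hexadecimal; int(..., 16) accepts the 0x prefix.
    return int(m["ctrl"]), m["pd"], int(m["loc"], 16), int(m["vd"])

sample = ("Controller ID: 0 Consistency Check detected uncorrectable multiple "
          "medium errors: ( PD -:-:255 Location 0x2048421 VD 1)")
print(parse_cc_error(sample))
```

Lines that do not match the pattern return `None`, so the same function can be run over a whole event log and the non-matching lines ignored.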
This consistency check ended up failing. I decided the array could no longer be trusted, so I remade it, after first creating an image of the array using ddrescue. Unfortunately some minor data loss occurred, but most of the data was OK.
I checked all the drives using SeaTools. All four passed all tests, so I figured they should be ok. I took this opportunity to upgrade the firmware on the controller and the drives. After deleting the VD and recreating a new RAID 10 array, I copied the ddrescue image back to the drive with no problems. The system booted fine and all seemed ok. After waiting for the array to sync I ran another consistency check, and again it resulted in a number of uncorrectable multiple medium errors.
I concluded that one or more of the drives must be faulty, so I bought two new (larger) Samsung SSDs and created a new RAID 1 device consisting of just these two drives. I used different slots (6 and 7) as an extra precaution. Unfortunately, after copying the data back and syncing the array, a consistency check still throws uncorrectable multiple medium errors, although there are only two bad sectors this time.
Note that the number of bad sectors and the location of the bad sectors changed both times I remade the array.
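One way to make that comparison concrete is to collect the reported locations from each consistency check and compare them as sets. A rough sketch, assuming the event log of each run was saved as text; the helper names are hypothetical, and every sample value other than 0x2048421 is invented purely to show the shape of the output.

```python
import re

LOCATION = re.compile(r"Location\s+(0x[0-9a-fA-F]+)")

def locations(log_text):
    """Collect every 'Location 0x...' value in a log as a set of integers."""
    return {int(h, 16) for h in LOCATION.findall(log_text)}

def compare_runs(first_log, second_log):
    """Split locations into those seen in both runs and those seen in only one."""
    a, b = locations(first_log), locations(second_log)
    return {
        "both": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }

# Illustrative input only: 0x2048421 is from the log above, the other values are invented.
run1 = "... Location 0x2048421 VD 1)\n... Location 0x2048500 VD 1)"
run2 = "... Location 0x1000 VD 1)"
print(compare_runs(run1, run2))
```

An empty `both` list would correspond to what I observed, namely that no location repeated from one rebuild to the next.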
The server seems to be running OK now, and I have checked the two bad sectors: they contain no files at the moment. Still, the array cannot be trusted and I am out of ideas. What else can I try to fix this issue?