
One of our SuperMicro servers with an onboard LSI 2308 RAID controller has been having issues with our main RAID10 array consisting of 4 Seagate 600 PRO SSD drives (in slots 0,1,2,3).

It started with a consistency check that resulted in a large number of the following errors:

Controller ID: 0 Consistency Check detected uncorrectable multiple medium errors: ( PD -:-:255 Location 0x2048421 VD 1)

The consistency check ultimately failed. I decided the array could no longer be trusted and rebuilt it, after first creating an image of the array with ddrescue. Unfortunately some minor data loss occurred, but most of the data was intact.

I checked all the drives using SeaTools. All four passed all tests, so I figured they should be ok. I took this opportunity to upgrade the firmware on the controller and the drives. After deleting the VD and recreating a new RAID 10 array, I copied the ddrescue image back to the drive with no problems. The system booted fine and all seemed ok. After waiting for the array to sync I ran another consistency check, and again it resulted in a number of uncorrectable multiple medium errors.

I concluded that one or more of the SSDs must be faulty, so I bought two new (larger) Samsung SSDs and created a new RAID1 device consisting of just these two drives. As an extra precaution I used different slots (6 and 7). Unfortunately, after copying the data back and syncing the array, a consistency check still reported uncorrectable multiple medium errors, although there were only two bad sectors this time.

Note that the number of bad sectors and the location of the bad sectors changed both times I remade the array.
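To compare the reported locations between consistency checks, the error lines can be parsed and the location sets diffed. The following is a minimal sketch, assuming each error line has the same shape as the one quoted above; the function names `medium_error_locations` and `compare_runs` are hypothetical and not part of any LSI tool.

```python
import re

# Matches the error line quoted in the question, e.g.
# "... uncorrectable multiple medium errors: ( PD -:-:255 Location 0x2048421 VD 1)"
# Assumption: every error line in the exported log follows this layout.
ERROR_RE = re.compile(
    r"uncorrectable multiple medium errors:\s*"
    r"\(\s*PD\s+(?P<pd>\S+)\s+Location\s+(?P<loc>0x[0-9a-fA-F]+)"
    r"\s+VD\s+(?P<vd>\d+)\s*\)"
)


def medium_error_locations(log_text):
    """Return the set of (virtual disk, location) pairs found in a log."""
    return {
        (int(m.group("vd")), int(m.group("loc"), 16))
        for m in ERROR_RE.finditer(log_text)
    }


def compare_runs(before, after):
    """Split locations into those seen in both runs and those seen in one."""
    return {
        "both": before & after,
        "only_before": before - after,
        "only_after": after - before,
    }
```

If the `both` set stays empty across rebuilds, as described here, the errors are not tied to fixed positions on the virtual disk.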

The server seems to be running fine now. I have checked the two bad sectors and they do not contain any files at the moment. Still, the array cannot be trusted and I am out of ideas. What else can I try to fix this issue?
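For reference, checking what lives at a reported location means converting it to a byte offset on the virtual disk. This is a minimal sketch under a loud assumption: that the `Location` value is a logical block address counted in 512-byte sectors, which the controller log does not confirm. The helper name `location_to_byte_offset` is hypothetical.

```python
# Assumption: the controller reports locations in 512-byte sectors.
SECTOR_SIZE = 512


def location_to_byte_offset(location_hex, sector_size=SECTOR_SIZE):
    """Convert a hex location string such as '0x2048421' to a byte offset."""
    return int(location_hex, 16) * sector_size


if __name__ == "__main__":
    # Location taken from the error message quoted in the question.
    print(location_to_byte_offset("0x2048421"))
```

The resulting offset can then be mapped to a partition and file system block with the usual file system tools.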

vdyvp
  • Can you post the output of `smartctl --all ` for all SSDs? – shodanshok Jun 15 '18 at 06:04
  • Please post your controller's logs (via MegaCli). I think that you have a faulty cabling or PSU. – Peter Zhabin Jun 15 '18 at 09:40
  • Smartctl does not work behind the LSI in IR mode. I used a different tool to get the SMART data: https://imgur.com/a/2WkTFSp – vdyvp Jun 18 '18 at 22:38
  • Unfortunately, the LSI2308 does not support MegaCLI. I have pasted the MegaRAID Storage Manager log at https://paste.ee/p/onCP3. Unfortunately, the log reset itself just before the Samsung RAID1 set was initialized. – vdyvp Jun 18 '18 at 22:51
  • Some events from the log above:
      • seq 308: initialize rebuilt Seagate RAID10 array start
      • seq 313: initialize rebuilt Seagate RAID10 array end
      • seq 318: Seagate array consistency check start
      • seq 319-325: Seagate array consistency check fail
      • seq 326: Seagate array consistency check end
      • seq 539: deleted Seagate RAID10 array
      • seq 542: created Samsung RAID1 array
      • seq 605: Samsung array initialize start
      • seq 608: Samsung array initialize end
      • seq 609: Samsung array consistency check start
      • seq 610-611: Samsung array consistency check fail
      • seq 612: Samsung array consistency check end
    – vdyvp Jun 19 '18 at 00:39
  • Only SAS2IRCU can be used with the LSI SAS2308; MegaRAID Storage Manager is used to manage the SAS3108 etc. Which tool do you use to check consistency? – user490531 Oct 04 '18 at 10:21
  • I use MegaRAID Storage Manager to run the consistency check. – vdyvp Oct 07 '18 at 21:44
  • @vdyvp which tool did you use to check the SMART data? I also have a RAID1 array with a few locations of uncorrectable multiple medium errors. – Kevin Morse Dec 01 '18 at 09:10
  • Hard Disk Sentinel (https://www.hdsentinel.com/download.php) – vdyvp Dec 02 '18 at 23:45

0 Answers