
I have a Dell server with a PERC H700 Integrated controller. I've made a RAID5 with 12 hard drives and the virtual device is in Optimal state, but I receive errors like these under Linux:

sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda]  Result: hostbyte=0x07 driverbyte=0x00
sd 0:2:0:0: [sda] CDB: cdb[0]=0x88: 88 00 00 00 00 07 22 50 bd 98 00 00 00 08 00 00
end_request: I/O error, dev sda, sector 30640487832
sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda]  Result: hostbyte=0x07 driverbyte=0x00
sd 0:2:0:0: [sda] CDB: cdb[0]=0x88: 88 00 00 00 00 07 22 50 bd 98 00 00 00 08 00 00
end_request: I/O error, dev sda, sector 30640487832
sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda]  Result: hostbyte=0x07 driverbyte=0x00
sd 0:2:0:0: [sda] CDB: cdb[0]=0x88: 88 00 00 00 00 07 22 50 bc e0 00 00 01 00 00 00
end_request: I/O error, dev sda, sector 30640487648

But all disks show Firmware state: Online, Spun Up.
Also, there is not a single ATA read or write error on any disk in the RAID (I checked each of them with smartctl -a -d sat+megaraid,N -H /dev/sda). The only strange thing is in the output of

megacli:
megacli -LDInfo -L0 -a0
...
Bad Blocks Exist: Yes

How could there be bad blocks in a virtual drive that is in Optimal state, when no disk is broken or even shows a single error? I tried a Consistency Check, but it finished successfully and the errors are still in dmesg. Could someone help me figure out what is wrong with my RAID?
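The per-disk SMART check mentioned above can be scripted. The sketch below only prints the commands (a dry run) and assumes the disks map to MegaRAID device IDs 0 through 11; confirm the real IDs with megacli -PDList -a0 before running anything:

```shell
# Dry-run sketch of the per-disk SMART check. Device IDs 0-11 are an
# assumption -- confirm the actual IDs with "megacli -PDList -a0".
cmds=""
for n in $(seq 0 11); do
    # smartctl reaches each physical disk behind the controller via the
    # sat+megaraid,N passthrough on the same /dev/sda node
    cmds="$cmds""smartctl -a -d sat+megaraid,$n -H /dev/sda
"
done
printf '%s' "$cmds"
```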

neoX

2 Answers


The "Bad Blocks Exist" indicator of MegaCLI refers to the Soft Bad Block Management (SBBM) table, which works as follows (quoting the MegaRAID docs):

If the CU detects a media error on the source drive during rebuild, it initiates a sector read for that block. If the sector read fails, the CU adds entries to the Soft Bad Block Management (SBBM) table, writes this table to the target drive, and displays an error message.

Additional error messages are displayed if the SBBM table is 80% full or 100% full. If the SBBM table is completely full, the rebuild operation is aborted, and the drive is marked as FAIL.

The SBBM table will not contain the same "bad" markings as those reported by SMART, since the criteria and methods of action are very different.

Take a look at which of your drives is reporting errors using megacli -LDPDInfo -aAll and give it a closer examination.
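As a sketch, the interesting per-drive counters can be filtered out of the -LDPDInfo output with grep. The sample text below is illustrative only, not real output from the asker's controller:

```shell
# Illustrative sample of "megacli -LDPDInfo -aAll" output (made-up data):
sample='Device Id: 4
Media Error Count: 0
Other Error Count: 3
Predictive Failure Count: 0
Device Id: 5
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0'
# Keep only the counters that are nonzero:
flagged=$(printf '%s\n' "$sample" | grep -E 'Count: [1-9]')
printf '%s\n' "$flagged"
```

In this sample, only "Other Error Count: 3" survives the filter, pointing at the drive worth a closer look.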

the-wabbit
  • Ok, but still, how can I reset the SBBM table? The array is Optimal, all disks have Media Error Count 0 and Other Error Count 0, and every Predictive Failure Count is 0. Not a single error in any parameter, and yet the array apparently is not healthy? Can I reset it, or must I create the array from scratch? – neoX Oct 23 '12 at 07:48
  • @neoX The table is typically cleared upon rebuild, after the drive with errors has been replaced. If it does not clear, you should contact Dell support. The I/O errors upon accessing certain sectors on your LD, without any reflection in the error statistics of the physical devices, might be a problem with the controller itself; Dell support should troubleshoot that too. – the-wabbit Oct 23 '12 at 12:13

I ran into this issue recently. There was a 'Bad Blocks Exist: Yes' message on the array, but all LDs and PDs were fine. There were read errors on that array.

I found the command -LDBBMClr, which clears that table.

megacli -LDBBMClr -L0 -a0 (change the numbers according to your adapter/array)

Please do not forget to remount the filesystem (or reboot), as the earlier read errors may cause issues later.
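Put together, the sequence from this answer looks roughly like the following. It is shown as a dry run (the commands are printed, not executed); -L0 -a0 and the /data mount point are placeholders for your own LD/adapter numbers and filesystem:

```shell
# Dry-run sketch of the recovery sequence. "-L0 -a0" and "/data" are
# placeholders -- substitute your own LD/adapter numbers and mount point.
steps='megacli -LDBBMClr -L0 -a0
megacli -LDInfo -L0 -a0
umount /data && mount /data'
printf '%s\n' "$steps"
```

The middle step re-checks the LD so you can confirm "Bad Blocks Exist" has flipped back to No before remounting.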

George Shuklin