I'm working on a low-budget configuration change: migrating from a working, reflashed IBM M1010 (LSI9220-8i) environment to a newer server running an LSI9200-8e SAS HBA.
Everything works fine on the old server, but it draws a lot of power, and I want a configuration that costs less to run.
When the disks are disconnected from the old server and connected to the new server, I get a sequence like this in the logs:
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (18317688 kB)
Jan 6 13:15:17 hostname1 kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: MSI-X vectors supported: 1, no of cores: 4, max_msix_vectors: -1
Jan 6 13:15:17 hostname1 kernel: mpt2sas1-msix0: PCI-MSI-X enabled: IRQ 34
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: iomem(0x00000000fbff0000), mapped(0xffffc90003620000), size(16384)
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: ioport(0x0000000000006000), size(256)
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: Allocated physical memory: size(4422 kB)
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: Current Controller Queue Depth(1948),Max Controller Queue Depth(2040)
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: Scatter Gather Elements per IO(128)
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: Protocol=(
Jan 6 13:15:17 hostname1 kernel: Initiator
Jan 6 13:15:17 hostname1 kernel: ,Target
Jan 6 13:15:17 hostname1 kernel: ),
Jan 6 13:15:17 hostname1 kernel: Capabilities=(
Jan 6 13:15:17 hostname1 kernel: TLR
Jan 6 13:15:17 hostname1 kernel: ,EEDP
Jan 6 13:15:17 hostname1 kernel: ,Snapshot Buffer
Jan 6 13:15:17 hostname1 kernel: ,Diag Trace Buffer
Jan 6 13:15:17 hostname1 kernel: ,Task Set Full
Jan 6 13:15:17 hostname1 kernel: ,NCQ
Jan 6 13:15:17 hostname1 kernel: )
Jan 6 13:15:17 hostname1 kernel: scsi host4: Fusion MPT SAS Host
Jan 6 13:15:17 hostname1 kernel: mpt2sas_cm1: sending port enable !!
... trimmed out probably unrelated messages ...
Jan 6 13:15:19 hostname1 kernel: mpt2sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b005722a20), phys(8)
... trimmed out probably unrelated messages ...
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: enclosure_logical_id(0x500605b005722a20), slot(0)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: task abort: FAILED scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: attempting device reset! scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: enclosure_logical_id(0x500605b005722a20), slot(0)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: device reset: FAILED scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: attempting target reset! scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: enclosure_logical_id(0x500605b005722a20), slot(0)
Jan 6 13:15:40 hostname1 kernel: scsi target4:0:0: target reset: FAILED scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: mpt2sas_cm1: attempting host reset! scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan 6 13:15:40 hostname1 kernel: mpt2sas_cm1: Blocking the host reset
Jan 6 13:15:40 hostname1 kernel: mpt2sas_cm1: host reset: FAILED scmd(ffff880488f78380)
Jan 6 13:15:40 hostname1 kernel: scsi 4:0:0:0: Device offlined - not ready after error recovery
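The error-recovery sequence in that excerpt (task abort, device reset, target reset, host reset, each one failing) is easier to follow when reduced to just the outcome lines. The following is a minimal Python sketch for doing that. It assumes the log is available as text with one syslog entry per line, and the function name `summarize_recovery` is made up for illustration.

```python
import re

# Matches outcome lines such as:
#   "... kernel: scsi 4:0:0:0: task abort: FAILED scmd(...)"
#   "... kernel: mpt2sas_cm1: host reset: FAILED scmd(...)"
# Assumption: one log entry per line, in the format shown in the excerpt.
RESULT_RE = re.compile(
    r"kernel: (?P<source>\S+(?: \S+)?): "
    r"(?P<step>task abort|device reset|target reset|host reset): "
    r"(?P<outcome>SUCCESS|FAILED)"
)

def summarize_recovery(log_text):
    """Return (source, step, outcome) tuples in the order they appear."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results.append(
                (match.group("source"), match.group("step"), match.group("outcome"))
            )
    return results
```

Running it over the full log for every affected device should show whether all the missing drives fail at the same step or whether the pattern differs per target.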
I've already flashed the latest LSI firmware, since outdated firmware seemed like the most likely source of problems. The driver output seems to confirm that the new firmware is in place:
LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
The firmware was obtained here: https://docs.broadcom.com/docs-and-downloads/host-bus-adapters/host-bus-adapters-common-files/sas_sata_6g_p20/9200-8e_Package_P20_IT_FW_BIOS_for_MSDOS_Windows.zip
The firmware was flashed using a FreeDOS bootable live "CD" from http://pingtool.org/bootable-dos-iso-bios-upgrade/
No BIOS is loaded, since that is how the 8e cards were shipped. I'm not trying to boot from them, so there shouldn't be any need for a BIOS.
I can see only one drive per SAS channel, even though three to four drives are present. The one drive that does show up seems to operate normally.
I've tried some cable-swapping to see if the problem follows a particular cable; it does not seem to.
I plan to try CentOS 6 in case there's a driver issue or a boot-time race condition causing the problem. The old working server runs CentOS 6.
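To compare the old and new servers, it may help to count how many SCSI devices the kernel has registered on each host adapter. Below is a rough Python sketch that does this by listing `/sys/class/scsi_device`, where Linux names entries `host:channel:target:lun`. The function name `count_devices_per_host` is hypothetical, and the count includes every registered SCSI device on a host, not only disks.

```python
import os
from collections import Counter

def count_devices_per_host(sysfs_dir="/sys/class/scsi_device"):
    """Count sysfs SCSI device entries per host number.

    Entries are expected to be named "H:C:T:L"; anything else is skipped.
    """
    counts = Counter()
    for name in os.listdir(sysfs_dir):
        parts = name.split(":")
        if len(parts) == 4 and all(part.isdigit() for part in parts):
            counts[int(parts[0])] += 1
    return dict(counts)

if __name__ == "__main__":
    for host, total in sorted(count_devices_per_host().items()):
        print(f"host{host}: {total} device(s)")
```

In the log above the HBA is `scsi host4`, so the entry for host 4 is the one to compare against the number of drives physically attached.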
I also plan to try a different disk enclosure, just in case there's a timing issue or some other odd physical layer issue even though most of the physical layer is the same exact storage hardware between hosts.
What else should I look at?