I'm working on a low-budget configuration change that involves a migration from a working reflashed IBM M1010 (LSI9220-8i) environment to a newer server running an LSI9200-8e SAS HBA.

Everything works fine on the old server, but it draws a lot of power, and I want a configuration that costs less to run.

When the disks are disconnected from the old server and connected to the new server, I get a sequence like this in the logs:

Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (18317688 kB)
Jan  6 13:15:17 hostname1 kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: MSI-X vectors supported: 1, no of cores: 4, max_msix_vectors: -1
Jan  6 13:15:17 hostname1 kernel: mpt2sas1-msix0: PCI-MSI-X enabled: IRQ 34
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: iomem(0x00000000fbff0000), mapped(0xffffc90003620000), size(16384)
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: ioport(0x0000000000006000), size(256)
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: Allocated physical memory: size(4422 kB)
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: Current Controller Queue Depth(1948),Max Controller Queue Depth(2040)
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: Scatter Gather Elements per IO(128)
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: Protocol=(
Jan  6 13:15:17 hostname1 kernel: Initiator
Jan  6 13:15:17 hostname1 kernel: ,Target
Jan  6 13:15:17 hostname1 kernel: ),
Jan  6 13:15:17 hostname1 kernel: Capabilities=(
Jan  6 13:15:17 hostname1 kernel: TLR
Jan  6 13:15:17 hostname1 kernel: ,EEDP
Jan  6 13:15:17 hostname1 kernel: ,Snapshot Buffer
Jan  6 13:15:17 hostname1 kernel: ,Diag Trace Buffer
Jan  6 13:15:17 hostname1 kernel: ,Task Set Full
Jan  6 13:15:17 hostname1 kernel: ,NCQ
Jan  6 13:15:17 hostname1 kernel: )
Jan  6 13:15:17 hostname1 kernel: scsi host4: Fusion MPT SAS Host
Jan  6 13:15:17 hostname1 kernel: mpt2sas_cm1: sending port enable !!

... trimmed out probably unrelated messages ...

Jan  6 13:15:19 hostname1 kernel: mpt2sas_cm1: host_add: handle(0x0001), sas_addr(0x500605b005722a20), phys(8)

... trimmed out probably unrelated messages ...

Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: enclosure_logical_id(0x500605b005722a20), slot(0)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: task abort: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: attempting device reset! scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: enclosure_logical_id(0x500605b005722a20), slot(0)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: device reset: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: attempting target reset! scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: handle(0x0009), sas_address(0x4433221100000000), phy(0)
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: enclosure_logical_id(0x500605b005722a20), slot(0)
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: target reset: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: mpt2sas_cm1: attempting host reset! scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: CDB: Inquiry 12 00 00 00 24 00
Jan  6 13:15:40 hostname1 kernel: mpt2sas_cm1: Blocking the host reset
Jan  6 13:15:40 hostname1 kernel: mpt2sas_cm1: host reset: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: Device offlined - not ready after error recovery
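For anyone scanning a longer log for the same pattern, here is a minimal Python sketch that lists which error-recovery stages failed and which devices were taken offline. The function name `summarize_recovery` and the regular expressions are my own illustration, written against the message formats shown above; other driver versions may word these lines differently.

```python
import re

# Recovery stages as they appear in the log excerpt above. Only the FAILED
# wording is matched, because that is the only outcome shown in this thread.
FAILED_RE = re.compile(r"(task abort|device reset|target reset|host reset): FAILED")
# Devices the SCSI layer gave up on, identified as host:channel:target:lun.
OFFLINE_RE = re.compile(r"scsi (\d+:\d+:\d+:\d+): Device offlined")

SAMPLE = """\
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: task abort: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: device reset: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi target4:0:0: target reset: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: mpt2sas_cm1: host reset: FAILED scmd(ffff880488f78380)
Jan  6 13:15:40 hostname1 kernel: scsi 4:0:0:0: Device offlined - not ready after error recovery
"""


def summarize_recovery(lines):
    """Return (failed_stages, offlined_devices) in the order they were logged."""
    failed_stages = []
    offlined_devices = []
    for line in lines:
        failed = FAILED_RE.search(line)
        if failed:
            failed_stages.append(failed.group(1))
        offline = OFFLINE_RE.search(line)
        if offline:
            offlined_devices.append(offline.group(1))
    return failed_stages, offlined_devices


if __name__ == "__main__":
    stages, devices = summarize_recovery(SAMPLE.splitlines())
    print("failed stages:", stages)
    print("offlined devices:", devices)
```

To run it against a real log, pass the lines of the file (for example `open("/var/log/messages")`, assuming that is where kernel messages land on the host) instead of `SAMPLE`.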

I've already flashed the latest LSI firmware, since outdated firmware seemed like the most likely source of problems. The driver output seems to confirm the new version:

LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)

The firmware was obtained here: https://docs.broadcom.com/docs-and-downloads/host-bus-adapters/host-bus-adapters-common-files/sas_sata_6g_p20/9200-8e_Package_P20_IT_FW_BIOS_for_MSDOS_Windows.zip

The firmware was flashed using a FreeDOS bootable live "CD" from http://pingtool.org/bootable-dos-iso-bios-upgrade/

No BIOS is loaded on the card, which matches the BiosVersion(00.00.00.00) reported by the driver. That is how the 8e cards were shipped, and since I'm not trying to boot from them there shouldn't be any need for a BIOS.

I can see only one drive per SAS channel, even though three to four drives are present. The one visible drive seems to operate normally.

I've tried some cable-swapping to see if the problem follows a particular cable; it does not seem to.

I plan to try CentOS 6 on the new server in case a driver issue or boot-time race condition is causing the problem. The old working server runs CentOS 6.

I also plan to try a different disk enclosure, just in case there's a timing issue or some other odd physical layer issue even though most of the physical layer is the same exact storage hardware between hosts.

What else should I look at?

Steve Bonds

1 Answer

I ended up re-flashing the SAS card using the same firmware with the following changes:

  1. I erased the old firmware first
  2. I flashed in a BIOS even though it "shouldn't" be necessary since I'm not booting from the SAS card

Details:

  1. Remove all internal and external drives to prevent any chance of accidental overwrites
  2. Boot from the FreeDOS ISO mentioned above, customized to include the sas2flsh binary and firmware/BIOS files
  3. DO NOT REBOOT UNTIL THE BELOW TWO STEPS ARE COMPLETED
  4. sas2flsh -o -e 6
  5. sas2flsh -o -f 9200_8E.BIN -b MPTSAS2.ROM
  6. sas2flsh -list
  7. "eject" ISO and reboot

Here's the output of "sas2flsh -c 1 -list" on the working card:

E:\FREEDOS>sas2flsh -c 1 -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

        Adapter Selected is a LSI SAS: SAS2008(B2)

        Controller Number              : 1
        Controller                     : SAS2008(B2)
        PCI Address                    : 00:07:00:00
        SAS Address                    : 500605b-0-0572-2a20
        NVDATA Version (Default)       : 14.01.00.07
        NVDATA Version (Persistent)    : 14.01.00.07
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9200-8e
        BIOS Version                   : 07.39.02.00
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9200-8e
        Board Assembly                 : H3-25321-00C
        Board Tracer Number            : SP24651750

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.
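If you capture that output to a file, a short Python sketch like the one below can turn it into a dictionary and check the fields that mattered here. The names `parse_sas2flsh_list` and `has_boot_bios` are hypothetical helpers of mine, not part of any LSI tool, and the placeholder values treated as "no BIOS" (`N/A` and `00.00.00.00`) are an assumption based on the driver message in the question rather than documented behavior.

```python
SAMPLE_OUTPUT = """\
        Adapter Selected is a LSI SAS: SAS2008(B2)

        Controller Number              : 1
        Controller                     : SAS2008(B2)
        PCI Address                    : 00:07:00:00
        SAS Address                    : 500605b-0-0572-2a20
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        BIOS Version                   : 07.39.02.00
        Board Name                     : SAS9200-8e
"""


def parse_sas2flsh_list(text):
    """Parse 'Key : Value' lines from saved sas2flsh -list output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only, so values such as the PCI address
        # (which contain colons without surrounding spaces) stay intact.
        key, separator, value = line.partition(" : ")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def has_boot_bios(fields):
    """Return True if a BIOS version other than an assumed placeholder is listed."""
    # Assumption: a card without a BIOS reports one of these placeholder values.
    return fields.get("BIOS Version") not in (None, "N/A", "00.00.00.00")


if __name__ == "__main__":
    info = parse_sas2flsh_list(SAMPLE_OUTPUT)
    print("firmware:", info.get("Firmware Version"))
    print("BIOS present:", has_boot_bios(info))
```

The sample text is a trimmed copy of the listing above; the same function should handle the full output, since lines without a spaced colon separator are simply skipped.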

Once that was done all the disks magically appeared correctly under CentOS 7.

Steve Bonds