
I have two identical servers with the same hardware and a cloned Linux OS. Both have a Supermicro AOC-S3008L HBA. Yet one server identifies SATA disks as SCSI, while the other correctly identifies them as SATA. My problem is that I need to use libatasmart or udisks (instead of smartmontools) to monitor disk health, but because the disks are identified as SCSI on one of the servers, I’m unable to do so. How can I make SystemB identify the disks as connected via the SATA bus? Is there a specific BIOS option causing this?

SystemA (SATA as SATA):

SystemA:~$ sudo udevadm info --query=property --name /dev/sda
DEVLINKS=/dev/disk/by-id/ata-WDC_WD60EZRX-00MVLB1_WD-WX21D947N3HR /dev/disk/by-id/wwn-0x50014ee2b5d6e7b0 /dev/disk/by-path/pci-0000:01:00.0-sas-0x500304801eac0aa1-lun-0
DEVNAME=/dev/sda
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:0/0:0:0:0/block/sda
DEVTYPE=disk
ID_ATA=1
ID_ATA_DOWNLOAD_MICROCODE=1
ID_ATA_FEATURE_SET_HPA=1
ID_ATA_FEATURE_SET_HPA_ENABLED=1
ID_ATA_FEATURE_SET_PM=1
ID_ATA_FEATURE_SET_PM_ENABLED=1
ID_ATA_FEATURE_SET_PUIS=1
ID_ATA_FEATURE_SET_PUIS_ENABLED=0
ID_ATA_FEATURE_SET_SECURITY=1
ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=66306
ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=66306
ID_ATA_FEATURE_SET_SMART=1
ID_ATA_FEATURE_SET_SMART_ENABLED=1
ID_ATA_ROTATION_RATE_RPM=5700
ID_ATA_SATA=1
ID_ATA_SATA_SIGNAL_RATE_GEN1=1
ID_ATA_SATA_SIGNAL_RATE_GEN2=1
ID_ATA_WRITE_CACHE=1
ID_ATA_WRITE_CACHE_ENABLED=1
ID_BUS=ata
ID_MODEL=WDC_WD60EZRX-00MVLB1
ID_MODEL_ENC=WDC\x20WD60EZRX-00MVLB1\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
ID_PART_TABLE_TYPE=gpt
ID_PATH=pci-0000:01:00.0-sas-0x500304801eac0aa1-lun-0
ID_PATH_TAG=pci-0000_01_00_0-sas-0x500304801eac0aa1-lun-0
ID_REVISION=80.00A80
ID_SERIAL=WDC_WD60EZRX-00MVLB1_WD-WX21D947N3HR
ID_SERIAL_SHORT=WD-WX21D947N3HR
ID_TYPE=disk
ID_WWN=0x50014ee2b5d6e7b0
ID_WWN_WITH_EXTENSION=0x50014ee2b5d6e7b0
MAJOR=8
MINOR=0
SUBSYSTEM=block
UDISKS_ATA_SMART_IS_AVAILABLE=1
UDISKS_PARTITION_TABLE=1
UDISKS_PARTITION_TABLE_COUNT=1
UDISKS_PARTITION_TABLE_SCHEME=gpt
UDISKS_PRESENTATION_NOPOLICY=0
USEC_INITIALIZED=72490

SystemB (SATA as SCSI):

SystemB:~$ sudo udevadm info --query=property --name /dev/sda
DEVLINKS=/dev/disk/by-id/scsi-350014ee261a4fe1f /dev/disk/by-id/wwn-0x50014ee261a4fe1f /dev/disk/by-path/pci-0000:03:00.0-sas-0x500304801eabe304-lun-0
DEVNAME=/dev/sda
DEVPATH=/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0/port-0:0/expander-0:0/port-0:0:1/end_device-0:0:1/target0:0:0/0:0:0:0/block/sda
DEVTYPE=disk
ID_BUS=scsi
ID_MODEL=WDC_WD60EZRX-00M
ID_MODEL_ENC=WDC\x20WD60EZRX-00M
ID_PART_TABLE_TYPE=gpt
ID_PATH=pci-0000:03:00.0-sas-0x500304801eabe304-lun-0
ID_PATH_TAG=pci-0000_03_00_0-sas-0x500304801eabe304-lun-0
ID_REVISION=0A80
ID_SCSI=1
ID_SCSI_SERIAL=WD-WX31D55DF9X0
ID_SERIAL=350014ee261a4fe1f
ID_SERIAL_SHORT=50014ee261a4fe1f
ID_TYPE=disk
ID_VENDOR=ATA
ID_VENDOR_ENC=ATA\x20\x20\x20\x20\x20
ID_WWN=0x50014ee261a4fe1f
ID_WWN_WITH_EXTENSION=0x50014ee261a4fe1f
MAJOR=8
MINOR=0
SUBSYSTEM=block
UDISKS_PARTITION_TABLE=1
UDISKS_PARTITION_TABLE_COUNT=1
UDISKS_PARTITION_TABLE_SCHEME=gpt
UDISKS_PRESENTATION_NOPOLICY=0
USEC_INITIALIZED=80270

$ sudo skdump /dev/sda
Device: sat16:/dev/sda
Type: 16 Byte SCSI ATA SAT Passthru
Size: 5723166 MiB
Awake: Operation not supported
ATA SMART not supported.

UPDATE: The servers have 32 bays connected to a PCI-E SAS3 HBA. I had previously moved some of the disks from SystemB (the one that exposes SATA as SCSI) to SystemA (the one that exposes them as SATA), and there they are identified as SATA. So I guess we can rule out the disks as the cause. Note that my aim is for both servers to identify the disks as ATA, so that skdump (or udisks) can inspect them.
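To compare how udev classifies each disk on the two servers, a small sketch like the following may help. It only reads the `ID_BUS` property that appears in the `udevadm` output above; the function name `disk_bus` is made up for illustration.

```shell
# Read `udevadm info --query=property` output on stdin and print the
# value of ID_BUS (for example "ata" or "scsi"), or "unknown" if absent.
disk_bus() {
    bus=$(sed -n 's/^ID_BUS=//p' | head -n 1)
    echo "${bus:-unknown}"
}

# Possible usage on a live system (prints one line per disk):
#   for d in /dev/sd?; do
#       printf '%s ' "$d"
#       udevadm info --query=property --name "$d" | disk_bus
#   done
```

Running the loop on SystemA and SystemB side by side should show whether every disk behind the HBA is affected or only some of them.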

# lsscsi -g <- OUTPUT

# lshw <- SystemA SystemB

$ udevadm info -a -p $(udevadm info -q path -n /dev/sdb) SystemA SystemB

$ dmesg SystemA SystemB

SOLVED: The problem turned out to be that the SAS controller's firmware was too old. Updating the firmware did the trick:

$ sudo sas3flash -o -f 3008IT14.ROM -b mptsas3.rom

2 Answers


UPDATE

Wow that drive is old, and huge. http://products.wdc.com/library/SpecSheet/ENG/2879-800026.pdf

The max transfer rate is 175 MB/s, so if the link negotiated at gen1 you would be capped at under 150 MB/s.

And it looks like the warranty just expired, if you bought them in 2014.

ENDUPDATE

First off, all disks are presented as SCSI on Linux (unless it's some obstinate RAID controller that presents a raw block device). This has been true since libsas was introduced and libata was integrated (2008?). What varies is the level of support your drive receives.

Yes, your drives are driven by said controller, and those phys are fed into what appears to be a SAS expander, according to sysfs. I don't know how many bays you have, but either way it looks like there was a negotiation problem during discovery, where the expander decided that the drive off this phy can't do passthrough.

http://www.sasexpanders.com/faq/

I would suggest getting the SMP utils and sending a link reset to the drive to see if that improves matters. You may be able to do this via sysfs: see if it exposes a reset or similar file, and echo a 1 to it.

http://sg.danny.cz/sg/smp_utils.html
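For the sysfs route, here is a minimal sketch. It assumes the HBA driver registers its phys with the kernel's SAS transport class, so that they show up under `/sys/class/sas_phy` with `negotiated_linkrate` and `link_reset` attributes; check that those files exist on your system first. The function name and the phy name in the comment are placeholders.

```shell
# List each SAS phy with its negotiated link rate.
# The sysfs root is a parameter so the function can be tried on a test tree.
list_sas_phys() {
    root=${1:-/sys}
    for phy in "$root"/class/sas_phy/phy-*; do
        # Skip entries without a readable attribute (or an unmatched glob).
        [ -r "$phy/negotiated_linkrate" ] || continue
        printf '%s %s\n' "${phy##*/}" "$(cat "$phy/negotiated_linkrate")"
    done
}

# To request a link reset on one phy (as root; phy name is a placeholder):
#   echo 1 > /sys/class/sas_phy/phy-0:0:1/link_reset
```

Calling `list_sas_phys` before and after the reset shows whether the phy came back at a different rate.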

Drive firmware versions appear to be the same...

# GOOD ID_MODEL=WDC_WD60EZRX-00MVLB1
# GOOD ID_REVISION=80.00A80

# BAD ID_MODEL=WDC_WD60EZRX-00M
# BAD ID_REVISION=0A80

The problem with only getting 4 chars is that the missing part may not be "80.." on the bad drive. If you put them both in the same working system it would be much easier to compare. If it comes up fine there, then you know it's not the disk. Same thing with the model number...

A standard SCSI INQUIRY response only has room for a 16-character product ID and a four-character revision, which is a clue. Even though both are presented as sd devices, one is getting more support from libata than the other. The drive that works reports the full model string and firmware version, which means the ATA identify data is getting through via SAT, the SCSI-to-ATA translation spec. The other one only shows the truncated INQUIRY fields, which is what a plain SCSI drive looks like.
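The truncated values are consistent with taking the first 16 characters of the model string and the last four characters of the firmware revision. The sketch below reproduces that truncation using the GOOD values from the question. The rule is inferred from the two outputs, so treat it as an assumption rather than a statement of what the controller does. `inquiry_view` is a made-up name.

```shell
# Truncate ATA identify strings the way the BAD output appears truncated:
# product = first 16 characters of the model,
# revision = last 4 characters of the firmware revision.
inquiry_view() {
    model=$1
    fw=$2
    product=$(printf '%s' "$model" | cut -c1-16)
    revision=$(printf '%s' "$fw" | tail -c 4)
    printf 'product=%s revision=%s\n' "$product" "$revision"
}

inquiry_view "WDC WD60EZRX-00MVLB1" "80.00A80"
# prints: product=WDC WD60EZRX-00M revision=0A80
```

If that holds, `0A80` on the bad system and `80.00A80` on the good one could well be the same firmware, but the first four characters cannot be confirmed from the truncated view alone.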

  • Same operating system and kernel version?
  • Same controller firmware?
  • Same physical expander?
  • Same expander firmware?
  • Does the problem follow the drive if it's plugged into a different port?
  • Does any drive plugged into that port fail to get ATA passthrough?
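For the controller firmware question, the driver logs the firmware version when it loads. A rough way to pull it out on each server is sketched below. The `FWVersion(...)` pattern is the one quoted in the comments under this answer; other drivers may format it differently, and `hba_fw_versions` is a made-up name.

```shell
# Print the distinct FWVersion(...) strings found in kernel log text on stdin.
hba_fw_versions() {
    grep -o 'FWVersion([^)]*)' | sort -u
}

# Possible usage on each server:
#   uname -r                        # kernel version
#   dmesg | hba_fw_versions         # controller firmware
```

Comparing the output from both machines covers the first two items on the list. The expander firmware has to be checked separately.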

It doesn't say what link speed the bad one negotiated at. Here's a cheat.

ppetraki@:scaleout_demo$ dmesg | grep -i link | grep SATA
[    1.759912] ata6: SATA link down (SStatus 0 SControl 300)
[    1.763905] ata5: SATA link down (SStatus 0 SControl 300)
[    1.927906] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    1.935870] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

and this.

ppetraki@:scaleout_demo$ sudo hdparm -t /dev/sda

/dev/sda:
 Timing buffered disk reads: 762 MB in  3.00 seconds = 253.77 MB/sec

A 3Gb SAS/SATA link transfers about 300 MB/s, and so on. This is reflected in hdparm -t, which reads as fast as it can from the media. A result of 253 MB/s means that this drive is either performing as expected or underperforming. Either way, if you drop a 6G SSD onto that link, you're not going to get much more throughput than 253 MB/s.
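The 300 MB/s figure follows from the 8b/10b line encoding used on 1.5, 3 and 6 Gbit/s SATA and SAS links, where every 10 bits on the wire carry 8 bits of data. A quick sketch of the arithmetic (the function name is made up, and protocol overhead beyond the encoding is ignored):

```shell
# Payload ceiling of an 8b/10b-encoded link.
# Input: line rate in Mbit/s (1500, 3000, 6000). Output: MB/s.
link_payload_mbs() {
    # 8 data bits per 10 line bits, then 8 bits per byte.
    echo $(( $1 * 8 / 10 / 8 ))
}

link_payload_mbs 1500   # prints 150
link_payload_mbs 3000   # prints 300
link_payload_mbs 6000   # prints 600
```

Real throughput lands somewhat below these ceilings, which is why 253 MB/s on a 3 Gbit/s link is already close to the limit.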

I say this because the good drive says it negotiated at gen 2; the second one doesn't say. If you run the hdparm test on both drives and the numbers are wildly different, then there could be something wrong with the bad drive that forced it to negotiate at a lower link speed.
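When libata prints a "SATA link up" line, the negotiated generation can also be read from the SStatus value itself: it is a hex number whose second-lowest digit is the speed field. A small decoder, as a sketch (it assumes the value is passed exactly as shown in dmesg, without a `0x` prefix; `sstatus_speed` is a made-up name):

```shell
# Decode the speed field (bits 4-7) of a SATA SStatus value given in hex.
sstatus_speed() {
    spd=$(( (0x$1 >> 4) & 15 ))
    case $spd in
        0) echo "no negotiated speed" ;;
        1) echo "1.5 Gbps" ;;
        2) echo "3.0 Gbps" ;;
        3) echo "6.0 Gbps" ;;
        *) echo "unknown" ;;
    esac
}

sstatus_speed 123   # prints 3.0 Gbps
sstatus_speed 113   # prints 1.5 Gbps
```

Both results match the speeds libata printed next to those SStatus values in the dmesg excerpt above.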

SAS and SATA use the same electrical/physical wire format. The only difference is that a SAS drive chirps COMINIT and then COMSAS at power-on or reset, whereas a SATA drive only chirps COMINIT (both are OOB signals). Depending on that negotiation, the appropriate discovery state machine is activated, and so on.

I don't recall which state machine decides whether ATA passthrough is viable. Draft specs are out there... and they are thick.

Hope this helps.

ppetraki
  • Thanks for the very thorough answer. I had put some of the disks into the other server with the outcome of them now being exposed as sata. The link speed is in fact higher and `hdparm` reports faster reads for SystemA (SStatus 133 SControl 330) compared to SystemB (SStatus 0 SControl 300). – Nima Mohammadi Aug 14 '17 at 22:16
  • I was inspecting `dmesg` and updating my question when I found out from the logs that the firmware of the SAS HBA card seems to be newer: LSISAS3008 SATA-as-SATA FWVersion(06.00.00.00), SATA-as-SCSI FWVersion(01.00.03.00). I suspect that this may be the problem. I'll update the firmware and report back. – Nima Mohammadi Aug 14 '17 at 22:20
  • It sounds like you're on the right track. If it were me at this point, I would focus on the firmware revs of the controller and the expander/SES-2 disk enclosure. You may get lucky with lsscsi; sometimes the enclosure shows up, with the firmware rev. I don't recall another way to get it, it's been a while. – ppetraki Aug 14 '17 at 23:53
  • Updating the firmware solved the problem. Thank you :) – Nima Mohammadi Aug 15 '17 at 00:03
  • You did it. Good job! \o/ – ppetraki Aug 15 '17 at 00:06

The second SATA disk is probably connected to a SAS port, which enables SATA-on-SAS tunneling.

Be sure to connect the SATA disk to a true SATA port.

shodanshok
  • The enclosures holding disks are connected to a [PCI-E HBA SAS3](http://www.supermicro.com/products/accessories/addon/AOC-S3008L-L8e.cfm) card. That goes for both servers. – Nima Mohammadi Aug 13 '17 at 14:07
  • @NimaMohammadi Then check, while booting, which RAID mode the HBA card is set to; it can change how the OS sees the disk. – yagmoth555 Aug 13 '17 at 16:57
  • @yagmoth555 One server is remote and I need to set up IPMI to check this. I'll check and report back to you. Thank you. – Nima Mohammadi Aug 14 '17 at 21:08