I'm tearing my hair out over this issue.

I wanted to add a hot-swap bay to my home server so I can easily add and remove HDDs, e.g. to rotate off-site backups. The mainboard in question is an ASRock J4105-ITX with four native SATA ports, split between an ASM1062 and the SATA controller integrated into the Intel processor. Both controllers work fine and use the ahci kernel module. There is a hot-swap option in the BIOS, but it seems to have no effect.

If a drive is disconnected (either via echo 1 > /sys/block/sdX/device/delete or by rudely pulling the drive), no new device is recognized after reconnecting. I've tried forcing a rescan (echo "- - -" > /sys/class/scsi_host/host<n>/scan), but to no avail: the SATA port is practically unusable until the next reboot. I also tried some more drastic commands, without any luck:

echo 1 > /sys/class/scsi_device/2:0:0:0/device/reset
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/rescan
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/reset

(taken from "How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?")
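
For reference, here is a minimal sketch of the remove-then-rescan cycle described above, written as two POSIX sh functions. The function names and the optional sysfs-root argument are my own (hypothetical) additions; the root argument defaults to /sys and only exists so the functions can be tried against a scratch directory. It assumes the disk's filesystems are already unmounted and that you run it as root.

```shell
#!/bin/sh
# Hypothetical helpers for the delete/rescan cycle from the question.
# Second/first argument: sysfs root, default /sys (override only for testing).

# Detach a disk cleanly, e.g.: detach_disk sdb
# Assumes all filesystems on the disk are already unmounted.
detach_disk() {
    disk="$1"
    sysfs="${2:-/sys}"
    delete="$sysfs/block/$disk/device/delete"
    [ -e "$delete" ] || { echo "detach_disk: no such disk: $disk" >&2; return 1; }
    sync                  # flush dirty data before the kernel drops the device
    echo 1 > "$delete"    # same as: echo 1 > /sys/block/sdX/device/delete
}

# Ask every SCSI host to rescan all channels, targets and LUNs
rescan_all_hosts() {
    sysfs="${1:-/sys}"
    for scan in "$sysfs"/class/scsi_host/host*/scan; do
        [ -e "$scan" ] || continue    # glob matched nothing: no hosts
        echo "- - -" > "$scan"        # "- - -" = wildcard channel/target/LUN
    done
}
```

Usage would be something like detach_disk sdb, swap the drive, then rescan_all_hosts. As described above, in my case this alone did not bring the port back.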

"Alright, probably the chipset does not support hot swap or the BIOS is messed up." So I ordered two PCIe SATA controllers (one with an ASM1064, the other with a Marvell 88SE9215). Both exhibit the same issue, even though other buyers report that hot swap works for them, so I guess the problem lies in software (my installation? I'm running Arch Linux, which I keep dutifully up to date).

Some hopefully useful information:

$ uname -a
Linux servername 5.14.14-arch1-1 #1 SMP PREEMPT Wed, 20 Oct 2021 21:35:18 +0000 x86_64 GNU/Linux

$ dmesg | grep ahci
[    0.447450] ahci 0000:00:12.0: version 3.0
[    0.447842] ahci 0000:00:12.0: SSS flag set, parallel bus scan disabled
[    0.457970] ahci 0000:00:12.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    0.457981] ahci 0000:00:12.0: flags: 64bit ncq sntf stag pm clo only pmp pio slum part sxs deso sadm sds apst 
[    0.458750] scsi host0: ahci
[    0.459204] scsi host1: ahci
[    0.469788] ahci 0000:01:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[    0.469801] ahci 0000:01:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs 
[    0.470767] scsi host2: ahci
[    0.471203] scsi host3: ahci
[    0.471562] scsi host4: ahci
[    0.471904] scsi host5: ahci
[    0.472341] ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
[    0.472376] ahci 0000:04:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[    0.472382] ahci 0000:04:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc 
[    0.472803] scsi host6: ahci
[    0.473011] scsi host7: ahci

$ lspci -v
[...]
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11) (prog-if 01 [AHCI 1.0])
    Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
    Flags: bus master, fast devsel, latency 0, IRQ 127
    I/O ports at e050 [size=8]
    I/O ports at e040 [size=4]
    I/O ports at e030 [size=8]
    I/O ports at e020 [size=4]
    I/O ports at e000 [size=32]
    Memory at a1340000 (32-bit, non-prefetchable) [size=2K]
    Expansion ROM at a1300000 [disabled] [size=256K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [70] Express Legacy Endpoint, MSI 00
    Capabilities: [e0] SATA HBA v0.0
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: ahci
[...]
Gnarflord

1 Answer

I finally found the reason: my powertop tuning was too aggressive!

Because this server runs 24/7 and electricity is kinda expensive around here, I had added a systemd service that automatically applies all powertop tunings:

$ cat /etc/systemd/system/powertop.service
[Unit]
Description=Powertop tunings

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/powertop --auto-tune

[Install]
WantedBy=multi-user.target

This is the same as opening the powertop TUI and setting every option to 'Good'. The crucial bits are the four "Runtime PM for port ataX" lines:

   Good          Runtime PM for port ata3 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
   Bad           Runtime PM for port ata4 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
   Good          Runtime PM for port ata5 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
>> Good          Runtime PM for port ata6 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
   Good          Runtime PM for PCI Device Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller

Each of these tunings runs a command like

echo 'auto' > '/sys/bus/pci/devices/0000:01:00.0/ata4/power/control'

which apparently causes the SATA card to never recognize new devices on that port!
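
To see which ports powertop has switched to "auto", a small sketch like the following lists the power/control value of every ATA port below a PCI device. It assumes the ports show up as ataN directories directly under the PCI device, as in the path above; the function name and the optional sysfs-root argument are hypothetical, the latter only there so it can be tried on a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print "<pci address> <port> <power/control value>"
# for every ATA port found under /sys/bus/pci/devices/*/ataN.
# "auto" = runtime PM enabled (what powertop --auto-tune sets),
# "on"   = runtime PM disabled, port kept powered.
show_ata_runtime_pm() {
    sysfs="${1:-/sys}"
    for ctl in "$sysfs"/bus/pci/devices/*/ata*/power/control; do
        [ -e "$ctl" ] || continue                  # glob matched nothing
        port_dir="${ctl%/power/control}"           # .../0000:01:00.0/ata4
        port="${port_dir##*/}"                     # ata4
        pci="${port_dir%/*}"; pci="${pci##*/}"     # 0000:01:00.0
        printf '%s %s %s\n' "$pci" "$port" "$(cat "$ctl")"
    done
}
```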

Only after setting power/control to on (the 'Bad' option according to powertop) does the card find new devices after running:

echo 0 0 0 | sudo tee /sys/class/scsi_host/host*/scan
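
A minimal sketch that reverses this tuning by writing "on" to the ports' power/control files, either for every controller or only for the one given by PCI address (e.g. 0000:01:00.0 for the Marvell card here). The function name and argument layout are my own assumptions, and keeping the ports powered gives up some of the power savings powertop was after.

```shell
#!/bin/sh
# Hypothetical helper: disable runtime PM ("on") for ATA ports.
# $1: PCI address such as 0000:01:00.0; empty or omitted = all devices.
# $2: sysfs root, default /sys (override only for testing). Needs root.
disable_ata_runtime_pm() {
    pci="${1:-*}"
    sysfs="${2:-/sys}"
    # $pci is left unquoted on purpose so the default "*" acts as a glob
    for ctl in "$sysfs"/bus/pci/devices/$pci/ata*/power/control; do
        [ -e "$ctl" ] || continue    # glob matched nothing
        echo on > "$ctl"
    done
}
```

To make this stick across reboots, one untested option is to put the function in a script and call it from powertop.service via an additional ExecStartPost= line, which systemd runs after ExecStart (powertop --auto-tune) has finished.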

The only thing I'm still missing is automatic rescanning: my desktop PC finds new devices without anyone writing to hostX/scan. I can kinda live with this for now, though. This has been an extremely frustrating experience, so I hope this helps somebody facing the same issue.

Gnarflord