55

Hot swapping out a failed SATA /dev/sda drive worked fine, but when I went to swap in a new drive, it wasn't recognized:

[root@fs-2 ~]# tail -18 /var/log/messages
May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake }
May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset
May 5 16:54:45 fs-2 kernel: ata1: soft resetting link
May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:54:55 fs-2 kernel: ata1: soft resetting link
May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:05 fs-2 kernel: ata1: soft resetting link
May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps
May 5 16:55:40 fs-2 kernel: ata1: soft resetting link
May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up
May 5 16:55:45 fs-2 kernel: ata1: EH complete

I tried a couple of things to make the server find the new /dev/sda, such as rescan-scsi-bus.sh, but they didn't work:

[root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan
-bash: echo: write error: Invalid argument
[root@fs-2 ~]#
[root@fs-2 ~]# /root/rescan-scsi-bus.sh -l
[snip]
0 new device(s) found.
0 device(s) removed.
[root@fs-2 ~]#
[root@fs-2 ~]# ls /dev/sda
ls: /dev/sda: No such file or directory

I ended up rebooting the server. /dev/sda was recognized, I fixed the software RAID, and everything is fine now. But for next time, how can I make Linux recognize a new SATA drive I have hot swapped in without rebooting?

The operating system in question is RHEL5.3:

[root@fs-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

The hard drive is a Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB, model ST3500320NS.

Here is the lspci output:

[root@fs-2 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)

Update: In perhaps a dozen cases, we've been forced to reboot servers because hot swap hasn't "just worked." Thanks for the answers to look more into the SATA controller. I've included the lspci output for the problematic system above (hostname: fs-2). I could still use some help understanding what exactly isn't supported hardware-wise in terms of hot swap for that system. Please let me know what other output besides lspci might be useful.

The good news is that hot swap "just worked" today on one of our servers (hostname: www-1), which is very rare for us. Here is the lspci output:

[root@www-1 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)
rzr
Philip Durbin
  • I'd check the status of your SATA controller in the version of the Linux kernel you're using. It could be a bug or plain not supported – Nathan May 06 '09 at 14:51
  • Was 0 the BUS number, or 1? –  May 06 '09 at 15:02
  • 4
    It was bus 0. /sys/class/scsi_host contains host0 through host5. dmesg shows ata1 through ata6. ata1 corresponds to host0, ata2 corresponds to host1, etc. – Philip Durbin Jul 15 '09 at 20:16

12 Answers

55

If your SATA controller supports hot swap, it should "just work(tm)."

To force a rescan of a SCSI bus (each SATA port shows up as its own SCSI bus) and find new drives, use:

echo "0 0 0" >/sys/class/scsi_host/host<n>/scan

In the command above, <n> is the bus number.
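As a rough sketch, the rescan above can be wrapped in a small function that checks the scan file exists before writing to it. The function name and the SYSFS_ROOT override are my own inventions for illustration (the override only exists so you can dry-run against a scratch directory); the sysfs path and the "0 0 0" default are the ones from this answer.

```shell
# Sketch: ask one SCSI host to rescan. Must run as root on a real system.
# SYSFS_ROOT is a hypothetical override (default /sys) for dry runs;
# it is not a kernel or distribution convention.
rescan_host() {
    host_num=$1
    spec=${2:-0 0 0}    # channel, target ID, LUN; defaults to the triple used above
    scan_file="${SYSFS_ROOT:-/sys}/class/scsi_host/host${host_num}/scan"
    if [ ! -e "$scan_file" ]; then
        echo "no such scan file: $scan_file" >&2
        return 1
    fi
    printf '%s\n' "$spec" > "$scan_file"
}
```

On the asker's system, where ata1 corresponds to host0, that would be `rescan_host 0`. Because the redirection happens inside the function, the whole shell has to be root; prefixing the call with sudo will not help.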

  • 1
    Sorry, no joy; running that command only triggers the same automatic rescan that gets triggered when I initially plug in the drive. Thanks, though! – hakamadare Jul 15 '09 at 20:04
  • It's quite possible some config needs to be done on the RAID controller for it to see the disk. In my case it was necessary to add the new disk back into the RAID. – MikeKulls Jul 08 '15 at 07:18
  • 1
    I get permission denied even when using `sudo` and when switching to the root user. – Aaron Franke Nov 07 '18 at 22:24
  • On my system that booted from an NVMe SSD, this worked to detect a newly-plugged SATA HD. I use `powertop` to let more things power down, so maybe the SATA port that I plugged the drive into was fully asleep. (The system has a SATA optical drive connected and detected at boot, but it was probably asleep, too.) As others suggest, to avoid resetting the SATA link for active drives, figure out which `host` ids are already in use and don't `scan` those, only the one where you plugged in a new drive. (Or any unused one if you don't know the numbering.) – Peter Cordes Jun 10 '19 at 15:17
  • @AaronFranke: did you use `sudo` to make `echo` root, or did you use it to make the shell root? – TSJNachos117 Apr 24 '20 at 18:56
  • 1
    Here's a slightly better solution, IMHO: `echo "0 0 0" | sudo dd of=/sys/class/scsi_host/host/scan`. I like this option because your shell doesn't need to be root. Also, `dd` could be replaced with `tee` if desired. – TSJNachos117 Apr 24 '20 at 18:59
  • 1
    making `echo` root won't do anything, since it's the shell that opens files when directing output. A similar solution to @TSJNachos117's also-valid suggestion is: `echo 0 0 0 | sudo tee /sys/class/scsi_host/host$n/scan`; it's slightly fewer keystrokes. (Note that you don't actually need the quotes around the zeroes, either; the default IFS interpolation for multiple arguments to echo is a-single-space.) – JamesTheAwesomeDude Jul 01 '20 at 17:24
  • @JamesTheAwesomeDude FWIW, I already knew that "making `echo` root won't do anything", but that's why I asked: I figured that Aaron Franke might be making that exact mistake. – TSJNachos117 Jul 19 '20 at 04:55
23
echo "- - -" >/sys/class/scsi_host/host<n>/scan
       ^ ^
        \_\_______ note spaces between the dashes.
  • 10
    Be careful with this: dmesg showed that it hard-reset all of my SATA links. Possibly worth testing before running it in production and losing tons of writes. – Ivan Kozik Dec 06 '15 at 05:31
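Given the warning above about hard-resetting links, one cautious approach is to scan only hosts that have nothing attached yet. The sketch below assumes the usual sysfs layout, where hostN/device/ contains a targetN:... directory for each attached device; I have not checked this on every kernel version. The function name and the SYSFS_ROOT override are hypothetical.

```shell
# Sketch: print the SCSI hosts that currently have no attached target,
# i.e. the ports where a rescan should not disturb a disk that is in use.
# SYSFS_ROOT is a hypothetical override (default /sys) for dry runs.
list_idle_hosts() {
    for host_dir in "${SYSFS_ROOT:-/sys}"/class/scsi_host/host*; do
        [ -d "$host_dir" ] || continue
        busy=no
        # Any existing target* entry means a device is attached to this host.
        for target in "$host_dir"/device/target*; do
            if [ -e "$target" ]; then busy=yes; fi
        done
        if [ "$busy" = no ]; then basename "$host_dir"; fi
    done
}
```

You would then write "- - -" only to the scan file of a host that this prints, ideally the one you just plugged the new drive into.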
16

In some circumstances, when a drive has failed, Linux won't realise you've physically pulled it from the array. If you have that problem (as I did this morning) you can do the following:

echo 1 > /sys/block/<devnode>/device/delete

e.g., in my case, /dev/sda had failed and I didn't want to reboot the server, so I did:

echo 1 > /sys/block/sda/device/delete

After I did that, the new drive (which had actually been physically added already) was immediately visible.

If it is not visible at this point, you can also do this to force a re-scan:

echo "- - -" > /sys/class/scsi_host/host<n>/scan

The three fields in "- - -" are wildcards for channel, ID, and LUN respectively, so you can restrict the scan to a subset by specifying numbers instead.
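This also seems to explain the "Invalid argument" error in the question: as far as I understand it, the scan file wants exactly three whitespace-separated fields, each a number or a dash, and "---" is a single field. A minimal sketch of that check, with a made-up function name, could look like this:

```shell
# Sketch: return 0 if the argument looks like a valid scan triple
# ("channel id lun", each field a non-negative integer or "-").
valid_scan_spec() {
    # Split on whitespace; anything beyond three fields lands in $extra.
    read -r channel id lun extra <<EOF
$1
EOF
    [ -z "$extra" ] || return 1
    for field in "$channel" "$id" "$lun"; do
        case $field in
            -) ;;                       # wildcard
            ''|*[!0-9]*) return 1 ;;    # missing or non-numeric field
        esac
    done
    return 0
}
```

For example, `valid_scan_spec "- - -"` and `valid_scan_spec "0 0 0"` succeed, while `valid_scan_spec "---"` fails. This only mirrors my reading of the format; the kernel remains the final judge of what it accepts.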

Before you start, you could also:

readlink /sys/block/<devnode>

This shows the path containing the right host number, which you can then check in /proc/scsi/scsi for disappearance after removal.
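If you want the host number on its own rather than reading it out of the path by eye, something like the following sketch should do. The function name is made up, and it assumes the path contains a /hostN/ component, as sysfs device paths for SATA disks normally do.

```shell
# Sketch: extract N from the ".../hostN/..." component of a sysfs device path.
# Prints nothing if the path has no such component.
host_from_sysfs_path() {
    printf '%s\n' "$1" | sed -n 's|.*/host\([0-9][0-9]*\)/.*|\1|p'
}
```

Typical use would be `host_from_sysfs_path "$(readlink -f /sys/block/sda)"`, and the number it prints is the <n> to use in /sys/class/scsi_host/host<n>/scan.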

karora
11

I can't believe nobody mentioned AHCI yet... your SATA controller has to be in AHCI mode to enable hot swap. Check this by looking at the driver you are using:

root@peter:~ # find /sys -name sdk
/sys/devices/pci0000:00/0000:00:11.0/ata5/host4/target4:0:0/4:0:0:0/block/sdk
/sys/block/sdk
/sys/class/block/sdk

root@peter:~ # readlink /sys/devices/pci0000:00/0000:00:11.0/driver
../../../bus/pci/drivers/ahci

root@peter:~ # lspci -k | less
[... big long output... search for ahci or your pci address, or use the awk below ...]

root@peter:~ # lspci -k | awk '$1 == "00:11.0" {x=1}; x && /in use/ {print $0; exit}'
    Kernel driver in use: ahci

See how it says "ahci" there.

If it doesn't, enable AHCI mode in your BIOS. Also, some BIOSes, especially on servers or UEFI systems, have a per-disk "Hot Swap = enabled/disabled" setting, which you should also enable if it exists.
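The same check can be done without lspci by reading the driver symlink in sysfs directly. This is only a sketch: the function name and the SYSFS_ROOT override are hypothetical, and it expects the full PCI address with the domain prefix (0000:00:11.0 rather than 00:11.0).

```shell
# Sketch: print the kernel driver bound to a PCI device, e.g. "ahci".
# SYSFS_ROOT is a hypothetical override (default /sys) for dry runs.
pci_driver_of() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    # No symlink means no such device, or no driver bound to it.
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

With the controller from the example above, that would be `pci_driver_of 0000:00:11.0`.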

Peter
8

How about this (seems to work in Ubuntu):

sudo partprobe

2

Here's why I needed to reboot the computer...

I just hot-swapped my /dev/sdc. I used scsiadd -r 3 0 0 to power the old disk off before pulling it out. After installing the new disk, it didn't appear as /dev/sdc but rather as /dev/sdd. After a reboot, the disk appeared as /dev/sdc again.

So it seems hot swap works OK; it may just be that the /dev/sd* name isn't the same anymore.

Could this be an answer to your problem?

Peter
  • Hmmm, well, rescan-scsi-bus.sh works on /proc/scsi/scsi already, just like scsiadd seems to. We're trying a different server vendor anyway so maybe hot swap will "just work" for us in the future. – Philip Durbin Mar 04 '11 at 14:32
  • 6
    Yeah, you cannot get around that, near as I can tell. This is why you use disk label or UUID, and mount your fs by that (manually, or in fstab), you can set it, and then it doesn't change. The only trick is getting your boot loader to install to the new drive, but still work when it reboots, though from some quick experiments with GRUB (I was replacing sda on a machine with sd[a,b,c,d] and software raid1 for all the system part of the fs). – Ronald Pottol Aug 25 '11 at 04:51
  • 2
    You should never use the /dev/sd* devices in config files such as fstab. You should never assume the names are always the same. Instead, you should use the UUID=... syntax (without quotes), such as you see in man fstab. To find out the UUID, use the blkid command. (alternatively, you could prefer the label or id; also see /dev/disk/by-*) – Peter Mar 05 '16 at 08:18
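To see which persistent name currently points at which /dev/sd* node after a swap, a small loop over one of the /dev/disk/by-* directories is enough. This is just a sketch with a made-up function name; it defaults to /dev/disk/by-uuid, and you can pass another directory such as /dev/disk/by-id as the first argument.

```shell
# Sketch: print "persistent-name -> resolved device" for each symlink
# in a /dev/disk/by-* style directory.
list_persistent_names() {
    dir=${1:-/dev/disk/by-uuid}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}
```

The name on the left is what belongs in fstab (as UUID=... when listing by-uuid); the /dev/sd* name on the right is the part that can change between hot swaps and reboots.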
2

In some cases hot swap may need to be enabled in the BIOS of the motherboard, the SATA controller, or both. This depends entirely on the make and model of each, but if you have on-board SATA controllers that should support hot swap, it's worth combing through the motherboard BIOS. SATA cards may or may not have their own BIOS settings; many lower-end cards don't, but server-grade cards typically do.

If I recall correctly, I've needed to do this with a number of Gigabyte motherboards, and perhaps some other makes. I needed it for a hot-swap SATA tray to work; with the feature disabled, removing the drive didn't cause issues, but a new drive wouldn't register until reboot. With the setting enabled, it worked as expected: drives placed in the tray were immediately spun up and available to the OS.

STW
  • Just checked a machine in-house that I know had this; it's running a Gigabyte Z77X-UD3H motherboard with on-board Marvell 88SE9172 and Intel 7 Series/C210 controllers – STW Dec 30 '14 at 19:00
1

For hotplug to work you must have the acpiphp module loaded.

[root@example ~]# modprobe acpiphp

Obviously, if you want this to work on boot, you will have to configure the module to be loaded at boot time. One way is to create or edit /etc/rc.modules (which is called by rc.sysinit) and add the line:

modprobe acpiphp

Remember to chmod +x this file if you create it, as it is executed directly.

Frankie
nox
  • Interesting. I had never heard of acpiphp. Thanks. It seems to stand for Advanced Configuration and Power Interface PCI Hot Plug. PCI is Peripheral Component Interconnect, of course. – Philip Durbin Feb 26 '12 at 04:04
  • 3
    acpiphp is for PCI hotplug, i.e., hot adding and removing *PCI cards.* Some expensive systems support this. And also many hypervisors. – derobert Nov 21 '12 at 16:25
1

The DVD drive on my Fedora 16 machine is connected to a SATA interface. It was locked up and would not open or close. Running partprobe as root got my CD-ROM/DVD drive working again. I reckon it will help on another machine where I have the occasional hot swap problem. Thanks!

1

The Fusion-MPT SAS controller you have is a low end RAID controller. If you're not using it for RAID, it may still be providing an unhelpful layer of obstruction/abstraction.

You may need to poke at the RAID controller with mpt-status or lsiutil to get it to actually scan the bus.

http://hwraid.le-vert.net/wiki/LSIFusionMPT has a nice amount of documentation, but I can't say I've verified it.

aij
1

I know this question is old, but I had some success I did not see reported elsewhere. Had similar trouble on a Dell Precision 380 today. Eventually got it to work by doing some combination of the following:

echo "- - -" > /sys/class/scsi_host/host2/scan
echo 1 > /sys/class/scsi_device/2:0:0:0/device/reset
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/rescan
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/reset

WARNING: This may disrupt other ATA devices on the system as well. If you have mounted filesystems on those devices, that is likely to end badly. My situation did not care, but yours might.

Exactly which of the above commands are needed, and in what order, is unknown to me at this time. Some commands may need to be repeated. If I had to guess, I would say run them in the order shown above, then do another scsi_host scan at the end. I did quite a few more in my explorations.

The first command (scsi_host scan) tells the SCSI midlayer to scan all buses for new/changed devices. The second command tries to reset the SCSI target (disk device). The last two are working with the driver for the AHCI controller itself.

I found the items in question mostly by detailed examination and bold experimentation.

You can match scsi_device nodes to device make and model with (using grep to print the file names in front of the contents):

grep . /sys/class/scsi_device/*/device/model

The first number of the SCSI device ID should be the scsi_host number. You can then match scsi_host nodes to their device nodes with:

ls -l /sys/class/scsi_host
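Putting those two lookups together, here is a rough sketch that prints the host, the full SCSI address, and the model on one line per device. The function name and the SYSFS_ROOT override are hypothetical; the paths are the same ones used with grep above.

```shell
# Sketch: print "hostN H:C:T:L model" for every SCSI device in sysfs.
# SYSFS_ROOT is a hypothetical override (default /sys) for dry runs.
list_scsi_models() {
    for model_file in "${SYSFS_ROOT:-/sys}"/class/scsi_device/*/device/model; do
        [ -r "$model_file" ] || continue
        # The path ends in <H:C:T:L>/device/model; strip the last two components.
        hctl=${model_file%/device/model}
        hctl=${hctl##*/}
        # Everything before the first colon is the scsi_host number.
        printf 'host%s %s %s\n' "${hctl%%:*}" "$hctl" "$(cat "$model_file")"
    done
}
```

The host number in the first column is the one to use in /sys/class/scsi_host/host<n>/scan when you want to limit a rescan to a single port.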

I suspect I will never get a chance to refine further, so I wanted to share this info in the hopes of getting others closer. If I do get more info, I will edit this answer to reflect.

Hope this helps.

Ben Scott
  • This comment pointed me in the right direction. I used the grep line to discover my bus ID, and then echo "- - -" > /sys/class/scsi_host/host/scan to finally find my new HD. – Fernando Crespo Mar 22 '22 at 19:05
0

Just picked up a new system with hot-swap. I had drives in two of the three slots, but neither was found using any technique I could find. The solution ended up being: enter setup -> Advanced -> SATA Configuration -> SATA Enable = Enabled. Previously this had been set to Auto.