46

When a Linux box gets an ATA error, it syslogs it with a message identifying the disk as "ata%d.00". How do I translate that to a device name (e.g. /dev/sdb)? I feel like this should be trivial, but I cannot figure it out.

nelhage

9 Answers

31

Peter inspired me to write a more advanced script(let), which can even detect USB sticks (instead of printing silly things like "ata0.00"). In contrast to Peter's script, you also get the sub-number (as in 4.01) if you have more than one device on the same controller or channel. The output is exactly what you see in syslog. Tested, and working very well on my Debian box, though there is always room for improvement (e.g. the regexps are too clumsy). But HOLD IT! The seemingly excessive number of escaped characters in my regexps is there for compatibility reasons: you can't assume everyone has GNU sed, which is why I deliberately did without extended regexps.

UPDATES
(1) Will no longer parse ls output. (oops!) Since you all know: Do not parse ls.
(2) Now also works on read-only environments.
(3) Inspired by a suggestion from this chit-chat here I have managed to again get the sed statements way less complicated.

#!/bin/bash
# note: inspired by Peter
#
# *UPDATE 1* now we're no longer parsing ls output
# *UPDATE 2* now we're using an array instead of the <<< operator, which on its
# part insists on a writable /tmp directory: 
# restricted environments with read-only access often won't allow you that

# save original IFS
OLDIFS="$IFS"

for i in /sys/block/sd*; do 
 readlink "$i" |
 sed 's^\.\./devices^/sys/devices^ ;
      s^/host[0-9]\{1,2\}/target^ ^ ;
      s^/[0-9]\{1,2\}\(:[0-9]\)\{3\}/block/^ ^' \
 \
  |
  while IFS=' ' read Path HostFull ID
  do

     # OLD line: left in for reasons of readability 
     # IFS=: read HostMain HostMid HostSub <<< "$HostFull"

     # NEW lines: will now also work without a hitch on r/o environments
     IFS=: h=($HostFull)
     HostMain=${h[0]}; HostMid=${h[1]}; HostSub=${h[2]}

     if echo "$Path" | grep -q '/usb[0-9]*/'; then
       echo "(Device $ID is not an ATA device, but a USB device [e. g. a pen drive])"
     else
       echo $ID: ata$(< "$Path/host$HostMain/scsi_host/host$HostMain/unique_id").$HostMid$HostSub
     fi

  done

done

# restore original IFS
IFS="$OLDIFS"
syntaxerror
  • Just a reminder that the script may not show devices that are having issues. I had ata6 erroring with softreset failed (1st FIS failed) (Minor Issues); I listed the devices and ata6 was not present. If you know you have 4 disks in the PC and only 3 show up, that may be why. – Kendrick May 24 '15 at 03:42
  • 1
    @Kendrick Well, I would not blame the script in this case. If you know how the kernel drivers work, this will be more than clear to you :) Kernel subsystem drivers are known to __give up__ once the "issues" are severe enough. That is, for a UDMA-capable drive, the driver may issue multiple drive resets and (eventually) attempt the operation in PIO mode. However, if *this* proves too unstable as well (various timing errors etc.), the driver will say "go away" to the drive. For old PATA drives, this means that a __cold reboot__ will be mandatory for the drive to show up again. – syntaxerror May 24 '15 at 14:09
  • Not my intention to mean blame on the script. just a reminder as to why it may be missing :) stupid flakey seagate controller board made it a pain to figure out what was going on. – Kendrick May 30 '15 at 01:24
  • @Kendrick You're telling me man.:) Well, in my book, Seagate should *never* have bought out Samsung. Loved the latter drives (when Samsung were still in mass storage business), plus their excellent support team. Now Seagate has taken over all this ... and ... uh-oh. – syntaxerror May 30 '15 at 14:02
14

Look at /proc/scsi/scsi, which will look something like this:

$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3250823AS      Rev: 3.03
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3750528AS      Rev: CC44
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3750330AS      Rev: SD1A
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi10 Channel: 00 Id: 00 Lun: 00
  Vendor: WDC WD20 Model: EARS-00MVWB0     Rev:     
  Type:   Direct-Access                    ANSI SCSI revision: 02

scsi0 id 0 is sda and ata1.00, scsi1 id 0 is sdb and ata2.00, etc.

Also look at /var/log/dmesg, which shows the ata driver loading info and will make things a little clearer. Look for the line starting "libata".
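
If /proc/scsi/scsi or lsscsi is not available, the same host-to-disk pairing can probably be read from sysfs. The following is only a sketch: it assumes that /sys/block/sdX/device is a symlink ending in the SCSI address, and the function name and its optional sysfs-root argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: print "H:C:T:L sdX" for every sd* block device, similar to the
# first and last columns of lsscsi.
# Assumption: <sysfs>/block/sdX/device is a symlink whose last path
# component is the SCSI address (host:channel:id:lun).
# list_scsi_addrs and its optional sysfs-root argument are made-up names.
list_scsi_addrs() (
    sysfs=${1:-/sys}
    for dev in "$sysfs"/block/sd*; do
        [ -L "$dev/device" ] || continue      # also skips an unmatched glob
        target=$(readlink "$dev/device")
        printf '%s %s\n' "${target##*/}" "${dev##*/}"
    done
)

list_scsi_addrs "$@"
```

The first number of each address is the scsiN host from the listing above.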

Phil Hollenback
  • 11
    You might also need to use 'lsscsi' - which gives a slightly more human friendly output - e.g. [0:0:0:0] cd/dvd TSSTcorp CDDVDW SH-S202H SB00 /dev/sr0 [2:0:0:0] disk ATA ST3500630AS 3.AA /dev/sda [3:0:0:0] disk ATA WDC WD5000AAKS-0 01.0 /dev/sdb (On this server, running a 3.2.x kernel, there is no /proc/scsi*) (Sorry, I Can't seem to figure out how to get any formatting into the above, to make it readable) – David Goodwin May 21 '12 at 11:12
  • 2
    This should be an answer rather than a comment. Useful, quick and easy to read from one machine and type on another with issues. – Elder Geek Feb 24 '16 at 18:21
13

I prefer scriptlets to lengthy explanations. This works on my Ubuntu box. Add comments to your liking:

# on Ubuntu get ata ID for block devices sd*
ls -l /sys/block/sd* \
| sed -e 's^.*-> \.\.^/sys^' \
       -e 's^/host^ ^'        \
       -e 's^/target.*/^ ^'   \
| while read Path HostNum ID
  do
     echo ${ID}: $(cat $Path/host$HostNum/scsi_host/host$HostNum/unique_id)
  done
Michael Hampton
Peter
  • Your script is a bit less scary than the answer, mostly because I can see the whole thing. – isaaclw Jul 01 '14 at 18:31
  • 2
    A little simplifying (works for me on Centos) `ls -l /sys/block/sd* | sed -e 's@.*-> \.\..*/ata@/ata@' -e 's@/host@ @' -e 's@/target.*/@ @'` – Shirker Aug 08 '15 at 17:10
9

This is actually quite tricky. While it's safe to assume that "the scsi ID" is "the SATA ID minus one", I prefer to be really safe and inspect the unique_id which I assume (based on this post) is the SATA identifier.

My error was:

[6407990.328987] ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x280100 action 0x6 frozen
[6407990.336824] ata4.00: irq_stat 0x08000000, interface fatal error
[6407990.343012] ata4: SError: { UnrecovData 10B8B BadCRC }
[6407990.348395] ata4.00: failed command: READ FPDMA QUEUED
[6407990.353819] ata4.00: cmd 60/20:00:28:c2:39/00:00:0c:00:00/40 tag 0 ncq 16384 in
[6407990.353820]          res 40/00:00:28:c2:39/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
[6407990.369618] ata4.00: status: { DRDY }
[6407990.373504] ata4: hard resetting link
[6407995.905574] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[6407995.976946] ata4.00: configured for UDMA/133
[6407995.976961] ata4: EH complete

So my procedure to find out what ata4 is:

  1. find the PCI id of the SATA controller

    # lspci | grep -i sata
    00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
    
  2. find the matching unique ID:

    # grep 4 /sys/devices/pci0000:00/0000:00:1f.2/*/*/*/unique_id
    /sys/devices/pci0000:00/0000:00:1f.2/host3/scsi_host/host3/unique_id:4
    
  3. so it's on scsi_host/host3, which we can translate to 3:x:x:x, which we can grep for in dmesg to find out more:

    # dmesg | grep '3:.:.:.'
    [    2.140616] scsi 3:0:0:0: Direct-Access     ATA      ST3250310NS      SN06 PQ: 0 ANSI: 5
    [    2.152477] sd 3:0:0:0: [sdd] 488397168 512-byte logical blocks: (250 GB/232 GiB)
    [    2.152551] sd 3:0:0:0: [sdd] Write Protect is off
    [    2.152554] sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
    [    2.152576] sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [    2.157004] sd 3:0:0:0: [sdd] Attached SCSI disk
    [    2.186897] sd 3:0:0:0: Attached scsi generic sg3 type 0
    
  4. here's our device, we can (optionally) find the serial number to take that device out of there (or check cabling or whatever) before our RAID array totally fails:

    # hdparm -i /dev/sdd | grep Serial
     Model=ST3250310NS, FwRev=SN06, SerialNo=9SF19GYA
    

And you're done!
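
Steps 2 and 3 above could be scripted roughly as follows. This is a sketch, not a tested tool: it relies on the same assumption as the answer (that unique_id holds the ata number), it reads /sys/class/scsi_host rather than the PCI path, and the function name and its optional sysfs-root argument are invented for illustration.

```shell
#!/bin/sh
# Sketch of steps 2-3: ata port number -> SCSI host -> block device.
# Assumptions: unique_id under <sysfs>/class/scsi_host/hostN holds the ata
# number, and <sysfs>/block/sdX is a symlink whose target contains
# ".../hostN/...". ata_to_block and the optional sysfs-root argument are
# made-up names for illustration.
ata_to_block() (
    want=$1
    sysfs=${2:-/sys}
    for idfile in "$sysfs"/class/scsi_host/host*/unique_id; do
        [ -r "$idfile" ] || continue
        # Compare the whole value so that "4" does not also match "14".
        [ "$(cat "$idfile")" = "$want" ] || continue
        host=${idfile%/unique_id}     # .../class/scsi_host/host3
        host=${host##*/}              # host3
        for dev in "$sysfs"/block/sd*; do
            [ -L "$dev" ] || continue
            case $(readlink "$dev") in
                */"$host"/*) printf '%s /dev/%s\n' "$host" "${dev##*/}" ;;
            esac
        done
    done
)

if [ $# -gt 0 ]; then
    ata_to_block "$@"
fi
```

Saved under a name of your choosing and called with 4 as its argument, it should print the host and device that belong to ata4, if the assumptions hold on your kernel.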

anarcat
9

Try this:

# find -L /sys/bus/pci/devices/*/ata*/host*/target* -maxdepth 3 -name "sd*" 2>/dev/null | egrep block |egrep --colour '(ata[0-9]*)|(sd.*)'

I never understood dmesg: some rows mention "ata4", others mention "scsi" or sdc, but none of them ties "ata4" to "sdc". The command shown above finds the /sys/bus/ path, in which both ata4 and sdc appear.
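
A variant without find, as a rough sketch: read the symlink target of each /sys/block/sd* entry and pull out the two interesting components. It assumes the path has the .../ataN/hostM/.../block/sdX shape; the helper name is made up.

```shell
#!/bin/sh
# Sketch: pull the "ataN" and "sdX" components out of one sysfs device path.
# ata_and_disk is a made-up helper name; it prints nothing for paths
# without an ataN/hostM pair (for example USB storage).
ata_and_disk() {
    printf '%s\n' "$1" |
        sed -n 's|.*/\(ata[0-9][0-9]*\)/host[0-9][0-9]*/.*/block/\(sd[a-z][a-z]*\)$|\1 \2|p'
}

# On a live system, feed it the symlink target of each block device.
for dev in /sys/block/sd*; do
    [ -L "$dev" ] || continue
    ata_and_disk "$(readlink "$dev")"
done
```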

This only works if your udev setup creates the directory, but then you can simply type:

# ls -l /dev/disk/by-path/ 
lrwxrwxrwx 1 root root 2020-06-17 12:01 pci-0000:00:1d.7-usb-0:3:1.0-scsi-0:0:0:0 -> ../../sdc 
lrwxrwxrwx 1 root root 2020-06-17 12:07 pci-0000:00:1f.2-ata-1 -> ../../sda 
lrwxrwxrwx 1 root root 2020-06-17 12:07 pci-0000:00:1f.2-ata-1-part1 -> ../../sda1 
lrwxrwxrwx 1 root root 2020-06-17 12:07 pci-0000:00:1f.2-ata-2 -> ../../sdb

Each line of the result shows the low-level device and the corresponding block device.

schweik
8

I had the same problem and was able to identify drives by checking dmesg. There you can see the controller identifier (correct term??) and the model of the disk. Then use ls -l /dev/disk/by-id to match the model number to /dev/sda (or whatever). Alternatively, I like Disk Utility for this information. Note: this only works if your disks have different model numbers, otherwise you can't distinguish between the two.

>dmesg |grep ata
...
[   19.178040] ata2.00: ATA-8: WDC WD2500BEVT-00A23T0, 01.01A01, max UDMA/133
[   19.178043] ata2.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[   19.179376] ata2.00: configured for UDMA/133
[   19.264152] ata3.00: ATA-8: WDC WD3200BEVT-00ZCT0, 11.01A11, max UDMA/133
[   19.264154] ata3.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[   19.266767] ata3.00: configured for UDMA/133
...

>ls -l /dev/disk/by-id
lrwxrwxrwx 1 root root  9 Feb 18 12:17 ata-WDC_WD2500BEVT-00A23T0_WD-WXE1A7131446 -> ../../sda
lrwxrwxrwx 1 root root 10 Feb 18 11:48 ata-WDC_WD2500BEVT-00A23T0_WD-WXE1A7131446-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Feb 18 12:17 ata-WDC_WD3200BEVT-00ZCT0_WD-WXHZ08045183 -> ../../sdb
lrwxrwxrwx 1 root root 10 Feb 18 11:48 ata-WDC_WD3200BEVT-00ZCT0_WD-WXHZ08045183-part1 -> ../../sdb1
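
The matching by eye can be helped along with a small filter. This is a sketch under the assumption that by-id names are "ata-" followed by the model with spaces turned into underscores, as in the listing above; the function name is made up.

```shell
#!/bin/sh
# Sketch: turn dmesg identification lines such as
#   ata2.00: ATA-8: WDC WD2500BEVT-00A23T0, 01.01A01, max UDMA/133
# into "ata2.00 ata-WDC_WD2500BEVT-00A23T0", i.e. the ata name plus the
# prefix to look for under /dev/disk/by-id.
# Assumption: by-id names are "ata-" + model with spaces replaced by
# underscores. ata_models is a made-up name.
ata_models() {
    sed -n 's/.*\(ata[0-9][0-9]*\.[0-9][0-9]*\): ATA-[0-9]*: \([^,]*\),.*/\1 \2/p' |
        while read -r name model; do
            printf '%s ata-%s\n' "$name" "$(printf '%s' "$model" | tr ' ' '_')"
        done
}
```

Pipe dmesg into ata_models, then look for each printed prefix in ls -l /dev/disk/by-id. The caveat about identical model numbers still applies.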
ecellingsworth
3

The easiest way is to review the kernel log from boot, since the drive device names are mixed in from various sources (eg USB drives), or are assigned based on type of device (ie cdrom may be scdX instead, and everything has a sgX). In practice, unless you have mixed different kinds of buses (eg SATA+USB) the lowest numbered ata device is going to be sda unless it's a cdrom device.

Depending on your system, it might be divined by wandering around sysfs. On my system, ls -l /sys/dev/block reveals that 8:0 (major:minor from the /dev entry) points to /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda. Likewise, ls -l /sys/class/ata_port reveals that ata1 points to /sys/devices/pci0000:00/0000:00:1f.2/ata1/ata_port/ata1, which is on the same PCI sub-device.

Since I use SATA, and only one drive is on each port I can deduce that ata1.00 = sda. All of my drives are .00, I suspect that if I used a port multiplier, my drives would be given .01, .02, .03 etc. Looking at other people's logs PATA controllers use .00 and .01 for master and slave, and based on their logs if you have ataX.01, the .01 should be mapped to the "ID" in the host:channel:ID:LUN folder from the /sys/dev/block/ listing. If you have multiple ataX/ and hostY/ folders in the same PCI device folder, then I suspect that the lowest numbered ataX folder matches the lowest numbered hostY folder.
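
As a small illustration of that guess, the syslog-style name can be composed from the ata port and the SCSI address. This sketch simply follows the reading above (suffix = ID field); it does not handle port multipliers, and the helper name is made up.

```shell
#!/bin/sh
# Sketch: build the syslog-style name from an ata port and a SCSI address,
# following this answer's reading that the suffix is the ID field of
# host:channel:ID:LUN. Port multipliers are not handled.
# syslog_ata_name is a made-up helper name.
syslog_ata_name() (
    port=$1                 # e.g. ata1
    rest=${2#*:}            # drop host     -> channel:ID:LUN
    rest=${rest#*:}         # drop channel  -> ID:LUN
    id=${rest%%:*}          # keep ID
    printf '%s.%02d\n' "$port" "$id"
)

syslog_ata_name ata1 0:0:1:0    # a PATA slave on ata1, prints ata1.01
```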

DerfK
3

In /sys/class/ata_port/ata${n}/device/, you can see a host${x} folder. E.g., on my machine:

gibby ~ # ls /sys/class/ata_port/ata1/device/
ata_port  host0  link1  power  uevent
gibby ~ # ls /sys/class/ata_port/ata2/device/
ata_port  host1  link2  power  uevent
gibby ~ # lsscsi
[0:0:0:0]    disk    ATA      WDC WD1002FAEX-0 1D05  /dev/sda
[1:0:0:0]    disk    ATA      WDC WD2001FFSX-6 0A81  /dev/sdb
[2:0:0:0]    disk    ATA      WDC WD1002FAEX-0 1D05  /dev/sdc
[3:0:0:0]    disk    ATA      WDC WD2001FFSX-6 0A81  /dev/sdd
[5:0:0:0]    disk    ATA      SAMSUNG MZ7TD256 2L5Q  /dev/sde

The ${x} in host${x} refers to that first number in the [0:0:0:0]. So for me ata1 refers to host0 which can also be represented in SCSI form as 0:*:

gibby ~ # lsscsi 0:\*
[0:0:0:0]    disk    ATA      WDC WD1002FAEX-0 1D05  /dev/sda
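
The lookup can be wrapped in a tiny function. This is a sketch that assumes the host${x} entry shows up under the ata port's device directory as it does on my machine; the function name and its optional sysfs-root argument are made up.

```shell
#!/bin/sh
# Sketch: print the SCSI host number behind an ata port by looking for the
# hostX entry under <sysfs>/class/ata_port/ataN/device/.
# ata_host_number and the optional sysfs-root argument are made-up names.
ata_host_number() (
    n=$1
    sysfs=${2:-/sys}
    for h in "$sysfs/class/ata_port/ata$n/device"/host*; do
        [ -e "$h" ] || continue       # unmatched glob: no such port or host
        h=${h##*/}                    # host0
        printf '%s\n' "${h#host}"     # 0
    done
)

if [ $# -gt 0 ]; then
    ata_host_number "$@"
fi
```

The number it prints is the first field of the lsscsi address, so it can be used in the same filter as above.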
binki
0

The script below will give you a nice summary like this:

sda [  180.0 GB] INTEL SSDSC2BW180A4, BTDA4052066D1802GN pci0000:00/0000:00:11.0/ata1/host0/target0:0:0/0:0:0:0/block/sda
sdb [ 1000.2 GB] WDC WD1000DHTZ-04N21V1, WD-WXM1E83CNTX5 pci0000:00/0000:00:11.0/ata3/host2/target2:0:0/2:0:0:0/block/sdc
sdc [ ------ GB] -- pci0000:00/0000:00:12.2/usb1/1-5/1-5:1.0/host6/target6:0:0/6:0:0:0/block/sdf

So in one line per drive you have the sdX device name, size, model, s/n and the pci and ata numbers. The sdc above corresponds to a USB SD card reader with no card inserted, hence the ---- in place of real information.

#!/bin/bash
BLKDEVS=`ls -l /sys/block/sd*|sed -e 's/^.* -> //' -e 's/^...devices.//'`
echo $BLKDEVS|tr \  \\n |sort| \
while read DISK ; do
    SD=`echo $DISK|sed -e 's/^.*\///'`
    INFO=`hdparm -i /dev/$SD 2>/dev/null|grep Model=|sed -e 's/Model=//' -e 's/FwRev=[^ ]*//' -e 's/SerialNo=//'`
    ! [[ $INFO ]] && INFO='--'
    SIZE=`fdisk -l /dev/$SD 2>/dev/null|grep '^Disk .* bytes'|sed -e 's/^[^,]*, \([0-9]*\) bytes$/\1/'`
    if [[ $SIZE ]] ; then
        SIZE=`echo $SIZE|awk '{printf "[%7.1f GB]" , $1/1000/1000/1000}'|tr \  _`
    else
        SIZE='[ ------ GB]'
    fi
    echo $SD $SIZE $INFO $DISK
done

(only tested on ubuntu 12.04/14.04 and CentOS 6)

ndemou
  • How does that equate to show you what, for example, ATA 4.01 is? – Edward_178118 Jun 14 '16 at 02:42
  • In the example output you see sda:...ata1... and sdb:...ata3.... And indeed sda was at ata1 and sdb at ata2. Since I wrote it and tested it on 4 different hosts, I have found hardware where the script's output doesn't contain a reference to ata. I should point out that dmesg|grep "ata[0-9]" has never failed me. – ndemou Jun 14 '16 at 07:20