3

I have a server with software RAID 1 on two hot-swap SATA disks. One hard drive has started showing errors, and I'm thinking about removing and replacing it. The only problem is that I have no idea which of the two physical disks corresponds to which device, and I can't shut the server down to find out.

I have /dev/sda and /dev/sdb; /dev/sda is the failing one. I thought about doing something along the lines of

# mdadm --manage /dev/md0 --remove /dev/sda1

then somehow stopping/suspending the drive using tuning software and listening for which of the two stopped, but that's not gonna work in a noisy server environment. The drive panels have no LEDs.

Thanks for any ideas!

squillman
Karolis T.

4 Answers

3

Can you see the serial numbers (S/N) on the disks? Use hdparm -i /dev/sda to get the S/N and identify the disk.
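As a rough sketch of the serial-number approach: the helper below pulls the serial out of `smartctl -i` output so both devices can be listed side by side and compared with the drive labels. The function name `serial_from_smartctl` is made up for illustration, and it assumes the output contains a line starting with `Serial Number:`, as smartmontools normally prints for ATA disks.

```bash
#!/bin/bash
# Hypothetical helper: read `smartctl -i` output on stdin and print the serial.
# Assumes a line of the form "Serial Number:    <serial>".
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Possible usage (needs root and smartmontools installed):
#   for d in /dev/sda /dev/sdb; do
#       printf '%s %s\n' "$d" "$(smartctl -i "$d" | serial_from_smartctl)"
#   done
```

The serial printed for the failing device can then be matched against the label on the drive once a tray is pulled far enough to read it.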

igustin
Yes, I can. smartctl -i /dev/sda shows the serial number, and the same works for sdb. I'm not sure this helps though, as the drive labels are covered from above in their hot-swap slots. – Karolis T. Sep 10 '09 at 17:09
1

The a and b in sda and sdb follow the order in which the kernel detected the drives, which usually (but not necessarily) maps to channels 1 and 2 (or 0 and 1). If the system is set up so that the ports are labeled, you can tell that way. I don't know how your drives are wired; I've had ports numbered in small print on the motherboard, so you can tell which port goes to which drive.
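To check the port mapping rather than guess it, one option is to look at where the /sys/block entry points. This is only a sketch: the function name `ata_port_from_path` is invented here, and it assumes a kernel that includes an `ataN` component in the sysfs device path, which older kernels may not do.

```bash
#!/bin/bash
# Hypothetical helper: print the "ataN" component of a sysfs device path.
# Returns non-zero if the path has no such component (e.g. a USB disk).
ata_port_from_path() {
    if [[ $1 =~ /(ata[0-9]+)/ ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Possible usage on a live system:
#   ata_port_from_path "$(readlink -f /sys/block/sda)"
```

The ataN number is the same one the kernel uses in its ATA error messages, so it can be matched against the log lines for the failing disk.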

I suppose you could also use your idea and then try feeling for vibration from the drives, if there's enough room for you to touch them. Again, that depends on the way they're mounted.

Bart Silverstrim
1

An easy way to check which drive is which, if you have proper drive LEDs, is to just

dd if=/dev/sda of=/dev/null

And see which one has a light that is solidly stuck on.
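A plain dd keeps reading until it reaches the end of the disk, so a time-limited variant may be more convenient. The sketch below wraps the same read in coreutils `timeout`; the function name `blink_disk` and the 30-second default are assumptions for illustration only.

```bash
#!/bin/bash
# Hypothetical helper: read from a device for at most N seconds (default 30)
# so that its activity LED stays lit. The read is discarded and nothing is written.
blink_disk() {
    local dev=$1 secs=${2:-30}
    timeout "$secs" dd if="$dev" of=/dev/null bs=1M 2>/dev/null
    local rc=$?
    # timeout exits with 124 when it had to stop dd; that is the expected case
    [[ $rc -eq 0 || $rc -eq 124 ]]
}

# Possible usage (as root):
#   blink_disk /dev/sda 20
```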

joshk0
0

Well, last year I wrote a script which translates that ataX.YY stuff to a device name, found here:
Linux ATA errors: Translating to a device name?

However, my personal version of this script has been enhanced considerably since then (for instance, it now even shows the controller the HDD is connected to), so it was just a one-minute job for me to cut it down for your specific purpose:

#!/bin/bash
#
# LICENSE: GPL

function float_eval()
{
    local st=0
    local r=0.0
    if [[ $# -gt 0 ]]; then
       r=$(echo "scale=5; $*" | bc -q 2>/dev/null)
       st=$?
       if [[ $st -eq 0  &&  -z "$r" ]]; then st=1; fi
    fi
    echo "$r"
    return $st
}

_heahdcnt=0
_badhdcnt=0
_usbcnt=0

echo -e "\nRetrieving assignments from /sys/block ..."
while read Path ID
do
   sizeBlk=$(< /sys/block/$ID/size)

   if grep -q '/usb[0-9]*/' <<< $Path; then
     echo -ne "\n\n(Device /dev/$ID is not an ATA device, but a USB device [e. g. a pen drive])"
      ((_usbcnt++))
   else

     if [ ! -f /sys/block/$ID/device/model ]; then
        echo -e "Error: Couldn't determine model of /dev/$ID!\n"
     else 
        echo -ne "\n\n/dev/$ID is a $(< /sys/block/$ID/device/model)"

       # when we get a 0, something went wrong; so in this case, skip any calculations
       if [ $sizeBlk -gt 0 ]; then

         sizegib=$((sizeBlk >> 21))

        # nb: since current bc cannot do bit shift operations without external modules 
        # loaded at runtime, we will resort to a temp variable which contains the
        # shifted value

         sizeBlkLsh9=$((sizeBlk << 9))
         sizegb=$(float_eval "$sizeBlkLsh9 / 1000000000")

         # use formatted output, don't mix literals and arithmetic in one string (as with echo)

         LC_NUMERIC=C printf " (%4.0f GiB / %4.0f GB )" $sizegib $sizegb

         ((_heahdcnt++))
       else
         ((_badhdcnt++))
       fi
    fi

    [[ $sizeBlk -eq 0 ]] && echo "WARNING: There appears to be some trouble with device" \
 "/dev/$ID. You should check this more thoroughly."
  fi

# process substitution
done < <(ls -l /sys/block/sd* \
\
| sed -e 's^.*-> \.\.^/sys^' \
      -e 's^/host[0-9]\{1,2\}/target\([0-9]\{1,2\}\(:[0-9]\)\{2,3\}/\)\{1,2\}block/^ ^')

echo -e "\n\nScanning of hardware completed.\n"

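# $(( )) replaces the deprecated $[ ] arithmetic syntax
echo "You have $((_heahdcnt + _badhdcnt + _usbcnt)) devices connected:"
echo -n "$_heahdcnt healthy HDD(s), $_badhdcnt bad HDD(s)"
# always terminate the summary line, even when there are no USB devices
if [[ $_usbcnt -gt 0 ]]; then echo " and $_usbcnt USB device(s)."; else echo "."; fi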

NOTE: The float_eval() auxiliary function is not strictly necessary, but it avoids rounding errors when sizes are computed in billions or trillions of bytes (GB or TB, not to be confused with GiB/TiB). Integer division discards the fractional part, and that matters more in the TB range, where capacities are usually quoted with a decimal point. Below the 1 TB mark, capacities were given as whole numbers of GB, so integer calculations were good enough; that is no longer true in all cases.
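For comparison, here is a sketch of the same size arithmetic done with shell integers only, keeping one decimal place for GB. The function names are invented for this example, it assumes 512-byte sectors as reported in /sys/block/*/size, and the sector count in the usage comment is just a sample value typical of a 1 TB disk.

```bash
#!/bin/bash
# Hypothetical helpers; $1 is a count of 512-byte sectors.

# Whole GiB (2^30 bytes): sectors * 2^9 / 2^30 = sectors >> 21
sectors_to_gib() {
    echo $(( $1 >> 21 ))
}

# Decimal GB (10^9 bytes), truncated to one decimal place
sectors_to_gb() {
    local tenths=$(( ($1 << 9) / 100000000 ))   # size in units of 0.1 GB
    printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

# Possible usage:
#   sectors_to_gib 1953525168   # prints 931
#   sectors_to_gb  1953525168   # prints 1000.2
```

This truncates instead of rounding, so float_eval() with bc remains the better choice when more precision is wanted.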

Also, I would be interested in someone improving this script so that it shows serial numbers when there are two drives with identical model strings. Unfortunately, I haven't managed to find this information in /sys/block/* so far.
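One possible source for the serial numbers is the set of udev symlinks under /dev/disk/by-id, whose names for ATA disks normally have the form ata-MODEL_SERIAL. The following is only a sketch under that assumption; `list_disks_by_id` is a made-up name, and the optional directory argument exists mainly so the function can be tried against a scratch directory.

```bash
#!/bin/bash
# Hypothetical helper: print "<kernel name> <by-id name>" for each whole ATA disk.
# Assumes udev populates /dev/disk/by-id with ata-MODEL_SERIAL symlinks.
list_disks_by_id() {
    local dir=${1:-/dev/disk/by-id} link
    for link in "$dir"/ata-*; do
        [[ -L $link ]] || continue                 # glob matched nothing
        [[ $link == *-part[0-9]* ]] && continue    # skip partition links
        printf '%s %s\n' "$(basename "$(readlink -f "$link")")" "$(basename "$link")"
    done
}

# Possible usage:
#   list_disks_by_id
```

Each output line pairs a name such as sda with a link name that ends in the drive's serial number, which is enough to tell two identical models apart.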

syntaxerror