13

I'm building out a FreeNAS-based server in a Supermicro X6DHE-XB 3U enclosure with 4 GB of RAM and 16 SATA hot-swap bays. It comes with two 8-port 3ware RAID cards, but I'm planning on just using the ZFS capabilities instead of the hardware RAID. My initial drive set will be 8x2TB HITACHI Deskstar 7K3000 HDS723020BLA642 drives.

If I were using hardware-based RAID, it would give me a red light on the drive bay where the drive failed. How does it work with ZFS when a drive fails? I don't think there is any guarantee that sda=bay1, sdb=bay2, etc., so how do you determine which drive needs to be replaced? Can ZFS report back to the SATA controller to turn on the "failed drive" light? Does it just report the drive serial number? What if the drive fails so hard it can't report its serial number? I suppose it is a good idea to write down every drive's serial number and which bay it went into before you go live. Are there any other "pre-production" tasks to make replacing drives easier in the future?

John P

7 Answers

9

The current version of FreeNAS (9.3 at the moment) creates a gptid for each drive added to a zpool. Immediately after creation, the "zpool status" output will look something like this (depending on your pool configuration)...

# zpool status
pool: myzfstest
state: ONLINE
scan: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    myzfstest                                       ONLINE       0     0     0
      raidz-0                                       ONLINE       0     0     0
        gptid/4fc2b789-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/51d38480-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/54c672cc-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/56a07638-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/630e1317-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/6557b52d-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/667a1318-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/68cadf75-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    logs
      mirror-2                                      ONLINE       0     0     0
        gptid/8839f22e-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
        gptid/8a6d0b14-7b7f-11e4-9585-de9b81338d40  ONLINE       0     0     0
    cache
      gptid/8c2f3824-7b7f-11e4-9585-de9b81338d40    ONLINE       0     0     0
      gptid/8da9ba80-7b7f-11e4-9585-de9b81338d40    ONLINE       0     0     0
    spares
      gptid/72f039f2-7b8a-11e4-9585-de9b81338d40    AVAIL
      gptid/750df91d-7b8a-11e4-9585-de9b81338d40    AVAIL

errors: No known data errors

Unfortunately, the web GUI doesn't show you these numbers. So, if you get an error saying that "gptid/6557b52d-7b7f-11e4-9585-de9b81338d40" is bad... how do you know which drive to pull? Figuring that part out requires some legwork at the time of install.

  1. When you build your system, write down the serial number of every drive and the location where that drive was inserted. On a double-sided JBOD case, for instance, you may want to note front/back, row, and column.
  2. When you boot up FreeNAS, in the web GUI, go to "storage>volumes/view disks". That tab should list all your drives and their serial numbers. Note the drive name given for each serial number from the previous list. If you don't see the serial numbers, you will have to drop to the shell and type smartctl -a /dev/ada0 | grep ^Serial (replacing "/dev/ada0" with each of the drive names from the list).
  3. Now, at the shell, we need to match up the drive names with all the gptid numbers. So, type glabel status and you should get something like this...

    # glabel status
    
    CORRECT>glabel status (y|n|e|a)? yes    
                                          Name  Status  Components  
                                 ufs/FreeNASs3     N/A  ada0s3  
                                 ufs/FreeNASs4     N/A  ada0s4  
                                ufs/FreeNASs1a     N/A  ada0s1a
    gptid/616cddb6-7b7f-11e4-9585-de9b81338d40     N/A  ada0p2  
    gptid/630e1317-7b7f-11e4-9585-de9b81338d40     N/A  da1p1   
    gptid/6557b52d-7b7f-11e4-9585-de9b81338d40     N/A  da2p1   
    gptid/667a1318-7b7f-11e4-9585-de9b81338d40     N/A  da3p1   
    gptid/68cadf75-7b7f-11e4-9585-de9b81338d40     N/A  da4p1   
    
  4. Now write in all the gptid numbers to associate them with the drive names and thus the serial numbers and their locations. Note: when you see something like "da3p1" that's partition one of the drive identified as da3. The list in the web GUI will only show the label "da3" for the disk.

Now, when an error comes up saying a disk with gptid number xyz has an error, you'll be able to reference your sheet and know which drive you need to pull/replace.
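If you'd rather not do the lookup by hand every time, steps 3 and 4 can be roughly scripted. This is only a sketch: gptid_to_disk is a made-up helper name, and it assumes the glabel status output has the three-column layout shown above (Name, Status, Components).

```shell
#!/bin/sh
# Sketch: find the disk behind a gptid label using `glabel status` output.
# gptid_to_disk is a hypothetical helper; it reads glabel output on stdin.
gptid_to_disk() {
    # $1 = label exactly as zpool status prints it, e.g. gptid/6557b52d-...
    awk -v want="$1" '
        $1 == want {
            dev = $3                           # Components column, e.g. da2p1
            sub(/[ps][0-9]+[a-z]?$/, "", dev)  # drop partition/slice suffix
            print dev                          # e.g. da2
        }'
}

# On a live system (not executed here) you could chain it like this:
#   disk=$(glabel status | gptid_to_disk gptid/6557b52d-7b7f-11e4-9585-de9b81338d40)
#   smartctl -a "/dev/$disk" | grep ^Serial
```

The helper only parses text, so you can also run it against a saved copy of the glabel output when the box is down. You still need the serial-to-bay sheet from step 1 to get from the serial number to a physical slot.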

I know this is beyond late for the original poster; but, perhaps others will find this useful.

James McMahon
Craig
  • 1
    For the original question, "glabel status" is the critical portion. That will allow you to figure out the mapping between the wacky IDs and the physical. – Brian Knoblauch Mar 02 '15 at 15:21
  • Wow. Great answer, but it's a bit disappointing ZFS doesn't have a half-decent way to keep track of disks. – mikato Sep 29 '17 at 21:27
6

What you need is the sas2ircu utility from LSI (now Avago). LSI maintains versions for FreeBSD, Linux and Windows. With FreeNAS you will need the FreeBSD version.

To try it you would put it in the /tmp directory and make it executable first.

Step 1 is to discover the ID of your SAS HBA (example):

/tmp# ./sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.


         Adapter      Vendor  Device                       SubSys  SubSys
 Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
 -----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2008     1000h    72h   00h:04h:00h:00h      1000h   3020h
SAS2IRCU: Utility Completed Successfully.

Step 2 is to generate a list of all your devices that you can examine later:

/tmp# ./sas2ircu 0 display > disklist.txt

Step 3 is examining your disk list. It will look similar to this:

/tmp# vi disklist.txt
LSI Corporation SAS2 IR Configuration Utility.
Version 19.00.00.00 (2014.03.17)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.

Read configuration has been initiated for controller 0
------------------------------------------------------------------------
Controller information
------------------------------------------------------------------------
  Controller type                         : SAS2008
  BIOS version                            : 7.37.00.00
  Firmware version                        : 19.00.00.00
  Channel description                     : 1 Serial Attached SCSI
  Initiator ID                            : 0
  Maximum physical devices                : 255
  Concurrent commands supported           : 3432
  Slot                                    : 4
  Segment                                 : 0
  Bus                                     : 4
  Device                                  : 0
  Function                                : 0
  RAID Support                            : No
------------------------------------------------------------------------
IR Volume information
------------------------------------------------------------------------
------------------------------------------------------------------------
Physical device information
------------------------------------------------------------------------
Initiator at ID #0

Device is a Enclosure services device
  Enclosure #                             : 2
  Slot #                                  : 24
  SAS Address                             : 5003048-0-00d3-a87d
  State                                   : Standby (SBY)
  Manufacturer                            : LSI CORP
  Model Number                            : SAS2X36
  Firmware Revision                       : 0717
  Serial No                               : x36557230
  GUID                                    : N/A
  Drive Type                              : Undetermined

Device is a Enclosure services device
  Enclosure #                             : 3
  Slot #                                  : 0
  SAS Address                             : 5003048-0-00ca-7bfd
  State                                   : Standby (SBY)
  Manufacturer                            : LSI CORP
  Model Number                            : SAS2X28
  Firmware Revision                       : 0717
  Serial No                               : x36557230
  GUID                                    : N/A
  Drive Type                              : Undetermined

Device is a Hard disk
  Enclosure #                             : 4
  Slot #                                  : 0
  SAS Address                             : 5003048-0-00d3-a8cc
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 1907729/3907029167
  Manufacturer                            : ATA
  Model Number                            : WDC WD20EARS-00M
  Firmware Revision                       : AB51
  Serial No                               : WDWCAZA1037887
  GUID                                    : N/A
  Drive Type                              : Undetermined

Device is a Hard disk
  Enclosure #                             : 4
  Slot #                                  : 1

Step 4 is identifying your failed drive: you will recognize it by the missing or damaged information reported for that drive. Note its Enclosure # and Slot #, then use them to blink the tray LED in step 5. For example, to locate Enclosure # 4, Slot # 1:

 /tmp# ./sas2ircu 0 locate 4:1 ON

To turn the LED off after replacing:

/tmp# ./sas2ircu 0 locate 4:1 OFF
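
If you already know the serial number of the drive you want to find (for example a healthy drive you want to confirm, or one from your inventory sheet), the search through disklist.txt can be sketched with awk. find_slot is a made-up helper name, and the sketch assumes the display output uses the "Enclosure #", "Slot #" and "Serial No" field labels shown above. It will not help for a drive that is too dead to report a serial.

```shell
#!/bin/sh
# Sketch: print Enclosure:Slot for a given serial number.
# find_slot is a hypothetical helper; it reads `sas2ircu <index> display`
# output (or a saved disklist.txt) on stdin.
find_slot() {
    # $1 = serial number as printed in the "Serial No" field
    awk -F ':' -v want="$1" '
        /Enclosure #/ { enc = $2;  gsub(/[ \t]/, "", enc) }
        /Slot #/      { slot = $2; gsub(/[ \t]/, "", slot) }
        /Serial No/   { sn = $2;   gsub(/[ \t]/, "", sn)
                        if (sn == want) print enc ":" slot }'
}

# On a live system (not executed here):
#   ./sas2ircu 0 display | find_slot WDWCAZA1037887
#   ./sas2ircu 0 locate 4:0 ON
```

Enclosure and slot are remembered from the most recent device block, so the value printed belongs to the same block as the matching serial.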

I hope this helps!

Dimitar Boyn
4

zpool status -v should tell you which disk is online or not.
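
On a pool with many disks, the rows that are not ONLINE can be filtered out of that output. A rough sketch, assuming the usual NAME/STATE column layout; not_online is a made-up helper name:

```shell
#!/bin/sh
# Sketch: list rows of the zpool status device table whose STATE is not ONLINE.
# not_online is a hypothetical helper; it reads `zpool status` output on stdin.
not_online() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }  # table header
        /^errors:/                     { in_cfg = 0 }        # end of table
        # Skip section names (logs, cache, spares) and healthy or spare rows.
        in_cfg && NF >= 2 && $2 != "ONLINE" && $2 != "AVAIL" { print $1, $2 }'
}

# On a live system (not executed here):
#   zpool status -v | not_online
```

Note that the pool and vdev rows are printed too when they are DEGRADED, so the leaf device is usually the last, most indented entry. The name it prints is still only a label (ada4p2 or a gptid), not a physical bay.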

Marcin
  • 3
    +1 FreeNAS is FreeBSD based, and the drives will be in the order the card provides. If there is a single 8 port SAS controller, the drives will be /dev/da0 through /dev/da7, with the same numbering as the card (good cables are also labeled per drive). If you have multiple controllers, or anything complicated you can run `camcontrol devlist` to get a listing of all SAS/SCSI drives and what card, target, lun they are on. – Chris S Apr 20 '11 at 17:29
  • 1
    Chris S is incorrect. The drives do not always appear in the order the card provides. For example, our "da7" appears second in the list of 8 drives... Also, the zpool status merely gives the labels and not the actual disks. – Brian Knoblauch Mar 02 '15 at 15:18
2

Look at the Volumes.

Select the Volume that is Degraded.

At the bottom of your screen there are three selections... click Volume Status

You will now see a closeup of the volume and its individual hard drives listed something like ada3p2, ada5p2, ada6p2, ada4p2 etc.

Select the Degraded Drive.

At the bottom of your screen you will see two options; Edit Disk and Replace

Select Edit Disk

You should now see the Serial number of the degraded disk.

Power down your FreeNAS server and look for that disk.

wri7913
  • This should be the correct answer, When I did this I found a full list of all serials attached, therefore the one not attached must be the faulty one! Thanks so much @wri7913 – Delta_zulu Dec 16 '17 at 09:33
2

This assumes you have a case that has individual HD lights (aka server case)

Find the listing for the drive that's bad. Example /dev/da9, /dev/sda...etc

Offline that disk using the GUI or FreeNAS terminal commands.

Run dd to read that disk to /dev/null while you watch the front of the server for the activity light that is now blinking madly.

sudo dd if=/dev/da# of=/dev/null

Note the location of the disk, cancel the dd command (Ctrl-C), and then go about your replacement method. For FreeNAS, load the new disk, then click the GUI Replace button and finish that process. When done, remove the bad drive and do whatever you want with it: test it more, Secure Erase it, physically destroy it, send it off for warranty repair, etc.

Jenny D
Easyanswer
1

For some reason the View Disks tab was blank in the FreeNAS GUI.

I was able to identify the drive in the shell using: zpool status -x

   NAME                                            STATE     READ WRITE CKSUM                                                  
    bucket                                          ONLINE       0     0     0                                                  
      raidz1-0                                      ONLINE       0     0     0                                                  
        ada0p2                                      ONLINE       0     0     0                                                  
        ada1p2                                      ONLINE       0     0     0                                                  
        gptid/721fd78d-3492-11e5-b554-50e549c02e6d  ONLINE       0     0     0                                                  
        ada3p2                                      ONLINE       0     0     0                                                  
        ada4p2                                      ONLINE       0     0 2.31K   

Followed by:

smartctl -a /dev/ada4 | grep 'Serial Number'

Serial Number: WD-WCAZA7748584
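
The two steps above can be strung together. This is only a sketch: disks_with_errors is a made-up helper name, and it assumes the READ, WRITE and CKSUM counters are columns 3 to 5 as in the output above.

```shell
#!/bin/sh
# Sketch: print the disk name for every row with a non-zero error counter.
# disks_with_errors is a hypothetical helper; it reads `zpool status` on stdin.
disks_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }  # table header
        /^errors:/                     { in_cfg = 0 }        # end of table
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            dev = $1
            sub(/[ps][0-9]+[a-z]?$/, "", dev)  # ada4p2 -> ada4
            print dev
        }'
}

# On a live system (not executed here):
#   zpool status | disks_with_errors | while read -r d; do
#       smartctl -a "/dev/$d" | grep 'Serial Number'
#   done
```

Rows that show up as gptid/... are printed unchanged and need a glabel status lookup first, and a pool or vdev row with non-zero counters would be printed as well.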

Jon M
0

Easiest way I found:

Click Storage, then click View Disks.

Pull one SATA cable off. Print a label with the name of the drive that is now missing from View Disks (e.g. ada1) and stick the label on the side of that drive.

Reconnect the drive. Pull the second SATA cable off, print a label for ada2, and so on.

Then when a drive fails and the report says ada2, you know which drive it is.