2

People often talk about the theoretical benefits of ZFS and how it handles (RAIDZ1/2) hard disk failures gracefully, and Server Fault has many testaments to this fact. I am considering setting up a NAS with 3-5 hard drives using FreeNAS; I might be backing up important documents to it nightly, so I can't afford more than a week of downtime.

How does a hard drive (physically) fail?

What do ZFS, and FreeNAS in particular, do when a hard drive in a zpool fails? For instance, does it send you an email via SMTP saying "replace hard drive 1 and click okay ... when finished"?

How long does it take for FreeNAS to recover from a 2-disk failure in RAIDZ2?

How likely am I to succeed in recovering from a worst-tolerable-case hard drive failure in a RAIDZ2 setup, assuming minimal human-computer interaction?

Can a layperson perform the restoration graphically from either an SO-quality manual or a wizard?

grekasius
Simon Kuang

Most of your questions are actually not about ZFS itself, but about the FreeNAS feature set regarding drive replacement and notification options. I am editing your question to remove the generic tags as they are not applicable here. – the-wabbit Aug 10 '14 at 08:20

RAID recovery time depends on various things - but most significantly the size of the disks... It's a lot faster to recover when you're using 500 GB disks than when using 4 TB disks. – Cry Havok Aug 10 '14 at 09:40

3 Answers

8

FreeNAS supports S.M.A.R.T. monitoring, so typically, before a drive fails, and provided notifications are set up correctly and monitoring is enabled, the sysadmin will be getting reports on bad/unusable sectors, overheating, and so on.

FreeNAS as of version 9.2.1.8 does NOT support "hot spare" behavior. Spares configured in a zpool can be pushed in manually to replace a failed drive, but nothing in the software automates the process (an example of the manual steps follows the status output below).

With 2 simultaneous failures in RAIDZ2 there will almost certainly be unrecoverable file errors. This is because of a phenomenon known as bit rot. Contemporary drives are typically 3 TB+, and in order to get better-than-mirror space utilization one would construct a RAIDZ2 vdev from at least 6 drives. Now, with one failed drive, more than 12 TB in the remaining RAID-5-like stripe, and an unrecoverable read error (URE) rate of 1 in 10^14 bits, you are highly likely to encounter a URE; almost certain, if the drive vendors are right. That will result, at minimum, in a message like this:

~# zpool status -v
  pool: dpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
scan: resilvered 6.90T in 52h5m with 313 errors on Wed Oct 22 17:44:25 2014
config:

        NAME                         STATE     READ WRITE CKSUM
        dpool                        DEGRADED     0     0 5.75K
          raidz2-0                   ONLINE       0     0    78
            c0t50014EE05807CC4Ed0    ONLINE       0     0     0
            c0t50014EE6AAD9F57Fd0    ONLINE       0     0     0
            c0t50014EE204FC5087d0    ONLINE       0     0     0
            c0t50014EE6AADA3B7Cd0    ONLINE       0     0     0
            c0t50014EE655849876d0    ONLINE       0     0     0
            c0t50014EE6AADA3DFDd0    ONLINE       0     0     0
            c0t50014EE6AADA38FFd0    ONLINE      39     0     0
          raidz2-1                   ONLINE       0     0 11.4K
            c0t50014EE6AADA45E4d0    ONLINE   1.69K     0     0
            c0t50014EE6AADA45ECd0    ONLINE     726     0     0
            c0t50014EE6AADA3944d0    ONLINE       0     0     0
            c0t50014EE204FC1F46d0    ONLINE       0     0     0
            c0t50014EE6002A74CEd0    ONLINE       0     0     0
            c0t50014EE2AFA6C8B4d0    ONLINE       0     0     0
            c0t50014EE6002F9C53d0    ONLINE       5     0     0
          raidz2-2                   DEGRADED     0     0     0
            c0t50014EE6002F39C5d0    ONLINE       0     0     0
            c0t50014EE25AFFB56Ad0    ONLINE       0     0     0
            c0t50014EE6002F65E3d0    ONLINE       0     0     0
            c0t50014EE6002F573Dd0    ONLINE       0     0     0
            c0t50014EE6002F575Ed0    ONLINE       0     0     0
            spare-5                  DEGRADED     0     0     0
              c0t50014EE6002F645Ed0  FAULTED      1    29     0  too many errors
              c0t50014EE2AFA6FC32d0  ONLINE       0     0     0
            c0t50014EE2050538DDd0    ONLINE       0     0     0
          raidz2-3                   ONLINE       0     0     0
            c0t50014EE25A518CBCd0    ONLINE       0     0     0
            c0t50014EE65584A979d0    ONLINE       0     0     0
            c0t50014EE65584AC0Ed0    ONLINE       0     0     0
            c0t50014EE2B066A6D2d0    ONLINE       0     0     0
            c0t50014EE65584D139d0    ONLINE       0     0     0
            c0t50014EE65584E5CBd0    ONLINE       0     0     0
            c0t50014EE65584E120d0    ONLINE       0     0     0
          raidz2-4                   ONLINE       0     0     0
            c0t50014EE65584EB2Cd0    ONLINE       0     0     0
            c0t50014EE65584ED80d0    ONLINE       0     0     0
            c0t50014EE65584EF52d0    ONLINE       0     0     0
            c0t50014EE65584EFD9d0    ONLINE       0     0     1
            c0t50014EE2AFA6B6D0d0    ONLINE       0     0     0
            c0t5000CCA221C2A603d0    ONLINE       0     0     0
            c0t50014EE655849F19d0    ONLINE       0     0     0
        spares
          c0t50014EE2AFA6FC32d0      INUSE     currently in use

errors: Permanent errors have been detected in the following files:
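
The spare-5 entry in the output above is the result of exactly this kind of manual intervention. Here is a minimal sketch of the commands involved, with the device names taken from that output; the "Replace" button under Volume Status in the FreeNAS GUI drives essentially the same operation:

~# zpool status -v dpool            # identify the FAULTED device and an AVAIL spare
~# zpool replace dpool c0t50014EE6002F645Ed0 c0t50014EE2AFA6FC32d0
~# zpool status dpool               # the spare now appears inside the vdev and resilvering starts

Once the resilver finishes, detaching the faulted device with zpool detach makes the spare a permanent member of the vdev; alternatively, replacing the faulted device with a brand-new disk returns the spare to the spares list.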

The rebuild process, called "resilvering", depends on the speed of the individual drives and on how full they are. Think of about 25 MB/s as a top speed. Here, however, is a real-life example of multiple failures with an actual speed of 5 MB/s, so we are talking about week(s); these are 2 TB 7200 RPM WD drives.

~# zpool status
  pool: dpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Nov 13 10:41:28 2014
        338M scanned out of 48.3T at 5.72M/s, (scan is slow, no estimated time)
        32.3M resilvered, 0.00% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        dpool                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/9640be78-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0  (resilvering)
            gptid/97b9d7c5-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/994daffc-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/9a7c78a3-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/9c48de9d-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/9e1ca264-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0  (resilvering)
            gptid/9fafcc1e-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/a130f0df-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/a2b07b02-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/a44e4ed9-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/a617b0c5-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/a785adf7-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/a8c69dd8-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0  (resilvering)
            gptid/aa097d45-a3e1-11e3-844a-001b21675440  ONLINE       0     0     1  (resilvering)
            gptid/ab7e0047-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/acfe5649-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0  (resilvering)
            gptid/ae5be1b8-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/afd04931-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/b14ef3e7-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/b2c8232a-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/b43d9260-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/b5bd6d79-a3e1-11e3-844a-001b21675440  ONLINE       0     0     1  (resilvering)
            gptid/b708060f-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/b8445901-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/b9c3b4f4-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/bb53a54f-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/bccf1980-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/be50575e-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0  (resilvering)
            gptid/bff97931-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
            gptid/c1b93e80-a3e1-11e3-844a-001b21675440  ONLINE       0     0     0
        spares
          gptid/c4f52138-a3e1-11e3-844a-001b21675440    AVAIL
          gptid/c6332a6f-a3e1-11e3-844a-001b21675440    AVAIL

errors: No known data errors
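
To put those rates in perspective, here is a rough back-of-the-envelope calculation, assuming about 2 TB of data has to be rewritten per replaced drive (the real figure depends on how full the pool is):

~# echo "scale=1; 2*1024*1024/25/3600" | bc     # hours per drive at the 25 MB/s best case
23.3
~# echo "scale=1; 2*1024*1024/5/86400" | bc     # days per drive at the observed 5 MB/s
4.8

Either way, the window in which a further drive failure or a URE can hit the degraded pool is measured in days rather than hours.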

Data protection in RAIDZ is NOT meant to replace backups. With a petabyte of storage protected only by RAIDZ2, you are statistically all but guaranteed to lose at least some files within the first 3 years. Hence replication to a second location is mandatory; FreeNAS supports ZFS send/receive as well as rsync. If monitoring is set up and you pay attention to the notifications, it is easy to push a spare into the zpool. However, the current FreeNAS version (9.2.1.8) does not provide an easy way to identify the slot/enclosure of the failed disk; you can check my answer on that topic: How to determine which disk failed in a FreeNAS / ZFS setup
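
For illustration, this is roughly what replication with ZFS send/receive looks like at the command line; the dataset, snapshot and host names below are made up, and FreeNAS drives the same mechanism through its periodic snapshot and replication tasks:

~# zfs snapshot dpool/documents@nightly-2014-11-14
~# zfs send -i dpool/documents@nightly-2014-11-13 dpool/documents@nightly-2014-11-14 | ssh backuphost zfs receive -F backup/documents

Only the blocks that changed since the previous snapshot are transferred, so a nightly incremental stays small even on a large pool.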

Dimitar Boyn

Can you explain why 2 simultaneous failures on RAIDZ2 (which should sustain up to two drive failures) cause "almost guaranteed unrecoverable file errors"? Also, why is "a petabyte of storage protected only by RAIDZ2 ... statistically all but guaranteed to lose at least some files within the first 3 years"? Due to unrecoverable drive faults, or by some other means? – Somescout Nov 13 '14 at 19:23

I expanded my answer to include references to the bit rot problem of storage media and to Unrecoverable Read Errors (URE), both of which affect the success of recovery from failures. The long scrub times of ever-bigger drive stripes, and the performance degradation while a scrub runs, push operators to avoid scrubbing altogether; that means repairs typically happen only on the first failure, which leaves the pool exposed to a second drive failure for too long. – Dimitar Boyn Nov 14 '14 at 02:23
1

I can answer the following questions from personal experience:

You asked: How long does it take for FreeNAS to recover from a 2-disk failure in RAIDZ2?

I note: I am presently replacing an existing non-failed drive with another new larger drive using the "Replace" command found in Volume Status.

4.67T of data needed to be scanned. I got a 30 MB/s resilvering transfer rate, which I think is pretty good; it took about 48 hours to replace the drive. Since the array was not degraded, I was not (as) concerned about another drive failing during the process.
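
Those figures are consistent with each other; a quick sanity check, treating the 4.67T as tebibytes:

~# echo "scale=1; 4.67*1024*1024/30/3600" | bc      # hours at 30 MB/s
45.3

which is close to the roughly 48 hours observed.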

This is why it is important to replace drives before they fail outright, for example at the first SMART errors or any kind of read or write errors. I also agree with Dimitar that the pool should be synced to another, logically separate device, preferably hourly.

You asked: Can a layperson perform the restoration graphically from either an SO-quality manual or a wizard?

I note: In my opinion, no. It takes a good bit of technical skill to work with FreeNAS/ZFS.

1

I have run a FreeNAS 5-disk RAID-Z1 pool of 3 TB drives for over 5 years and have lost single drives several times. Sometimes SMART would alert me; other times I would find a drive clicking or dead, and most of those times my array reported itself as degraded. There are many tutorials on how to do the replacement, as well as YouTube videos. A couple of tips: take a screenshot of the degraded pool showing which serial number belongs to which drive before you shut down to replace it, and set up email alerts in FreeNAS so it texts you when the pool becomes degraded. The resilver process takes about 4-12 hours in my experience; do not use your array while it is degraded. If you need to order a drive, leave the box running but don't use it. The reason I say this is that electronics that have been running for long periods can develop problems when you shut them down and let them cool off. I usually leave it running and only shut it down for a few minutes to replace the drive.
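
On the serial-number tip, two commands from the FreeNAS shell help map serial numbers to device names before you pull anything (the /dev/ada0 below is just an example; adjust for your own device names):

~# smartctl -i /dev/ada0 | grep -i serial      # serial number of one disk
~# glabel status                               # maps the gptid/... labels shown by zpool status to adaN/daN devices

Matching the serial printed on the drive's label against this output removes any doubt about which physical disk to pull.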

kell 490