How does ZFS RAID-Z2 recover from 3 drives down?

Question

I'm wondering what happened: how ZFS was able to recover completely, or whether my data is truly still intact.
When I came in last night I saw this, first to my dismay, then to my confusion.

    $ zpool status
  pool: san
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 392K in 0h0m with 0 errors on Tue Jan 21 16:36:41 2020
config:

        NAME                                          STATE     READ WRITE CKSUM
        san                                           DEGRADED     0     0     0
          raidz2-0                                    DEGRADED     0     0     0
            ata-WDC_WD20EZRX-00DC0B0_WD-WMC1T3458346  ONLINE       0     0     0
            ata-ST2000DM001-9YN164_W1E07E0G           DEGRADED     0     0    38  too many errors
            ata-WDC_WD20EZRX-19D8PB0_WD-WCC4M0428332  DEGRADED     0     0    63  too many errors
            ata-ST2000NM0011_Z1P07NVZ                 ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAJ490344  ONLINE       0     0     0
            wwn-0x50014ee20949b6f9                    DEGRADED     0     0    75  too many errors

errors: No known data errors

How is it possible that there are no data errors and that the entire pool isn't faulted?

One drive, sdf, failed a smartctl self-test (S.M.A.R.T. read failure); the others have slightly lesser issues: uncorrectable/pending sectors or UDMA CRC errors.

I tried toggling each faulted drive offline and then back online, one at a time; that didn't help.

    $ zpool status
  pool: san
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 392K in 0h0m with 0 errors on Tue Jan 21 16:36:41 2020
config:

        NAME                                          STATE     READ WRITE CKSUM
        san                                           DEGRADED     0     0     0
          raidz2-0                                    DEGRADED     0     0     0
            ata-WDC_WD20EZRX-00DC0B0_WD-WMC1T3458346  ONLINE       0     0     0
            ata-ST2000DM001-9YN164_W1E07E0G           DEGRADED     0     0    38  too many errors
            ata-WDC_WD20EZRX-19D8PB0_WD-WCC4M0428332  OFFLINE      0     0    63
            ata-ST2000NM0011_Z1P07NVZ                 ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAJ490344  ONLINE       0     0     0
            wwn-0x50014ee20949b6f9                    DEGRADED     0     0    75  too many errors

So then, feeling extremely lucky (or a bit confused about whether my data could actually all still be there), I inspected the drives to find the worst one and replaced it with my only spare.

    $ zpool status
  pool: san
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jan 21 17:33:15 2020
        467G scanned out of 8.91T at 174M/s, 14h10m to go
        77.6G resilvered, 5.12% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        san                                               DEGRADED     0     0     0
          raidz2-0                                        DEGRADED     0     0     0
            ata-WDC_WD20EZRX-00DC0B0_WD-WMC1T3458346      ONLINE       0     0     0
            replacing-1                                   DEGRADED     0     0     0
              ata-ST2000DM001-9YN164_W1E07E0G             OFFLINE      0     0    38
              ata-WDC_WD2000FYYZ-01UL1B1_WD-WCC1P1171516  ONLINE       0     0     0  (resilvering)
            ata-WDC_WD20EZRX-19D8PB0_WD-WCC4M0428332      DEGRADED     0     0    63  too many errors
            ata-ST2000NM0011_Z1P07NVZ                     ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAJ490344      ONLINE       0     0     0
            wwn-0x50014ee20949b6f9                        DEGRADED     0     0    75  too many errors

The resilver did complete successfully.

    $ zpool status
  pool: san
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 1.48T in 12h5m with 0 errors on Wed Jan 22 05:38:48 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        san                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            ata-WDC_WD20EZRX-00DC0B0_WD-WMC1T3458346    ONLINE       0     0     0
            ata-WDC_WD2000FYYZ-01UL1B1_WD-WCC1P1171516  ONLINE       0     0     0
            ata-WDC_WD20EZRX-19D8PB0_WD-WCC4M0428332    DEGRADED     0     0    63  too many errors
            ata-ST2000NM0011_Z1P07NVZ                   ONLINE       0     0     0
            ata-WDC_WD20EARX-00PASB0_WD-WCAZAJ490344    ONLINE       0     0     0
            wwn-0x50014ee20949b6f9                      DEGRADED     0     0    75  too many errors

I'm at a crossroads right now. I usually zero the first 2 MB of the faulted drive with dd and replace it with itself, which I'm OK with doing; however, if data really is missing, I may need these last two drives to recover things.

I have this sdf on my desk now, removed. Worst case, I feel I can use it to aid some recovery.

In the meantime, I think I'm going to zero the first couple of MB of the degraded drive from /dev/zero and replace it with itself; I think things should work out. Then rinse and repeat for the second faulted drive until I can get some replacements on hand.

Question: What happened? How was the pool able to hang on, or might I be missing some data (doubtful, given the integrity of ZFS and its reports)?

Could it have been a lucky order of failure, e.g. the drive that failed not being the top drive of the stack?

Question: This one is just FYI and not related to the topic. What caused all 3 to fail at the same time? I think a scrub was the catalyst. I checked the night before and all drives were online.

Note: cabling has been an issue in the recent past (the office gets cold at night), but those issues have just been drives becoming unavailable, as opposed to checksum errors. I'm thinking this isn't cabling but maybe aging drives, which are 5 years old. But 3 failures in one day? Come on, that's enough to scare a lot of us!
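One way to separate the two hypotheses above (cabling vs. aging media) is to see which SMART attributes are climbing. Below is a minimal Python sketch, assuming the usual `smartctl -A` ATA attribute table layout with the raw value in the tenth column. Non-zero raw values for 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) or 198 (Offline_Uncorrectable) point at the platters, while a rising 199 (UDMA_CRC_Error_Count) usually points at the cable or connection. The sample text is illustrative only, not taken from these drives, and the function names are hypothetical.

```python
import re

# Attribute IDs that suggest media wear vs. link (cable/connector) trouble.
MEDIA_IDS = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}
LINK_IDS = {199: "UDMA_CRC_Error_Count"}


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Map attribute ID -> raw value from `smartctl -A` table rows."""
    attrs = {}
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a numeric ID; headers and banners are skipped.
        if len(tokens) < 10 or not tokens[0].isdigit():
            continue
        m = re.match(r"\d+", tokens[9])  # RAW_VALUE may carry a suffix like "(...)"
        if m:
            attrs[int(tokens[0])] = int(m.group())
    return attrs


def triage(attrs: dict[int, int]) -> list[str]:
    """Return which kinds of trouble the non-zero raw values hint at."""
    findings = []
    if any(attrs.get(i, 0) > 0 for i in MEDIA_IDS):
        findings.append("media")
    if any(attrs.get(i, 0) > 0 for i in LINK_IDS):
        findings.append("link")
    return findings


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       41
"""

print(triage(parse_smart_attributes(SAMPLE)))  # → ['media', 'link']
```

A drive can show both, as in the sample; comparing the 199 counter before and after reseating or replacing a cable is a cheap way to tell whether the link part is still growing.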

score 4 · Accepted Answer · answered Jan 23 '20 at 14:18

RAID-Z2 is double parity, with redundancy similar to RAID 6. Two disks could fail completely and the data would still be recoverable from parity, assuming the rest of the array is healthy.
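To make "recovered from parity" concrete, here is a toy Python sketch of RAID-6-style double parity: P is the XOR of the data columns and Q is a Reed-Solomon syndrome in GF(2^8). It assumes the common RAID-6 choice of polynomial 0x11d and generator 2, and 4 data columns plus P and Q (the width of a 6-disk RAID-Z2 like the pool above). Real RAID-Z uses variable-width stripes and its own column layout, so this illustrates the math, not ZFS's on-disk format; the helper names are hypothetical.

```python
# Toy double parity: P = XOR of data columns, Q = sum of 2**i * D_i in GF(2^8).
# Assumption: polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d), generator 2.

EXP = [0] * 512  # EXP[i] = 2**i in GF(2^8), doubled so LOG sums need no modulo
LOG = [0] * 256  # LOG[EXP[i]] = i (LOG[0] is never used)
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:  # reduce modulo the field polynomial
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def compute_parity(data: list[bytes]) -> tuple[bytes, bytes]:
    """Return (P, Q) for equal-length data columns."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, col in enumerate(data):
        g = EXP[i]  # coefficient 2**i for column i
        for j, byte in enumerate(col):
            p[j] ^= byte
            q[j] ^= gf_mul(g, byte)
    return bytes(p), bytes(q)


def recover_two(data, p: bytes, q: bytes, x: int, y: int) -> tuple[bytes, bytes]:
    """Rebuild missing data columns x and y (x != y) from P, Q and the survivors."""
    gx, gy = EXP[x], EXP[y]
    denom = gx ^ gy  # non-zero because x != y
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        a, b = p[j], q[j]
        for i, col in enumerate(data):
            if i in (x, y):
                continue
            a ^= col[j]                  # a ends as Dx ^ Dy
            b ^= gf_mul(EXP[i], col[j])  # b ends as gx*Dx ^ gy*Dy
        dx[j] = gf_div(b ^ gf_mul(gy, a), denom)  # (gx ^ gy) * Dx / (gx ^ gy)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)


# Demo: 4 data columns + P + Q; lose columns 1 and 3 entirely.
data = [b"ZFS!", b"raid", b"z2ok", b"test"]
p, q = compute_parity(data)
d1, d3 = recover_two([data[0], None, data[2], None], p, q, 1, 3)
print(d1, d3)  # → b'raid' b'test'
```

Two equations (P and Q) let you solve for two unknowns per byte position, which is exactly why a third simultaneous loss in the same stripe is fatal.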

You didn't necessarily have I/O errors: the READ and WRITE columns are all 0, and only CKSUM is non-zero. DEGRADED means ZFS kept using the disk despite the checksum errors, perhaps caused by a few bit flips while the drive otherwise still functions. Per the link from that output:

Run 'zpool status -x' to determine which pool has experienced errors.

Find the device with a non-zero error count for READ, WRITE, or CKSUM. This indicates that the device has experienced a read I/O error, write I/O error, or checksum validation error. Because the device is part of a mirror or RAID-Z device, ZFS was able to recover from the error and subsequently repair the damaged data.

If these errors persist over a period of time, ZFS may determine the device is faulty and mark it as such. However, these error counts may or may not indicate that the device is unusable.
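The "able to recover from the error" part holds per block, not per disk: RAID-Z2 can rebuild any block in which at most two columns are bad, and because every block carries a checksum, ZFS can tell when a reconstruction is correct. So three DEGRADED disks only mean data loss if their damage overlaps three deep in the same stripe, which bears on the "lucky order" question: what matters is overlap, not order. A toy Python model, assuming each disk's damage is a set of bad stripe numbers (the disk names and numbers are hypothetical, not derived from the CKSUM counts above):

```python
from collections import Counter

PARITY_COLUMNS = 2  # RAID-Z2: any two bad columns in a stripe are recoverable


def unrecoverable_stripes(bad_by_disk: dict[str, set[int]], parity: int = PARITY_COLUMNS) -> list[int]:
    """Return stripe numbers where more disks are bad than parity can cover."""
    hits = Counter()
    for bad_stripes in bad_by_disk.values():
        hits.update(set(bad_stripes))  # count each disk at most once per stripe
    return sorted(stripe for stripe, n in hits.items() if n > parity)


# Hypothetical damage on three disks: two overlap at stripe 10, never three.
scattered = {"disk_b": {10, 500}, "disk_c": {10, 77}, "disk_f": {3, 900}}
print(unrecoverable_stripes(scattered))  # → []

# Same disks, but disk_f is also bad in stripe 10: that stripe is now lost.
overlapping = {**scattered, "disk_f": {3, 10, 900}}
print(unrecoverable_stripes(overlapping))  # → [10]
```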

Regarding drive health:

maybe aging drives, which are 5 years old. But 3 failures in one day? Come on, that's enough to scare a lot of us!

Back up important data now, and test restoring it, onto different media, not this array.

Replace drives that continue to be degraded, definitely if the kernel reports I/O errors in syslog. If they are under warranty or a support contract, take advantage of that. If they are past warranty, the manufacturer wagered they wouldn't last this long, so take that into consideration.
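For context on how ZFS itself decides a disk is "too many errors" DEGRADED: OpenZFS's event daemon diagnosis uses SERD (Soft Error Rate Discrimination) engines, which trip on a rate of errors rather than a lifetime total. Recent OpenZFS releases expose the checksum thresholds as the `checksum_n` / `checksum_t` vdev properties, documented as defaulting to 10 errors in 600 seconds; treat those numbers as an assumption for whatever ZFS on Linux version this pool ran in 2020. A scrub reads every allocated block, so latent errors tend to surface in a burst, which is one plausible reason all three disks crossed the threshold on the same day. A simplified Python model (the real engine's exact boundary conditions may differ):

```python
from collections import deque


class SerdEngine:
    """Simplified SERD-style counter: fires once n events land within t seconds.

    Assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, n: int = 10, t: float = 600.0):
        self.n, self.t = n, t
        self.window = deque()

    def record(self, timestamp: float) -> bool:
        self.window.append(timestamp)
        # Drop events more than t seconds older than the newest one.
        while timestamp - self.window[0] > self.t:
            self.window.popleft()
        return len(self.window) >= self.n


# A burst (e.g. during a scrub): 10 checksum errors, 30 s apart.
burst = SerdEngine()
print([burst.record(30.0 * k) for k in range(10)][-1])  # → True

# A slow trickle: one error every 100 s never reaches 10 within 600 s.
trickle = SerdEngine()
print(any(trickle.record(100.0 * k) for k in range(20)))  # → False
```

The rate-based design is why a drive with a few scattered bit flips can stay ONLINE for years, yet flip to DEGRADED during the one night a scrub touches every bad sector.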
