I replaced a broken drive with a model that has already been in use in my pool for some time. The problem is that the resilver seems to be stuck in a restart loop: whenever I check `zpool status`, the resilver has started only seconds or minutes earlier, and the progress percentage stays at roughly 0%.

For example, the scan line from successive checks:

  • scan: resilver in progress since Thu Jun 1 09:13:27 2017
  • scan: resilver in progress since Thu Jun 1 09:15:10 2017
  • scan: resilver in progress since Thu Jun 1 09:18:11 2017

...
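
One way to confirm the restart loop, rather than eyeballing it, is to capture several `zpool status` outputs and count how many distinct start times appear. The following is only a sketch; the function name `count_resilver_starts` is made up for illustration and it assumes the scan line has the wording shown above.

```shell
# Hypothetical helper: reads captured `zpool status` text on stdin and prints
# the number of distinct "resilver in progress since ..." timestamps it finds.
count_resilver_starts() {
    grep -o 'resilver in progress since .*' \
        | sed 's/^resilver in progress since //' \
        | tr -s ' ' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Example use (not run here): sample the pool once a minute for five minutes.
#   for i in 1 2 3 4 5; do zpool status naspool; sleep 60; done | count_resilver_starts
```

A result greater than 1 over a short sampling window would indicate that the resilver was restarted in between.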

I have no idea what is going wrong; I have never encountered this issue before. I would appreciate some advice from the community.

root@nas:~# dmesg | grep ZFS
[5.224533] ZFS: Loaded module v0.7.0-rc4_36_g2d82116e8, ZFS pool version 5000, ZFS filesystem version 5

root@nas:~# uname -a
Linux nas 4.9.0-0.bpo.3-amd64 #1 SMP Debian 4.9.25-1~bpo8+1 (2017-05-19) x86_64 GNU/Linux

root@nas:~# zpool status

      pool: naspool
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Thu Jun  1 09:13:27 2017
            3.52G scanned out of 12.8T at 157M/s, 23h40m to go
            720M resilvered, 0.03% done
    config:

            NAME                                   STATE     READ WRITE CKSUM
            naspool                                DEGRADED     0     0     0
              raidz1-0                             DEGRADED     0     0     0
                wwn-0x5000c5005d126ae9             ONLINE       0     0     0
                ata-ST3000DM001-1CH166_Z1F3LC75    ONLINE       0     0     0
                ata-ST4000DM005-2DP166_ZDH1BA2G    ONLINE       0     0     0
                replacing-3                        DEGRADED     0     0     0
                  11962083988745856144             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_45U3NSGGS-part1
                  ata-ST4000DM005-2DP166_ZDH1L31E  ONLINE       0     0     0  (resilvering)
                ata-ST3000DM001-1CH166_W1F517W8    ONLINE       0     0     0
    errors: No known data errors

root@nas:~# zpool status

      pool: naspool
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Thu Jun  1 09:15:10 2017
            511M scanned out of 12.8T at 31.9M/s, 116h23m to go
            101M resilvered, 0.00% done
    config:

            NAME                                   STATE     READ WRITE CKSUM
            naspool                                DEGRADED     0     0     0
              raidz1-0                             DEGRADED     0     0     0
                wwn-0x5000c5005d126ae9             ONLINE       0     0     0
                ata-ST3000DM001-1CH166_Z1F3LC75    ONLINE       0     0     0
                ata-ST4000DM005-2DP166_ZDH1BA2G    ONLINE       0     0     0
                replacing-3                        DEGRADED     0     0     0
                  11962083988745856144             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_45U3NSGGS-part1
                  ata-ST4000DM005-2DP166_ZDH1L31E  ONLINE       0     0     0  (resilvering)
                ata-ST3000DM001-1CH166_W1F517W8    ONLINE       0     0     0

    errors: No known data errors

root@nas:~# zpool status

      pool: naspool
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Thu Jun  1 09:18:11 2017
            40.0M scanned out of 12.8T at 3.34M/s, (scan is slow, no estimated time)
            7.78M resilvered, 0.00% done
    config:

            NAME                                   STATE     READ WRITE CKSUM
            naspool                                DEGRADED     0     0     0
              raidz1-0                             DEGRADED     0     0     0
                wwn-0x5000c5005d126ae9             ONLINE       0     0     0
                ata-ST3000DM001-1CH166_Z1F3LC75    ONLINE       0     0     0
                ata-ST4000DM005-2DP166_ZDH1BA2G    ONLINE       0     0     0
                replacing-3                        DEGRADED     0     0     0
                  11962083988745856144             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-TOSHIBA_DT01ACA300_45U3NSGGS-part1
                  ata-ST4000DM005-2DP166_ZDH1L31E  ONLINE       0     0     0  (resilvering)
                ata-ST3000DM001-1CH166_W1F517W8    ONLINE       0     0     0
    errors: No known data errors

root@nas:~# ls -altr /dev/disk/by-id/

    total 0
    drwxr-xr-x 8 root root 160 May 31 20:08 ..
    drwxr-xr-x 2 root root 800 May 31 20:08 .
    lrwxrwxrwx 1 root root   9 Jun  1 09:19 wwn-0x5001b44a10bb94af -> ../../sda
    lrwxrwxrwx 1 root root   9 Jun  1 09:19 ata-SanDisk_SDSSDP128G_133230401711 -> ../../sda
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5001b44a10bb94af-part5 -> ../../sda5
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5001b44a10bb94af-part2 -> ../../sda2
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5001b44a10bb94af-part1 -> ../../sda1
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-SanDisk_SDSSDP128G_133230401711-part5 -> ../../sda5
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-SanDisk_SDSSDP128G_133230401711-part2 -> ../../sda2
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-SanDisk_SDSSDP128G_133230401711-part1 -> ../../sda1
    lrwxrwxrwx 1 root root   9 Jun  1 09:19 wwn-0x5000c500646bb32a -> ../../sdc
    lrwxrwxrwx 1 root root   9 Jun  1 09:19 ata-ST3000DM001-1CH166_Z1F3LC75 -> ../../sdc
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5000c500646bb32a-part1 -> ../../sdc1
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-ST3000DM001-1CH166_Z1F3LC75-part1 -> ../../sdc1
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5000c500646bb32a-part9 -> ../../sdc9
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-ST3000DM001-1CH166_Z1F3LC75-part9 -> ../../sdc9
    lrwxrwxrwx 1 root root   9 Jun  1 09:19 wwn-0x5000c500a2ef74a9 -> ../../sde
    lrwxrwxrwx 1 root root   9 Jun  1 09:19 ata-ST4000DM005-2DP166_ZDH1L31E -> ../../sde
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5000c500a2ef74a9-part9 -> ../../sde9
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 wwn-0x5000c500a2ef74a9-part1 -> ../../sde1
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-ST4000DM005-2DP166_ZDH1L31E-part9 -> ../../sde9
    lrwxrwxrwx 1 root root  10 Jun  1 09:19 ata-ST4000DM005-2DP166_ZDH1L31E-part1 -> ../../sde1
    lrwxrwxrwx 1 root root   9 Jun  1 09:20 wwn-0x5000c500a2a93310 -> ../../sdd
    lrwxrwxrwx 1 root root   9 Jun  1 09:20 ata-ST4000DM005-2DP166_ZDH1BA2G -> ../../sdd
    lrwxrwxrwx 1 root root  10 Jun  1 09:20 wwn-0x5000c500a2a93310-part1 -> ../../sdd1
    lrwxrwxrwx 1 root root  10 Jun  1 09:20 ata-ST4000DM005-2DP166_ZDH1BA2G-part1 -> ../../sdd1
    lrwxrwxrwx 1 root root  10 Jun  1 09:20 wwn-0x5000c500a2a93310-part9 -> ../../sdd9
    lrwxrwxrwx 1 root root  10 Jun  1 09:20 ata-ST4000DM005-2DP166_ZDH1BA2G-part9 -> ../../sdd9
    lrwxrwxrwx 1 root root   9 Jun  1 09:21 wwn-0x5000c5005d1119cc -> ../../sdf
    lrwxrwxrwx 1 root root   9 Jun  1 09:21 ata-ST3000DM001-1CH166_W1F517W8 -> ../../sdf
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 wwn-0x5000c5005d1119cc-part1 -> ../../sdf1
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 ata-ST3000DM001-1CH166_W1F517W8-part1 -> ../../sdf1
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 wwn-0x5000c5005d1119cc-part9 -> ../../sdf9
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 ata-ST3000DM001-1CH166_W1F517W8-part9 -> ../../sdf9
    lrwxrwxrwx 1 root root   9 Jun  1 09:21 wwn-0x5000c5005d126ae9 -> ../../sdb
    lrwxrwxrwx 1 root root   9 Jun  1 09:21 ata-ST3000DM001-1ER166_Z500CKWL -> ../../sdb
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 wwn-0x5000c5005d126ae9-part1 -> ../../sdb1
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 ata-ST3000DM001-1ER166_Z500CKWL-part1 -> ../../sdb1
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 wwn-0x5000c5005d126ae9-part9 -> ../../sdb9
    lrwxrwxrwx 1 root root  10 Jun  1 09:21 ata-ST3000DM001-1ER166_Z500CKWL-part9 -> ../../sdb9
Thomas
straumli
  • Maybe related: https://github.com/zfsonlinux/zfs/issues/5970 and https://github.com/zfsonlinux/zfs/issues/840 – duenni Jun 02 '17 at 08:54
  • Is it possible that the replacement disk has been dropping in and out of your storage subsystem? Check your system logs for signs of bus or disk resets and similar, particularly around the times quoted. It *shouldn't* cause a running resilver to restart from scratch, but if it's happening all the time, maybe ZFS decided that the disk was suddenly in an inconsistent state and since so little data had been resilvered the safest course of action was simply to start over. – user Jun 07 '17 at 09:12
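
The log check suggested in the comment above could be sketched roughly as follows. The function name `find_disk_resets` is hypothetical, and the pattern list is an assumption: it covers a few common libata and block-layer messages and is not exhaustive.

```shell
# Hypothetical helper: filters kernel log text on stdin for messages that
# typically accompany SATA link resets or failed disk commands.
find_disk_resets() {
    grep -E -i 'hard resetting link|soft resetting link|link is slow to respond|exception Emask|failed command|I/O error'
}

# Example use (not run here), with human-readable timestamps so the hits can
# be compared against the resilver start times:
#   dmesg -T | find_disk_resets
#   journalctl -k | find_disk_resets
```

If hits cluster around 09:13, 09:15 and 09:18 on Jun 1, that would support the theory that the disk is dropping off the bus.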

1 Answer


I was unable to fix this and ended up replacing the disk with another one, which resolved the issue. I suspect the drive was faulty, even though no SMART errors were logged.
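
For anyone who wants to look more closely at a suspect drive before swapping it, a minimal sketch of a SMART attribute check is shown below. The helper name `nonzero_sector_attrs` is made up; it assumes the usual ten-column attribute table printed by `smartctl -A`, and it only looks at attributes 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors). A clean result does not prove the drive is healthy, as this case shows.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the name
# and raw value of sector-health attributes (IDs 5, 197, 198) that are non-zero.
nonzero_sector_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && ($10 + 0 > 0) { print $2, $10 }'
}

# Example use (not run here); /dev/sde is the resilvering disk in the question:
#   smartctl -A /dev/sde | nonzero_sector_attrs
#   smartctl -t long /dev/sde    # start an extended self-test
```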

straumli