
I ran into trouble after trying to add more disks to my Ubuntu server. Being a total beginner, I powered the server off, added two more disks, and restarted the system, only to find one of the disks in the existing mirror "FAULTED".


matsojala@amatson:~$ zpool status -v
  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 21h20m with 0 errors on Fri Feb  8 14:15:04 2019
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            sdb                   ONLINE       0     0     0
            12086301109920570165  FAULTED      0     0     0  was /dev/sdb1

errors: No known data errors

I tried to export and import based on this answer (ZFS pool degraded on reboot), but exporting fails:


matsojala@amatson:~$ sudo zpool export -f tank
umount: /tank: target is busy.
cannot unmount '/tank': umount failed

I'm not sure how I should go about replacing the disk, since ZFS reports that the disk is "part of active pool".


matsojala@amatson:~$ sudo zpool replace -f tank 12086301109920570165 sdc1
invalid vdev specification
the following errors must be manually repaired:
/dev/sdc1 is part of active pool 'tank'

I tried this too:


matsojala@amatson:~$ sudo zpool replace tank sdb
/dev/sdb is in use and contains a unknown filesystem.

Any help? The disk was working fine before I powered off; it now shows up in the system as /dev/sdc1, with ID "12086301109920570165". What should I do?

Thanks.

MatsonWatson

1 Answer


It looks like you've been using names like /dev/sda to reference disks. That's generally not a good idea, because if your disks get assigned different names after a reboot or an unplug-replug cycle, then ZFS can get confused. Instead, you should create your pool using the device files in /dev/disk/by-id/, .../by-uuid/, or .../by-label/.
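For example, a minimal sketch of creating a mirrored pool against stable names (the disk IDs below are hypothetical placeholders; substitute whatever actually appears under /dev/disk/by-id/ on your system):

ls -l /dev/disk/by-id/ | grep -v part        # list stable names for whole disks
sudo zpool create tank mirror \
    /dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL1 \
    /dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL2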

In your case, I'm not totally certain, but it kind of looks like /dev/sdb1 got relabeled to /dev/sdc1 after reboot, which is why /dev/sdc1 looks like it's part of the pool even though it doesn't appear in zpool status. You could try to fix it by unplugging the extra disks you added -- that would probably allow the labels to go back to how they were originally -- and then doing an export followed by zpool import -d /dev/disk/by-id tank, to force ZFS to relabel the pool based on the by-id disk names.
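Roughly, that sequence would look like this (a sketch, assuming the pool is named tank as in your output and that the export succeeds):

sudo zpool export tank                       # release the pool
sudo zpool import -d /dev/disk/by-id tank    # re-import using stable by-id names
zpool status -v                              # the mirror should now list by-id names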

If the export doesn't work because the pool is busy, make sure no process is accessing files on the pool and try again. I am not a Linux user, but it appears there is also a configuration file that can help here: this post on Github suggests setting USE_DISK_BY_ID='yes' in /etc/default/zfs so the pool is imported by ID during reboot. Worst case you can set that and reboot -- a reboot effectively runs the export / import for you.
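A rough sketch of hunting down the blocking processes and setting that option (fuser and lsof are the usual tools here; the exact contents of /etc/default/zfs may differ on your release):

sudo fuser -vm /tank                         # show processes using the mountpoint
sudo lsof +D /tank                           # alternative view; can be slow
# stop whatever shows up, then retry:
sudo zpool export tank

# persist by-id imports across reboots, per the linked suggestion
echo "USE_DISK_BY_ID='yes'" | sudo tee -a /etc/default/zfs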

That said, if you want to go through with replacing the disk anyway, the Oracle docs explain the "replace one faulted disk of a mirror" use case pretty well. (Just ignore the Solaris-specific instructions about unconfiguring the disk with cfgadm.) I think the main step you missed was running zpool offline tank <faulted disk> before zpool replace tank <faulted disk> <new disk>.
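A hedged sketch of that flow, using the GUID from your zpool status output and a hypothetical by-id name for the replacement disk:

sudo zpool offline tank 12086301109920570165
sudo zpool replace tank 12086301109920570165 /dev/disk/by-id/ata-EXAMPLE_NEW_DISK
zpool status -v                              # watch the resilver progress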

Dan
  • Thanks. I have no idea what happened, but suddenly the export worked, and once I realized I had to sudo the import, the pool accepted both disks and started the resilvering process. I used the by-id import now that I know more about what to do. I guess some processes had stopped, which is what let the export happen; the problem was that I wasn't sure how to find and kill those processes. Hopefully the errors disappear after resilvering. – MatsonWatson Feb 08 '19 at 19:00