A disk replacement in ZFS went awry, and now the replacement disk, even though it is no longer physically present, is "stuck" in the pool and blocks further replacement attempts. How can I remove it?
In a raidz3 pool with 11 disks on OmniOS r151010, one of the disks went bad. I took the problem disk offline, replaced it with a new disk, and got the new disk reconfigured. It started to resilver, and then the replacement disk had errors. dmesg showed "SYNCHRONIZE CACHE command failed." I wondered if it might be a loose cable, so I shut down the machine, reseated the disk and cables, and started it up again. It started resilvering, and after a while hit the same problem. At this point zpool status for the problem disk shows:
    replacing-0                UNAVAIL      0     0     0  insufficient replicas
      c4t5000C5004DC8693Fd0    OFFLINE      0     0     0
      c4t50014EE658315C1Dd0    FAULTED      0     0     0  too many errors
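For reference, the steps up to this point correspond roughly to the commands below. This is a sketch rather than a transcript: the pool name `tank` is a placeholder (the real pool name is not given here), and the `cfgadm` attachment point is assumed from the later `cfgadm -al` listing. The script only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Sketch of the replacement steps described above (not a transcript).
# POOL is a placeholder; the real pool name is not given in this post.
POOL="${POOL:-tank}"
OLD_DISK="c4t5000C5004DC8693Fd0"   # original failed disk
NEW_DISK="c4t50014EE658315C1Dd0"   # first replacement disk
AP_ID="c8::w50014ee658315c1d,0"    # assumed cfgadm attachment point of the replacement

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

replace_steps() {
    run zpool offline "$POOL" "$OLD_DISK"              # take the bad disk offline
    run cfgadm -c configure "$AP_ID"                   # configure the newly inserted disk
    run zpool replace "$POOL" "$OLD_DISK" "$NEW_DISK"  # start the resilver
    run zpool status "$POOL"                           # check progress
}

replace_steps
```

Printing by default is deliberate: none of these commands should run against a live pool by accident.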
I decided to try another disk to see if that made any difference. I suspected it wouldn't, but it was easy to try. I hot-swapped the disk, and then cfgadm -al showed:
    c8                         scsi-sas     connected    configured     unknown
    c8::w50014ee6ad8f0df2,0    disk-path    connected    configured     unknown
    c8::w50014ee658315c1d,0    disk-path    connected    unconfigured   unknown
The new disk is there, but the old one hasn't gone away. I restarted the machine to clear out old state, then cfgadm -al showed just
    c8                         scsi-sas     connected    configured     unknown
    c8::w50014ee6ad8f0df2,0    disk-path    connected    configured     unknown
However, zpool status still showed the old disk. I tried clearing the fault, and now the original disk and the first replacement are both offline:
    replacing-0                UNAVAIL      0     0     0  insufficient replicas
      c4t5000C5004DC8693Fd0    OFFLINE      0     0     0
      c4t50014EE658315C1Dd0    OFFLINE      0     0     0
At this point, what should I do to get the new replacement disk resilvering? Running zpool replace on either the original disk or the first replacement just yields the error (slightly shortened here) "cannot open 'c4t500....' no such device in /dev/dsk."
Running zpool remove on c4t50014EE658315C1Dd0 yields the error message "cannot remove c4t50014EE658315C1Dd0: only inactive hot spares, cache, top-level, or log devices can be removed".
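To make it easy to see which devices are still listed under the replacing vdev after each attempt, the status output can be filtered. The following is a minimal sketch: the function name `replacing_members` is hypothetical, and it relies on the indentation that real `zpool status` output uses to show nesting.

```shell
#!/bin/sh
# List the devices nested under any "replacing-N" vdev in `zpool status` output.
# Reads the status text on stdin and prints "<device> <state>" for each child.
replacing_members() {
    awk '
        {
            # Width of leading whitespace = nesting depth of this line.
            line = $0
            sub(/^[ \t]+/, "", line)
            indent = length($0) - length(line)
        }
        # A line at the same or shallower depth ends the replacing vdev.
        in_repl && indent <= repl_indent { in_repl = 0 }
        # Any deeper, non-empty line is a child of the replacing vdev.
        in_repl && NF > 0 { print $1, $2 }
        # Remember where a replacing vdev starts and how deep it is.
        $1 ~ /^replacing-[0-9]+$/ { in_repl = 1; repl_indent = indent }
    '
}

# Usage (pool name "tank" is a placeholder):
#   zpool status tank | replacing_members
```

With the status shown above, this should list both c4t5000C5004DC8693Fd0 and c4t50014EE658315C1Dd0 as OFFLINE, which confirms that the first replacement is still attached to the pool configuration even though the device itself is gone.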