I have a pool that is in UNAVAIL status ("One or more devices could not be opened. There are insufficient replicas for the pool to continue functioning.") due to a recent disk failure.
I'm planning to have the failed disk repaired (i.e., a data recovery service) in order to get the pool back online long enough to migrate it; however, there's one snag that I'm not sure how to work around.
The device names in my pool use the disks' serial numbers (/dev/disk/by-id/ style). I did this because I have a lot of disks, and the /dev/sd* names would move around at each boot, which of course wreaked havoc on the pool. In this case, though, I'll be bringing the "same" disk (in terms of data, but not hardware) back online under a different device name, so I don't think ZFS will recognize it correctly on its own, and I'm not sure exactly how the "replace" command will treat the new disk. Perhaps it will just work, but based on the documentation, it might treat the new disk as blank instead of using it to repair the pool (or maybe ZFS looks at the content of the disk and acts accordingly; I'm just unsure).
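For concreteness, this is roughly the command I'm unsure about. The pool name "tank" and the wwn-* device names are made up; the first path is the failed disk's original by-id name, the second is whatever name the cloned disk shows up under:

    # Placeholder names throughout; will this resilver from the surviving
    # replicas onto a "blank" disk, or recognize the cloned data?
    zpool replace tank \
        /dev/disk/by-id/wwn-0x5000c50012345678 \
        /dev/disk/by-id/wwn-0x5000c500aabbccdd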
In a nutshell, I want to take an offline disk that is registered in the pool by its hardware device name, copy it to another physical disk, then bring the new disk online in place of the original.
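Something like the following, assuming the recovered clone comes back under a new by-id path (all device and pool names below are placeholders):

    # "OLD" is the repaired original disk, "NEW" is the disk it gets cloned onto.
    # Sector-for-sector copy first:
    dd if=/dev/disk/by-id/wwn-0xOLD of=/dev/disk/by-id/wwn-0xNEW \
        bs=1M conv=noerror,sync status=progress
    # Then try a re-import scanning the by-id directory; I believe ZFS matches
    # pool members by the GUID in the vdev label rather than by path, so this
    # *might* just work despite the name change:
    zpool import -d /dev/disk/by-id tank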
I'm doing some experiments with non-production devices to suss this out (sketched below), but any thoughts from those of you who know more about what ZFS does "under the hood", or who have experience with this sort of recovery, are greatly appreciated! Additionally, if there are papers, docs, etc. that get into this level of tweaking, I'd be happy to study them as well.
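In case it's useful context, this is the kind of simulation I'm running, on sparse file vdevs rather than real disks (pool and file names are arbitrary):

    # Throwaway pool on file vdevs to simulate the clone-and-rename cycle.
    truncate -s 1G /tmp/vdev1 /tmp/vdev2 /tmp/vdev3
    zpool create testpool raidz /tmp/vdev1 /tmp/vdev2 /tmp/vdev3
    # Simulate the failure + data-recovery clone: export, copy one vdev to a
    # different name, remove the original, and see what import makes of it.
    zpool export testpool
    cp /tmp/vdev2 /tmp/vdev2-recovered
    rm /tmp/vdev2
    zpool import -d /tmp testpool
    zpool status testpool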
To be crystal clear, this isn't intended to be a long-term configuration; it just needs to hold together long enough to evacuate the contents of the array, so I'm not opposed to solutions that wouldn't be suitable for long-term/production environments.
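For what it's worth, once the pool imports, the evacuation itself would be a one-shot replication along these lines ("tank" and "newpool" are placeholders):

    # Recursive snapshot, then replicate the whole hierarchy in one stream:
    zfs snapshot -r tank@evacuate
    zfs send -R tank@evacuate | zfs receive -F newpool/evacuated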