
My experience with ZFS has generally been that it just works, so I expect the answer will be, it’s not a problem — but I have a data pool which will ruin my January if it fubars, so I want to double-check.

This is a question that could actually come up in two different situations involving a separate data pool. Right now I’m dealing with the first, but I’ve also wondered about the second:

  • The storage for the system disk (i.e., the one holding rpool) fails, but storage for the data pool is fine, so you want to restore the system disk from backups but just keep going with the live storage of the data pool.
  • You have Solaris running in a VM and want to roll back to a snapshot the hypervisor has taken (not a ZFS snapshot of rpool), but the data pool is stored on disks that are in “independent” mode, RDMs, etc., so will not be rolled back.

In both of these situations, when Solaris is booted back up, it’s going to see a data pool that it knows about but which is in a state it had never (as far as it would remember) put it into.

I’m primarily concerned with the case where the system was cleanly shut down before the system disk is rewound, and where the system had been cleanly shut down prior to the image it’s being rewound to. I’d expect switching between running states could be a bit trickier.

Note also that in my particular case, the pool’s storage geometry and paths to the storage have not changed. Again, I’d expect this to be trickier if they had.

I wouldn’t even be asking this with Windows and NTFS because that’s a comparatively simplistic, decoupled system, so it’s hard to see why it wouldn’t work. However, it seems that Solaris keeps some kind of pool metadata out of band, as evidenced by the fact that you’re supposed to `zpool export` and `zpool import` when you move pools between systems (something I’ve never done in that manner thanks to VMware). My knowledge of this metadata and its purpose is limited, so it’s hard for me to reason about the impact in this situation. (An explanation of this would be great!)
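(For reference, the move-a-pool procedure I mean is, as I understand it, roughly the sketch below; the pool name `datapool` is just a placeholder.)

```sh
# On the old system: cleanly hand the pool off
zpool export datapool

# On the new system: scan attached storage for importable pools,
# then import the one you want by name
zpool import
zpool import datapool
```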

I actually still have access to the pre-rollback system. It’s sitting in a VMFS datastore backed by an HP SmartArray that threw a 1716 POST warning after an ill-fated preventive maintenance disk change (which lost data because SmartArray is dumber than ZFS). All important VMs still seem fine and scans of their filesystems found no errors, but I plan to restore the array from a very recent backup anyway because I have reason to suspect that ESXi silently zeros bad sectors instead of passing the errors to the guests, so I don’t want to risk some zeroed sector lurking somewhere to bite me in the butt later.

For the Solaris VM, I don’t have to worry about zeroed sectors, because ZFS would catch that, but most of the other VMs use dumb filesystems. The backup is an image of the whole VMware datastore, though, so fixing them will roll back the Solaris VM, too. Actually, I did a scrub on the rpool of this VM and it found no errors, so hell, if I wanted, I could just stash its VMDK somewhere else and copy it back in after the roll-back, and then this whole question would be moot. I guess that’s what I’ll do if nobody answers, lol. But it’s something I’ve wondered for a while, so I’ll still ask.

So, the question is, can I just go ahead and roll back the system disk’s storage and be done with it? Or would I have to export the pool from the pre-rollback system, roll back, delete the pool before attaching its storage, then attach the storage and import the pool? I don’t like the sound of the latter, partly because both CIFS and iSCSI are being served from that pool and I don’t remember offhand how I set those up or even how to do so, so if they break I’ll be mad. (Can you tell we don’t have a full-time sysadmin? lol)

The VM is running an older version, Solaris 11.0.

(Incidentally, it’s older partly because of the same question. I wanted to snapshot the VM prior to attempting an upgrade in case I bork it, but then I was worried about how a rolled-back system might react to the independent pool, so just left it alone. And yeah, I realize I could also snapshot the rpool, but that doesn’t give the same level of comfort for someone who doesn’t work with Solaris daily.)

Kevin

1 Answer


This is one of those "zfs just works" kinds of answers...

The pool metadata is actually stored in the pool, not on the local OS. So, for example, if a system crashes and isn't shut down cleanly, the metadata within the pool knows that the pool was not "exported" cleanly. If you were to try to import this pool into a new system, you would get complaints about it not being exported and belonging to another system. At that point you would just do a `zpool import -f` (force) and it will come in clean.
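In command terms, that looks something like this (a sketch; `datapool` is a made-up pool name):

```sh
# Scan attached storage for importable pools; a pool that was not
# exported cleanly is reported as possibly in use by another system
zpool import

# Force the import anyway
zpool import -f datapool
```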

So, specific to your data pool, the data on it would be safe no matter when or where you tried to import the pool again. If you were to boot to a "restored" rpool, the OS on that rpool would know about the pools that it is supposed to import and would simply import the data pool. It doesn't keep track of whether a pool was exported, other than the fact that once a pool is exported, the OS no longer keeps up with it at all.
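For what it's worth, that OS-side bookkeeping boils down to the pool cachefile. A rough sketch, assuming the default cachefile location (`/etc/zfs/zpool.cache` on Solaris) and a made-up pool name:

```sh
# The cachefile property controls where the OS records pools to
# auto-import at boot; "-" means the default location is in use
zpool get cachefile datapool

# Exporting removes the pool from the cachefile; importing re-adds it
zpool export datapool
zpool import datapool
```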

Now, with respect to the rpool question: restoring it from a VM snapshot, tape backup, or whatever won't change the way it handles the data pool, unless the backup was taken before the data pool was ever created or first imported. If that were the case, you'd simply import the pool once the OS is restored. The data on the data pool will be safe no matter the condition of the rpool.
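After booting the restored rpool, a quick sanity check might look like this (again, `datapool` is a placeholder):

```sh
# Confirm the data pool came up and is healthy
zpool status datapool

# Optionally scrub it for extra peace of mind
zpool scrub datapool
zpool status datapool    # shows scrub progress and results
```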

I hope that helps.

As an aside, you mention your reluctance to upgrade Solaris because you weren't sure how it would react to the data pool. Don't worry about that. An upgrade will preserve known pools and import them as needed.

Also note that Solaris OS upgrades are self-contained in individual "boot environments" (BEs). So, when you do an OS upgrade, it actually creates a completely independent OS install containing the new version, all while your OS is still up and running. Then, when you reboot, it will come up on the new OS. If there are problems with the new OS (e.g., changes to libraries you weren't expecting), you can simply reboot again and come up into the original 11.0 version, and it will be in the exact same state as it was before you upgraded. It is an awesome way to do OS upgrades!
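A rough sketch of that workflow with `beadm` and `pkg` (the BE names here are made up, and details vary between Solaris 11 updates):

```sh
beadm list              # list existing boot environments
pkg update              # typically builds a new BE containing the upgraded OS
beadm list              # the new BE (e.g. "solaris-1") is marked active on reboot
init 6                  # reboot into the upgraded BE

# If the new OS misbehaves, activate the old BE and reboot back into it
beadm activate solaris  # "solaris" being the original BE's name
init 6
```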

mikem
  • Happy New Year and thank you for confirming! My restore procedure ended up [borking ESXi](https://serverfault.com/q/1048959/137215), borking Active Directory (had to redo the restore after some research), but one thing it *didn't* bork was ZFS. Unsurprisingly, that just worked. – Kevin Jan 07 '21 at 16:31
  • So it seems like `import` and `export` are basically just persistent mount/unmount operations and the only out-of-band metadata is just the fact that the pool exists, and maybe the paths to its storage. For completeness I should probably ask if it would automatically handle the storage having moved to different paths or if this would require an explicit `import`. – Kevin Jan 07 '21 at 16:32
  • Basically, yes... The OS only maintains a list of pools to import. The details of those pools are stored within the pool. As for moving devices to different paths... that depends. On Solaris, yes: you can move pools wherever you want them and the OS will find them. On Linux, I'm not so sure. I actually have [a question about that](https://unix.stackexchange.com/questions/626749/zfs-on-linux-device-names) as well but have not received an answer. In Linux it appears as though you can't trust the `sdXX` dev naming and must use the `/dev/disk/by-id` path to ensure portability of pools (see the sketch below). – mikem Jan 08 '21 at 17:13
  • I've never used ZoL, but good question, upvoted. – Kevin Jan 11 '21 at 13:49
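A sketch of the by-id import mentioned above (`tank` is a placeholder pool name):

```sh
# On ZFS on Linux, import using stable device ids instead of sdX names
zpool import -d /dev/disk/by-id tank

# The pool then records the by-id paths
zpool status tank
```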