Would a ZFS mirror with one drive mostly offline work?

6

3

Scenario: I have two external hard drives, and I'd like one to be a backup of the other. Traditionally, I would periodically connect the second drive and rsync across any changes. Does ZFS provide a better way of doing this?

I would think I'd want to create a ZFS mirror, but for convenience I wouldn't want to carry the backup drive around with me all the time; rather, I'd like to synchronise any changes periodically. Does ZFS provide a way to do this, or is this not an appropriate use? If so, what's the canonical ZFS way of doing it? (I don't want to bash the drive by having it check every single sector for changes every time I want to update the backup drive, for example.)

Matthew

Posted 2015-06-14T12:32:46.480

Reputation: 165

Answers

8

ZFS has limited ability to incrementally update a mirrored drive after it has been offline for a while. TL;DR: you could do what you're looking for the way you're suggesting, but it's not what mirrors are meant for.

In practice, what you are suggesting would almost certainly require a full resilver each time, because too many überblock revisions would have gone by in the interim, leaving no common base point for an incremental resilver. If there is a failure during that process, you would likely be in deep trouble as far as your data is concerned. Also keep in mind that, due to its Merkle-tree on-disk format, ZFS resilvers can be (and are) done in order of decreasing data importance, rather than sequentially like non-file-system-based RAID. Of course, "data importance" here is importance as far as ZFS is concerned, not what you might consider important or worth keeping. The resulting seek activity can easily put major stress on a drive, particularly a single one.

The canonical way to bring two ZFS file systems into sync is to use zfs send | zfs receive between them. This requires both file systems to be available. (You can store the output of zfs send in a file and use it as the input to zfs receive later, should you be so inclined, but be aware of a huge caveat: zfs receive makes no attempt to recover from a partially damaged stream of data, and simply aborts if it detects errors.)
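
As an aside, a minimal sketch of that store-the-stream variant, with purely hypothetical pool, snapshot, and file names:

    # Save a snapshot's stream to a file on some intermediate storage...
    zfs send mypool@backup > /mnt/scratch/mypool-backup.zstream
    # ...and replay it into another pool later. If the file is damaged, the receive simply aborts.
    zfs receive otherpool/restore < /mnt/scratch/mypool-backup.zstream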

  • Have one pool on each drive. Let's call them tank and pipe, and say we have data on tank that we want to copy over to pipe.
  • Connect both drives, and zpool import both tank and pipe. You can pass -N to zpool import to make it not mount any file systems.
  • Take a recursive snapshot of the source file system, tank: zfs snapshot -r tank@current1984
  • Find the most recent snapshot that both tank and pipe have in common. Use something like zfs list -t snapshot tank pipe to get a raw list to work from. Let's say the most recent snapshot they have in common is current1948.
  • Run something like zfs send -R -I tank@current1948 tank@current1984 | zfs receive pipe to incrementally transfer the delta between the current1948 and current1984 snapshots from tank to pipe. Read the zfs man page for more details on the send and receive subcommands.
  • Wait for that to finish, then optionally delete any snapshots that are no longer needed. Make sure to keep at least one snapshot (for example, current1984) that both pools (file systems, rather) have in common, to use as the base next time. A rough end-to-end sketch of these steps follows below.
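
Put together, a session following these steps might look roughly like this (pool and snapshot names are the same hypothetical ones as above):

    zpool import -N tank                 # import the source pool without mounting anything
    zpool import -N pipe                 # import the backup pool without mounting anything
    zfs snapshot -r tank@current1984     # new recursive snapshot of the source
    zfs list -r -t snapshot tank pipe    # find the newest snapshot both pools share (current1948 here)
    zfs send -R -I tank@current1948 tank@current1984 | zfs receive pipe
    zpool export pipe                    # export the backup pool before disconnecting the drive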

At this point, the two pools will have the same content, up to the snapshot you used. If done properly, this should also only require transferring the differences; I cannot imagine a scenario in which an incremental zfs send | zfs receive would need to do anything like a full mirror resilver. It also allows you to later add redundancy to the backup pool, should you wish to do so. If the source drive fails during the copying process, you still have the old backup readily available; only the differences you were attempting to transfer would be lost.
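
For instance, adding redundancy later could be as simple as attaching a second disk to the backup pool; a minimal sketch, with purely illustrative device names:

    # Attach a second disk to the single-disk backup pool, turning it into a two-way mirror.
    # "ada2" is assumed to be the disk currently backing pipe; "ada3" is the new disk.
    zpool attach pipe ada2 ada3
    zpool status pipe    # watch the resilver onto the new disk until it completes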

a CVn

Posted 2015-06-14T12:32:46.480

Reputation: 26 553

Amazingly in-depth answer. Thanks very much for your reply! – Matthew – 2015-06-14T15:09:00.627

4

@Matthew ZFS questions get nowhere near the amount of love they deserve on Super User. (Oh, and make sure you're using ECC RAM. No need to thank me for that.) – a CVn – 2015-06-14T17:26:36.387

If ECC RAM isn't available, what filesystem would you recommend? The link makes it sound like perhaps a 'dumber' filesystem would be a better option in that instance? – Matthew – 2015-06-14T17:31:49.927

@Matthew If ECC RAM isn't available, then one of the only major assumptions that ZFS makes falls completely apart. Remember, ZFS was designed for high-end servers, and in that environment, non-ECC RAM basically doesn't exist. If your system doesn't have ECC RAM capability, then you are probably better off scrapping ZFS and instead using a regular file system native to your platform (ext4, UFS, HFS+, whatever) that doesn't try to automatically correct encountered errors. You can run something like hashdeep regularly to catch bit rot after the fact, and restore a good copy from backups. – a CVn – 2015-06-14T17:37:38.877
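
A minimal sketch of that hashdeep workflow, with hypothetical paths (-r recurses, -a audits against the known-hashes file given with -k):

    # Record a baseline of checksums for the data tree
    hashdeep -r /data > /var/backups/data.hashdeep
    # Later, audit the tree against that baseline; mismatches point at files to restore from backup
    hashdeep -r -a -k /var/backups/data.hashdeep /data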

@Matthew Mind you, ZFS on non-ECC-RAM systems can work, especially with high quality RAM. I ran it like that for a while before I replaced the RAM with ECC sticks. But you are really living on the edge by doing so; any memory error can turn from just a notice in a log file to a data-destroying issue. That's why the general recommendation is to never use ZFS without ECC RAM; with ZFS' automatic repair behavior, it's just too high risk. With other file systems, a RAM bit flip doesn't automatically spread through every piece of data and metadata stored on disk, whereas with ZFS, it can. – a CVn – 2015-06-14T17:42:39.127