
I recently changed the checksum property on one of my non-deduplicated ZFS filesystems from on (fletcher4) to sha256, to better support sending deduplicated replication streams, as in this command: `zfs send -DR -I _starting-snapshot_ _ending-snapshot_`.

However, the zfs manpage has this to say about send -D:

This flag can be used regardless of the dataset’s dedup property, but performance will be much better if the filesystem uses a dedup-capable checksum (e.g. sha256).

The zfs manpage also states this about the checksum property:

Changing this property affects only newly-written data.

I have no desire to trust fletcher4 for dedup. Unlike SHA256, fletcher4 is not a pseudo-random hash function, and therefore cannot be trusted not to collide; it is only suitable for dedup when combined with the 'verify' option, which detects and resolves hash collisions.
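
For reference, a minimal sketch of the property change in question; the dataset name `tank/data` is a placeholder, not from my setup:

```
# Placeholder dataset name; substitute your own.
zfs set checksum=sha256 tank/data   # affects only newly-written blocks
zfs get checksum tank/data          # confirm the property value
```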

How can I update the filesystem's checksums, preferably without offlining the system?

84104

1 Answer


To change the properties (be it compression, deduplication or checksumming) of already-written data, the ZFS approach is to run the data through a `zfs send | zfs receive` sequence. Obviously, you do not need to offline the system for that, but you will need

  1. enough resources in your zpool / on the system to hold two dedup'ed copies of the data set in question
  2. downtime for the data set, as you will need to either destroy or rename it during the procedure
  3. enough time and patience for the operation to complete

As you are already using deduplication on the zpool, running a `zfs send | zfs receive` with the destination on the same pool as the source would only use the space needed for the newly-written metadata blocks. But be prepared for the copy to take a while - dedup can be awfully slow, especially if you do not have enough RAM to hold the entire dedup table.
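
As a rough sketch of that approach (the names `tank/data`, `tank/data_new` and the snapshot `@migrate` are placeholders, not taken from your setup):

```
# Placeholder names throughout; adjust for your pool layout.
zfs snapshot tank/data@migrate                         # point-in-time copy to send
zfs send tank/data@migrate | zfs receive tank/data_new
# The receive re-writes every block, so the copy picks up whatever checksum,
# compression and dedup properties are in effect on tank/data_new.
```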

You obviously would need to cease all write operations to create the final, authoritative copy of the data set, but you could minimize the downtime by copying off a snapshot first, stopping all writes, and doing an incremental `zfs send -i | zfs receive` as the final step.
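
A sketch of that minimized-downtime sequence, again with placeholder names:

```
# 1. Bulk copy while the data set is still in use:
zfs snapshot tank/data@step1
zfs send tank/data@step1 | zfs receive tank/data_new

# 2. Stop all writes to tank/data, then send only what changed since step 1:
zfs snapshot tank/data@step2
zfs send -i tank/data@step1 tank/data@step2 | zfs receive -F tank/data_new
# -F rolls the destination back to @step1 in case it was touched in the meantime.

# 3. Swap the data sets into place:
zfs rename tank/data tank/data_old
zfs rename tank/data_new tank/data
```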

the-wabbit
  • It's not at all clear to me that `zfs receive` updates a filesystem's metadata. It seems to me that it would be much quicker if it simply took the metadata as-is. However, doing so may be impossible because the checksums are block-level rather than file-level. In that case `zfs send | zfs receive` would form an acceptable basis for a solution. – 84104 Oct 05 '13 at 14:12
  • zfs send | zfs recv will effectively change all metadata (compression choice, checksum choice, dedup choice). zfs send is creating an object that you then ingest using zfs recv, which writes it out pretty much as if it were all new data. However - I think you may be under a misconception about zfs send|recv in regards to deduplication. zfs send -D attempts to dedupe the data /within the stream itself/, not maintain the existing deduplication of the data from the source dataset. This is why there is no requirement that the recv side also have dedup enabled on the destination dataset. – Nex7 Oct 06 '13 at 08:59
  • To further explain -- there is currently no way to zfs send|recv deduplicated data in such a way that all that goes across the wire is a single copy of the deduplicated data and the associated dedupe table entries. Not even if the source and destination are in sync and you're sending nothing more than an incremental snapshot. ZFS still balloons the send data up to full size if the data within it happens to be un-deduplicatable /within the scope of the stream itself/. You might have data that is easily deduped in the POOL DDT, but as a small send object, be completely un-deduplicatable. – Nex7 Oct 06 '13 at 09:02