
I'm looking for a minimal ZFS setup to achieve self-healing on a single device. Since I back up my data, drive failure is not a concern, but bit rot is. So far, the only suggestion I've been able to dredge up points to the copies=2 option, but that cuts the usable capacity by 50%.
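For reference, this is roughly what the copies=2 suggestion amounts to; the pool and dataset names are just placeholders:

    # Placeholder pool/dataset names.
    # Keep two copies of every block on the same device, so a block that
    # fails its checksum can be healed from the second copy on read or scrub.
    zfs set copies=2 tank/data

    # Note: the copies property only affects data written after it is set.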

My question: What are the drawbacks (other than performance) to splitting the device into 8 partitions and joining them into a 7+1 RAID-Z1 array? That would mean a 12.5% reduction in usable space. While I'm at it, why not go 19+1 or 99+1 (or whatever the partition table allows)?
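Roughly what I have in mind, sketched with a hypothetical disk and an arbitrary 500G partition size:

    # Hypothetical disk and sizes; purely illustrative.
    DISK=/dev/sdb

    # Carve the disk into 8 equally sized GPT partitions.
    for i in $(seq 1 8); do
        sgdisk -n "$i:0:+500G" -t "$i:bf01" "$DISK"
    done

    # Join the 8 partitions into a single RAIDZ1 vdev (7 data + 1 parity).
    zpool create tank raidz1 \
        ${DISK}1 ${DISK}2 ${DISK}3 ${DISK}4 \
        ${DISK}5 ${DISK}6 ${DISK}7 ${DISK}8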

bz9js
  • You think resilvering a RAIDZ1 vdev made up of multiple drives is hard on the drives due to seeking? Try having all of the drives be a single drive, with the seeking *also* happening within that one drive to access the data. And where a single read might otherwise suffice, you now need up to seven reads, *every time*. ZFS already isn't exactly a high-performance file system. To say that performance would be dismal with a setup like you are proposing would probably be the understatement of the year. – user Jul 31 '15 at 11:22
  • I'm thinking 3 copies should be plenty. – Datarecovery.com MK Jul 31 '15 at 15:07
  • Looks like you're aware of the drawbacks (and the right way to do it), but you're probably underestimating them. Why not just use a single vdev and restore from backup in the event of checksum errors? – bahamat Aug 04 '15 at 19:48

1 Answer


From the ZFS Best Practices Guide:

For production systems, use whole disks rather than slices for storage pools for the following reasons:

  • Allows ZFS to enable the disk's write cache for those disks that have write caches. If you are using a RAID array with a non-volatile write cache, then this is less of an issue and slices as vdevs should still gain the benefit of the array's write cache.
  • For JBOD-attached storage, having an enabled disk cache allows some synchronous writes to be issued as multiple disk writes followed by a single cache flush, allowing the disk controller to optimize I/O scheduling. Separately, for systems that lack proper support for SATA NCQ or SCSI TCQ, having an enabled write cache allows the host to issue a single I/O operation asynchronously from physical I/O.
  • The recovery process of replacing a failed disk is more complex when disks contain both ZFS and UFS file systems on slices. ZFS pools (and underlying disks) that also contain UFS file systems on slices cannot be easily migrated to other systems by using zpool import and export features.
  • In general, maintaining slices increases administration time and cost. Lower your administration costs by simplifying your storage pool configuration model.

To sum it up, it is much, much slower and more difficult to handle correctly, replace, and grow.
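To illustrate the whole-disk point, a minimal sketch with hypothetical device and pool names (on Linux, ZFS partitions a whole disk by itself when handed the bare device):

    # Hypothetical device and pool names; the two commands are alternatives.

    # Recommended: hand ZFS the whole disk, so it can manage the label
    # and safely enable the disk's write cache.
    zpool create tank /dev/sdb

    # Discouraged for production: building the pool from a slice/partition,
    # which is what the many-partitions-on-one-disk idea boils down to.
    #zpool create tank /dev/sdb1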

Additionally, you still have to pay attention to your pool layout. Your RAIDZ1 setup would still suffer from the RAID5 write hole problem while a slice is being replaced, and it will also suffer if you choose a non-optimal number of slices for your RAIDZ level (also from the recommendations in the guide; a minimal example follows the list):

  • (N+P) with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N equals 2, 4, or 6
  • The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups.
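As a minimal sketch of a layout that follows those recommendations, with hypothetical disk names (separate physical disks rather than slices of one device):

    # 4 data disks + 1 parity (N=4, P=1) stays within the recommended
    # 3 to 9 disks per RAIDZ group. Disk names are placeholders.
    zpool create tank raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # A periodic scrub is what exercises the self-healing: it reads every
    # block, verifies checksums, and repairs damaged blocks from parity.
    zpool scrub tank
    zpool status -v tank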
user121391