
I'm planning on building a file server using OpenSolaris and ZFS that will provide two primary services: an iSCSI target for XenServer virtual machines and a general home file server. The hardware I'm looking at includes 2x 4-port SATA controllers, 2x small boot drives (one on each controller), and 4x big drives for storage. This leaves one free port per controller for upgrading the array down the road.

Where I'm a little confused is how to set up the storage drives. For performance, mirroring appears to be king, and I'm having a hard time seeing what the benefit of using RAIDZ over mirroring would be. With this setup I can see two options: two mirrored pairs striped together in one pool, or RAIDZ2. Both should protect against 2 drive failures and/or one controller failure... the only benefit of RAIDZ2 would be that any 2 drives could fail. Usable storage should be 50% of raw capacity in both cases, but the first should have much better performance, right?
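To make the "any 2 drives" difference concrete, here is a small sketch that enumerates every two-disk failure for the two layouts. It assumes each controller carries two of the four storage disks (labelled with hypothetical names `A0`, `A1`, `B0`, `B1`) and that each mirror is built from one disk on each controller; both layouts give 50% usable capacity.

```python
from itertools import combinations

# Hypothetical labels: controllers A and B each carry two of the four storage disks.
DISKS = ["A0", "A1", "B0", "B1"]

# Layout 1: two 2-way mirrors striped in one pool; each mirror spans both controllers.
MIRROR_PAIRS = [{"A0", "B0"}, {"A1", "B1"}]


def striped_mirrors_survive(failed):
    """The pool survives as long as every mirror keeps at least one healthy disk."""
    return all(not pair <= failed for pair in MIRROR_PAIRS)


def raidz2_survives(failed, parity=2):
    """A single RAIDZ2 vdev survives while no more than `parity` disks are lost."""
    return len(failed) <= parity


def count_survivable(survives, k=2):
    """Return (survivable, total) over every combination of k failed disks."""
    combos = [set(c) for c in combinations(DISKS, k)]
    return sum(survives(c) for c in combos), len(combos)


if __name__ == "__main__":
    print("striped mirrors:", count_survivable(striped_mirrors_survive))  # (4, 6)
    print("raidz2:", count_survivable(raidz2_survives))  # (6, 6)
    # Losing a whole controller takes out A0+A1 (or B0+B1).
    print("controller A lost:",
          striped_mirrors_survive({"A0", "A1"}), raidz2_survives({"A0", "A1"}))  # True True
```

Both layouts survive a controller failure; the striped mirrors only lose out on the two combinations that take both halves of the same mirror.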

The other thing I'm trying to wrap my mind around is the benefit of mirrored arrays with more than two devices. For data integrity, what, if any, would be the benefit of RAIDZ over a three-way mirror? Since ZFS maintains file integrity, what does RAIDZ bring to the table... don't ZFS's integrity checks negate the value of RAIDZ's parity?

John Clayton

3 Answers


RAID-Z eliminates most of the write penalty and the data integrity issues that RAID 5/6 volumes suffer from, at the cost of some CPU time. Typically, systems have CPU cycles to spare, so spending CPU time to improve IO performance and data integrity is a good compromise vs. mirroring.
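A back-of-the-envelope sketch of where the write penalty goes: it counts sectors read and written for one small write, assuming the classic RAID-5 read-modify-write path versus RAID-Z's behaviour of writing every block as its own variable-width full stripe. The function names are made up for illustration, RAID-Z padding sectors are ignored, and the CPU cost of the parity math is not modelled.

```python
import math


def raid5_small_write():
    """Classic RAID-5 read-modify-write of one sector inside a larger stripe.

    Reads the old data and old parity, then writes new data and new parity.
    Returns (sectors_read, sectors_written).
    """
    return 2, 2


def raidz_write(block_bytes, disks, parity=1, sector=4096):
    """RAID-Z writes each block as its own full stripe (copy-on-write, no read-back).

    Data sectors are spread over up to (disks - parity) columns and each row of
    data gets `parity` parity sectors. Padding sectors are ignored here.
    Returns (sectors_read, sectors_written).
    """
    data = math.ceil(block_bytes / sector)
    rows = math.ceil(data / (disks - parity))
    return 0, data + rows * parity


if __name__ == "__main__":
    print(raid5_small_write())         # (2, 2): two reads before any write
    print(raidz_write(4096, disks=5))  # (0, 2): one data + one parity sector, no reads
```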

Here is a detailed explanation of RAID-Z that may answer other questions.

Also, remember that RAID is a fault-tolerance solution. You don't implement RAID-Z2 to protect against data loss -- you perform backups or replicate to do that. You choose RAID-Z2 vs. RAID-Z, or RAID-10 vs. RAID-6 vs. RAID-5, to keep your systems operational in the event of hardware failure.

Adam Katz
duffbeer703
  • Gotta give the answer to dotwaffle since he helped me understand the technical difference. Great advice on intended usage though...that really made me stop and think. – John Clayton Sep 11 '09 at 13:25
  • An important thing to note is that while RAIDZ eliminates the *write* penalty issues, it introduces *read* penalty issues due to [increased concurrency for each read operation](https://blogs.oracle.com/relling/entry/zfs_raid_recommendations_space_performance) – the-wabbit Jun 16 '13 at 14:19

The simple answer is that mirroring takes almost no processing power: it just writes the same data to a second disk. For RAID-Z2, you have to compute two new parity blocks for every write, which, although small, CAN bog down the CPU when you have to write large amounts of data quickly.
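A minimal sketch of the difference in work, assuming single parity computed as a byte-wise XOR (the RAID-5 / RAID-Z1 style). RAID-Z2's second parity block uses more expensive Galois-field arithmetic, which is not shown here; the function names are hypothetical.

```python
from functools import reduce


def mirror_write(block, copies=2):
    # Mirroring: no computation, the same bytes simply go to every disk in the mirror.
    return [bytes(block) for _ in range(copies)]


def xor_parity(columns):
    # Single parity: byte-wise XOR across all equally sized data columns.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*columns))


def rebuild_missing(surviving_columns, parity):
    # XOR of the parity with every surviving column yields the missing column.
    return xor_parity(list(surviving_columns) + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    p = xor_parity(data)
    print(p.hex())                                            # ee22
    print(rebuild_missing([data[0], data[2]], p) == data[1])  # True
```

The mirror path is a plain copy, while the parity path touches every byte of every column on each write; that is the CPU cost being described, and also what makes rebuilding a lost disk possible.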

Mirroring is always the preferred solution for high-speed data. If it's just bulk storage without a need for fast writes, RAID-Z2 is a good alternative that does allow any two drives to die, as you allude to.

The other advantage is that a pool of mirrors can be expanded by adding more mirrored pairs, while a RAID-Z2 vdev cannot be widened by adding disks. You can add more RAID-Z2 storage to the pool, but it becomes a second RAID-Z2 vdev alongside the first (in effect, two RAID-Z2 arrays joined together) rather than one array with the data striped evenly across all the disks.
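A toy model of growing each kind of pool, assuming the rules described in this thread: a pool is a list of top-level vdevs (new ones are added with `zpool add`), an N-way mirror stores one disk's worth of data, and a RAID-Z vdev with P parity disks gives up P disks. The `Vdev`/`Pool` classes are hypothetical and ignore the fact that existing data is not rebalanced onto new vdevs.

```python
from dataclasses import dataclass, field


@dataclass
class Vdev:
    kind: str        # "mirror" or "raidz" (labels chosen for this sketch)
    disks: int
    parity: int = 0  # raidz only: 1, 2 or 3

    def usable_disks(self) -> int:
        # An N-way mirror stores one disk's worth of data; raidzP gives up P disks to parity.
        return 1 if self.kind == "mirror" else self.disks - self.parity

    def tolerated_failures(self) -> int:
        # Failures this vdev survives; losing any whole top-level vdev loses the pool.
        return self.disks - 1 if self.kind == "mirror" else self.parity


@dataclass
class Pool:
    vdevs: list = field(default_factory=list)

    def add(self, vdev: Vdev) -> None:
        # Growing a pool appends a new top-level vdev; existing vdevs keep their width.
        self.vdevs.append(vdev)

    def disks(self) -> int:
        return sum(v.disks for v in self.vdevs)

    def usable_disks(self) -> int:
        return sum(v.usable_disks() for v in self.vdevs)


if __name__ == "__main__":
    mirrors = Pool([Vdev("mirror", 2), Vdev("mirror", 2)])
    raidz2 = Pool([Vdev("raidz", 4, parity=2)])
    # The two free SATA ports are enough for one more mirrored pair...
    mirrors.add(Vdev("mirror", 2))
    print(mirrors.disks(), mirrors.usable_disks())  # 6 3
    # ...but growing the RAID-Z2 pool the same way needs a whole new RAID-Z2 vdev.
    raidz2.add(Vdev("raidz", 4, parity=2))
    print(raidz2.disks(), raidz2.usable_disks())    # 8 4
```

With one spare port per controller, only the mirror pool can grow without new hardware.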

dotwaffle
  • But in ZFS what does the parity block of RAIDZ give you? Does it provide any additional data integrity beyond what ZFS already provides? Or is it simply needed so that any two drives can die? If that's the only benefit, then in the three-way scenario there is no benefit of RAIDZ over a mirror, right? – John Clayton Sep 09 '09 at 15:52
  • RAID-Z allows one drive to die: if you have 10 drives, you get 9 drives' worth of data. With RAID-Z2 you can have any two drives die, and have 8 drives' worth of data. With mirroring, you can have half the drives die, but only one of each set of two. I suspect you know this already, but it's this fringe case of 4 drives that gives two solutions that survive two drive failures: RAID-Z2 and mirroring. There are no additional data-integrity allowances that are "useful" in a normal scenario. – dotwaffle Sep 09 '09 at 15:55
  • So then, to be clear, the benefit of RAIDZ over mirrors is a little more protection from hardware failures? RAIDZ1 and a mirror with X drives are essentially equivalent? – John Clayton Sep 09 '09 at 16:30
  • No. RAIDZ1 functions using parity. As such, it'll be slightly slower than mirroring, as parity blocks will need to be computed for each write. Mirroring does not suffer from this. The difference is equivalent to that between RAID 1 and RAID 5 (roughly). – Cian Sep 09 '09 at 16:37
  • RAID-Z's parity is a superset of ZFS's built-in checksumming. All the checksum does is ensure that the data being read from the disks is correct. It's designed to be a fast check with little overhead. What RAID-Z's parity gives you is the ability to *rebuild* damaged data in the event of the loss of a drive (or 2 for RAID-Z2). But the calculations to generate this parity data are much more CPU intensive than a simple block checksum, and must be calculated on an entire stripe, even if you're only writing a 4KB block. – afrazier May 03 '10 at 14:10
  • @John Clayton: the level of redundancy depends on how you configure your storage. You can create an N-way mirror, meaning that your data capacity is the size of one drive, but you can lose N-1 drives without losing any data. For example, if you have a 3-way mirror, all 3 drives contain the same data, and you can lose 2 drives without losing any data. The difference is that adding drives to an N-way mirror increases the redundancy (but your capacity is always the size of 1 drive), while adding drives to raidz2 increases capacity (but your redundancy is always 2 drives). – rob Aug 27 '10 at 03:19
  • Fortunately, you can throw an awful lot of Xeon cores at your RAID-Z2 math and still come nowhere near the price of an inferior proprietary SAN appliance. – Skyhawk Mar 26 '11 at 21:47
  • The only reason to use RAIDZ (or RAID5/6 for that matter) is if you absolutely require space over performance. You should almost never use RAIDZ. Mirroring does *everything* better, except that you have less overall capacity. – bahamat Sep 02 '12 at 10:38
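The checksum-versus-redundancy split discussed in these comments can be sketched like this: the checksum only tells you *which* copy is bad, and a redundant copy (or, with RAID-Z, data rebuilt from parity) is what lets ZFS return good data and repair the bad copy. SHA-256 is used here as a stand-in for ZFS's per-block checksum (ZFS normally uses a Fletcher checksum; SHA-256 is an option), and all function names are hypothetical.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in for ZFS's per-block checksum, which lives in the parent block pointer.
    return hashlib.sha256(block).digest()


def read_single_disk(block: bytes, expected: bytes) -> bytes:
    # No redundancy: the checksum can only DETECT the damage.
    if checksum(block) != expected:
        raise OSError("checksum mismatch and no redundant copy to repair from")
    return block


def read_mirror(copies: list, expected: bytes):
    # The checksum says WHICH copy is good; the redundancy supplies the good data,
    # which is returned and used to rewrite the bad copies ("self-healing").
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise OSError("every copy fails the checksum: data is unrecoverable")
    return good, [good] * len(copies)


if __name__ == "__main__":
    data = b"important block"
    expected = checksum(data)
    rotted = b"importent block"  # silent corruption on one disk
    print(read_mirror([rotted, data], expected)[0] == data)  # True
    try:
        read_single_disk(rotted, expected)
    except OSError as err:
        print("detected but not repairable:", err)
```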

The main performance difference between mirrors and RAIDZ1/2 [1] is not the expected CPU usage (of which we have plenty these days), but the fact that ZFS random IOPS scale with the total number of vdevs rather than the total number of disks.

For example, four 2-way mirrors provide up to 8x the random read performance and 4x the random write performance of a single disk. On the other hand, a single 8-disk RAIDZ2 vdev (6 data + 2 parity) is going to provide roughly the same random IOPS as a single disk.

In other words: for random-IO-heavy workloads, mirrors are the best choice performance-wise.
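The rule of thumb above can be written as a tiny estimator, assuming that every disk delivers the same (hypothetical) random IOPS, that a mirror can serve reads from any member but must write to all of them, and that a RAID-Z vdev behaves roughly like one disk for random I/O. Real numbers depend heavily on caching, record size and queue depth.

```python
def vdev_random_iops(kind: str, disks: int, disk_iops: float, write: bool = False) -> float:
    if kind == "mirror":
        # Reads can be served by any member; every write must land on all members.
        return disk_iops if write else disks * disk_iops
    # raidz: each block is spread across the vdev's disks, so a random I/O
    # ties up the whole vdev and it behaves roughly like a single disk.
    return disk_iops


def pool_random_iops(vdevs, disk_iops: float, write: bool = False) -> float:
    # ZFS stripes across top-level vdevs, so IOPS add up per vdev, not per disk.
    return sum(vdev_random_iops(kind, n, disk_iops, write) for kind, n in vdevs)


if __name__ == "__main__":
    DISK_IOPS = 100  # hypothetical figure for one spinning disk
    four_mirrors = [("mirror", 2)] * 4
    one_raidz2 = [("raidz", 8)]
    print(pool_random_iops(four_mirrors, DISK_IOPS))              # 800
    print(pool_random_iops(four_mirrors, DISK_IOPS, write=True))  # 400
    print(pool_random_iops(one_raidz2, DISK_IOPS))                # 100
```

For the four-disk home server in the question, the same estimate gives 400 read / 200 write for two striped mirrors versus about 100 for a 4-disk RAIDZ2.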

[1]: RAIDZ3 is significantly heavier CPU-wise.

shodanshok