
When calculating IOPS for traditional RAID arrays, one could use the following formula (borrowed from _Getting The Hang Of IOPS v1.3_ on Symantec Connect):

I_effective = (n * I_single) / (READ% + (F * WRITE%))

Where:

  • I_effective is the effective number of IOPS
  • I_single is a single drive's average IOPS
  • n is the number of disks in the array
  • READ% is the fraction of reads taken from disk profiling
  • WRITE% is the fraction of writes taken from disk profiling
  • F is the RAID write penalty:

    RAID Level      Write Penalty
    RAID-0          1
    RAID-1          2
    RAID-5          4
    RAID-6          6
    RAID-10         2
    RAID-DP         2
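
To make the arithmetic concrete, here is a minimal sketch of the formula in Python. The write-penalty values come straight from the table above; the disk count, per-drive IOPS, and read fraction are whatever you've measured or assumed.

```python
# Minimal sketch of the effective-IOPS formula above.
# Write penalties are taken from the table; all other inputs are
# whatever your disk profiling gives you (illustrative, not measured).

WRITE_PENALTY = {
    "RAID-0": 1,
    "RAID-1": 2,
    "RAID-5": 4,
    "RAID-6": 6,
    "RAID-10": 2,
    "RAID-DP": 2,
}

def effective_iops(n_disks, single_disk_iops, read_fraction, raid_level):
    """I_effective = (n * I_single) / (READ% + F * WRITE%)."""
    f = WRITE_PENALTY[raid_level]
    write_fraction = 1.0 - read_fraction
    return (n_disks * single_disk_iops) / (read_fraction + f * write_fraction)
```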
    

The formula is essentially a function of:

  • IOPS for each individual drive in the array
  • The number of disks. More disks means more IOPS
  • The RAID penalty for each write operation.
    • RAID5 & RAID6 require 4+ disk operations for every write. The controller must read the data block and the parity block (two operations), calculate the new parity, and then write the updated data block and parity block (two more operations). RAID6 has two parity blocks and therefore requires three reads and three writes. RAID5 & RAID6 arrays are therefore capable of fewer IOPS than RAID1, as tallied in the sketch after this list.
    • RAID1 & RAID10 only require 2 writes, one to each disk in the mirror.
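
As a quick cross-check of the bookkeeping, the per-write operation counts from the bullets above reproduce the penalty factors in the table (a sketch; the counts are just the reads and writes enumerated there):

```python
# Physical disk operations per logical write, per the bullets above:
raid5_ops = 2 + 2  # read data + read parity, then write data + write parity
raid6_ops = 3 + 3  # two parity blocks: three reads and three writes
raid1_ops = 2      # one write to each side of the mirror
assert (raid5_ops, raid6_ops, raid1_ops) == (4, 6, 2)  # matches the penalty table
```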

And to be clear, this all provides an estimate of theoretical performance. Various controllers and RAID methods have tricks to speed some of this up.
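
For example, running the sketch above with eight drives at an assumed ~150 IOPS each (a rough figure for 10K SAS drives, not a measurement) and a 50/50 read/write mix shows why RAID10 typically outpaces RAID5 on the same spindles:

```python
# 8 drives, ~150 IOPS each (assumed), 50% reads / 50% writes
print(effective_iops(8, 150, 0.5, "RAID-5"))   # (8*150)/(0.5 + 4*0.5) = 480.0
print(effective_iops(8, 150, 0.5, "RAID-10"))  # (8*150)/(0.5 + 2*0.5) = 800.0
```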

ZFS's equivalent of RAID5 & RAID6 is RAIDZ and RAIDZ2. When calculating IOPS for RAIDZ arrays, can I use the same formula that I use for RAID5 & RAID6, or does ZFS have special tricks to reduce the number of operations required for writes?

Is there a different formula to use when calculating IOPS for RAIDZ arrays?

Stefan Lasiewski
  • *Great* question. I'm looking forward to reading answers... – EEAA Aug 15 '13 at 18:49
  • IOPS are mythical, but this doc may provide some insight: http://info.nexenta.com/rs/nexenta/images/solution_guide_nexentastor_zfs_performance_guide.pdf – tony roth Aug 15 '13 at 19:06
  • IOPS may be theoretical, but it can provide an explanation as to why a RAID10 array will typically outperform a RAID5 array, given the same drives. – Stefan Lasiewski Aug 15 '13 at 20:13
  • One notable quote from the Nexenta doc: _"In a RAIDZ-2 configuration, a single IO coming into the VDEV needs to be broken up and written across all the data disks. It then has to have the parity calculated and written to disk before the IO could complete. If all the disks have the same latency, all the operations to the disks will complete at the same time, **thereby completing the IO to the VDEV at the speed of one disk**. If there is a slow disk with high latency in the RAIDZ-2 VDEV, the IO to the VDEV does not complete until the IO on the slowest drive completes."_ – Stefan Lasiewski Aug 15 '13 at 22:32

1 Answer


This is easier to answer...

It's all distilled here: _ZFS RAID recommendations: space, performance, and MTTDL_ and _A Closer Look at ZFS, Vdevs and Performance_.

  • RAIDZ with one parity drive will give you a single disk's IOPS performance, but (n-1) times a single disk's bandwidth in aggregate.

So if you need to scale, you scale with the number of RAIDZ vdevs... E.g. with 16 disks, 4 groups of 4-disk RAIDZ would have greater IOPS potential than 2 groups of 8-disk RAIDZ.

Surprising, right?
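
If this answer's model holds, here is a hedged sketch of that rule of thumb: write IOPS scale with the number of vdevs, each contributing roughly one disk's worth, regardless of how many disks are inside each vdev. The 150 IOPS per disk is an illustrative assumption.

```python
# Rule of thumb from this answer: each RAIDZ vdev delivers roughly one
# disk's worth of write IOPS, so the pool scales with the vdev count.
def raidz_pool_write_iops(num_vdevs, single_disk_iops=150):  # 150 is assumed
    return num_vdevs * single_disk_iops

# The same 16 disks, arranged two ways:
print(raidz_pool_write_iops(4))  # 4 x 4-disk RAIDZ vdevs -> 600
print(raidz_pool_write_iops(2))  # 2 x 8-disk RAIDZ vdevs -> 300
```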

I typically go with striped mirrors (RAID 1+0) on my ZFS installations. The same concept applies. More mirrored pairs == better performance.

In ZFS, you can only expand in units of a full vdev. So while expanding a RAID 1+0 set means adding more pairs, doing the same for RAIDZ sets means adding more RAIDZ groups of equal composition.

ewwhite
  • Good article, but it's so old that the link's comparison of ZFS RAID1 vs RAID5 ("Single Parity Model Results" and "Double Parity Model Results") is missing. Rats! – Stefan Lasiewski Aug 15 '13 at 23:48
  • _RAIDZ with one parity drive will give you a single disk's IOPS performance_ This is interesting; is that a constant value regardless of read vs. write percentages? For example, RAID5's performance varies widely depending on the percentage of reads versus writes. A 3-disk 15K array can vary between 130 IOPS and 500 IOPS depending on the read/write ratio. 50% reads & 50% writes will result in *greater than* a single disk's IOPS performance. Using 3+ spindles improves performance vs. 1 spindle, correct? – Stefan Lasiewski Aug 15 '13 at 23:59
  • I just think of the vdev scaling and that performance on writes is equal to 1 disk, regardless of the composition of that vdev; mirror, RAIDZ, RAIDZ2, etc. The disks inside of the vdevs are there to boost your capacity. The number of vdevs is used to build your striping performance. Reads from the vdevs will scale with the # of drives. – ewwhite Aug 16 '13 at 00:08
  • Well, that certainly simplifies the math. – Stefan Lasiewski Aug 16 '13 at 00:34
  • RAID5 & 6 are penalized due to the read/modify/write cycle. If I understand correctly, [ZFS avoids this penalty due to its Copy-on-Write (COW) nature](http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance#raidz). If that's right, then RAIDZ should totally outperform an equivalent RAID5 array, at least as far as theoretical IOPS. – Stefan Lasiewski Aug 16 '13 at 01:05
  • And yes, apparently my understanding about ZFS RAIDZ avoiding the IOPS penalty is correct, according to Constantin Gonzalez, the author of "[**A Closer Look at ZFS, Vdevs and Performance**](http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance)" -- See his confirmation at http://disq.us/8enmt3 – Stefan Lasiewski Aug 16 '13 at 18:18
  • A decent RAID controller has a BBU cache, which also adds quite a bit of performance for write loads. Also, any RAID (ZFS or hardware) has to verify read data, so it needs to read a block from each drive in the array to compare checksums. Thus read IOPS are penalized as well. – DukeLion Aug 21 '13 at 07:27
  • I have a significant amount of experience with this, and can confirm for you that in most situations, RAIDZ is NOT going to outperform the same number of disks thrown into a traditional RAID5/6 equivalent array. Most traditional RAID5/6 arrays gain IOPS performance (the one almost everyone cares about, even when they don't think they do) as you add spindles to the RAID set, whereas ZFS will not. In return, ZFS won't lose your data, and does not suffer from the 'RAID write hole' problem. Plus that whole snapshots, clones, compression and so on. And the ARC. And ... you get the idea. – Nex7 Aug 22 '13 at 02:13
  • I just want to be clear: ZFS is by its very design not going to win any performance wars. You can find alternative filesystems & volume managers (or combinations) that can outperform ZFS on the same hardware (ext4 & an Adaptec RAID card, for example). The only exception to this that I'm aware of comes from reads that are easily cached, where the ARC often gives ZFS a leg up on most of the alternatives. But when it comes to writes? No. The effort ZFS spends on data integrity and how it handles writes is almost never going to win speed contests. The trade-offs are integrity and feature set. – Nex7 Aug 22 '13 at 02:18
  • @Nex7 ***Preach-it, brotha!!!*** – ewwhite Aug 22 '13 at 02:35
  • +1 for the 2 great links! – Totor Nov 07 '13 at 09:47