14

With regard to IOPS, I have seen several sources on the web that suggest the IOPS of a given number of disks is simply the IOPS of a single disk multiplied by the number of disks.

If my understanding of IOPS is correct (and I'm not at all sure it is), I would have thought the reality would depend on - amongst many other factors - the RAID level. With RAID 1/10, all data is duplicated across at least two disks, reducing contention on a particular disk for some IO patterns. However, in striped RAID levels such as RAID 0/5/6, data is distributed rather than duplicated, meaning consecutive read requests could be for the same spindle, leading to blocking while the previous IO completes. Writes are even more contended.

I should add that I appreciate the reality is much more complex due to various optimisations and other factors. My question is really just driving at whether, at a very basic level, my understanding of what IOPS means is on the right track. It could be that my assertion that IOPS could even be influenced by RAID levels in such a way indicates a basic misunderstanding of the concept.

dbr
    You're simplifying this to a point where you're excluding the impact of RAID controller cache, OS, the application's behavior, synchronous or asynchronous I/O and disk type. So... what are you looking for? – ewwhite Aug 21 '17 at 23:09
  • @ewwhite Sorry, I should have been clearer. I'm really hoping to see if the basic principle of my thinking is correct, rather than making real-world predictions. I appreciate that in reality things are greatly influenced by all sorts of optimisations and other complexities. There is a real-world situation in the background, but as is often the case when you're looking into something you're not all that familiar with, I've decided to go away and do some background learning so I feel a bit more comfortable with the basic principles. – dbr Aug 22 '17 at 19:16
  • I was tempted to ask whether anyone has any recommendations on good quality reading regarding the theory and concepts surrounding storage and its performance, but I didn't as I thought it may be considered an inappropriate question for ServerFault. There seems to be fairly little high-quality writing on the subject out there on the web that I've found so far - perhaps because it's quite a complex subject that few really understand fully. – dbr Aug 22 '17 at 20:49
  • RAID performance depends far more on controller hardware and implementation limits than the RAID level. E.g. RAID0, RAID1, RAID5 and RAID6 can theoretically employ all disks for long reads, so they can have the exact same read speed on an ideal controller. – Zac67 Aug 22 '17 at 21:00

2 Answers

14

For HDDs, IOPS are generally dominated by the disk's access time, which is the sum of seek latency + rotational delay + transfer time. As these variables strongly depend on the access pattern and have non-obvious interactions with the specific RAID layout (e.g. stripe size) and controller (e.g. read-ahead tuning), any simple reply WILL BE WRONG.
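
To sanity-check that rule of thumb, here is a minimal back-of-the-envelope sketch in Python; the seek, RPM and transfer-rate figures are illustrative assumptions, not measurements:

    # Back-of-the-envelope IOPS estimate for a single HDD.
    # All figures below are assumed typical values, not measurements.
    avg_seek_ms = 8.5                      # assumed average seek, 7200 RPM drive
    rotational_ms = 0.5 * 60_000 / 7200    # half a revolution on average ~ 4.17 ms
    transfer_ms = 4 / 1024 / 150 * 1000    # 4 KiB at an assumed 150 MB/s ~ 0.03 ms

    access_time_ms = avg_seek_ms + rotational_ms + transfer_ms
    iops = 1000 / access_time_ms
    print(f"~{iops:.0f} IOPS")             # ~79 IOPS, in line with the ~100 IOPS rule of thumb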

Still, let's try for a ballpark figure. To a first approximation, the IOPS delivered by an n-disk array should be n times the IOPS of a single disk. However, both the RAID level and the data access pattern, by shifting weight between seek, rotational and transfer latency, dramatically change this first-order approximation.

Let's work through some examples, assuming 100 IOPS per single disk (a typical value for 7200 RPM disks) and 4-disk arrays (except for RAID1, which is often limited to 2-way):

  • a single disk delivers 100 IOPS, both reading and writing (note: due to write coalescing, write IOPS are generally higher than read IOPS, but let's ignore that for simplicity)
  • RAID0 (4-way striping) has up to 4x the random IOPS and up to 4x the sequential IOPS. The key words here are "up to": due to the nature of striping and data alignment, if the randomly accessed sectors predominantly reside on a single disk, you will end up with much lower IOPS.
  • RAID1 (2-way mirroring) is more complex to profile. As different disks can seek to different data, it has up to 2x the random read IOPS but the same 1x (or slightly lower, due to overhead) random write IOPS. If everything aligns well (e.g. large but not 100% sequential reads, a RAID controller that applies chunk/stripe handling even in mirroring mode, read-ahead working correctly, etc.), sequential reads can sometimes reach up to 2x the single-disk value, while sequential writes remain capped at 1x the single disk (i.e. no speedup)
  • RAID10 (4-way mirroring) is, performance-wise, halfway between 4-way RAID0 striping and 2-way mirroring. It has up to 4x the random read IOPS and up to 2x the random write IOPS. For sequential transfers, the RAID1 caveat applies: it sometimes reaches up to 4x the sequential read IOPS, but only 2x the sequential write IOPS. Please note that some RAID10 implementations (notably Linux MDRAID) provide different layouts for RAID10 arrays, each with a different performance profile.
  • RAID5 (striped parity) has up to 4x the random read IOPS, while random write IOPS, depending on a number of factors such as how large the write is relative to the stripe size, the availability of a large stripe cache, and the parity update algorithm itself (read-reconstruct-write vs read-modify-write), can be anywhere between 0.5x (or lower) and 2x the IOPS of a single disk. Sequential workloads are more predictable, at 3x the IOPS of a single disk (both for reading and writing)
  • RAID6 (striped double parity) behaves much like its RAID5 sibling, but with lower write performance. It has up to 4x the random read IOPS of a single disk, but its random write performance is even lower than RAID5's, with the same absolute bounds (0.5x to 2x) but a lower real-world average. Sequential reads and writes are capped at 2x the IOPS of a single disk.

Let me repeat: the above are simple, almost broken approximations. Anyway, if you want to play with a (severely incomplete) RAID IOPS calculator, have a look here.
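
If you would rather poke at the numbers yourself, here is a toy sketch in Python that just encodes the optimistic "up to" multipliers from the list above; the 100 IOPS baseline and the factors are the assumptions already stated, nothing measured:

    # Toy RAID IOPS calculator encoding the rough multipliers above.
    # Best-case "up to" factors only; real arrays can be far slower.
    SINGLE_DISK_IOPS = 100  # assumed 7200 RPM baseline

    # (random read, random write) multipliers for a 4-disk array
    # (2-way for RAID1); RAID5/6 write factors use the optimistic end.
    MULTIPLIERS = {
        "RAID0":  (4.0, 4.0),
        "RAID1":  (2.0, 1.0),
        "RAID10": (4.0, 2.0),
        "RAID5":  (4.0, 2.0),   # can drop to 0.5x or lower on small writes
        "RAID6":  (4.0, 2.0),   # same bounds, lower real-world average
    }

    for level, (rd, wr) in MULTIPLIERS.items():
        print(f"{level:>6}: up to {rd * SINGLE_DISK_IOPS:.0f} random read IOPS, "
              f"up to {wr * SINGLE_DISK_IOPS:.0f} random write IOPS")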

Now, back to the real world. On real-world workloads, RAID10 is often the fastest and preferred choice, maintaining high performance even in the face of a degraded array. RAID5 and RAID6 should not be used for performance-sensitive workloads, unless they are read-centric or sequential in nature. It's worth noting that serious RAID controllers have a big power-loss-protected writeback cache mainly to overcome (via heavy stripe caching) RAID5/6's low random write performance. Never use RAID5/6 with cache-less RAID controllers, unless you really don't care about the array's speed.

SSDs are different beasts, though. As they have intrinsically much lower average access times, parity-based RAID incurs a much lower performance overhead and is a much more viable option than on HDDs. However, for a small-random-write-centric workload, I would still use a RAID10 setup.

TessellatingHeckler
shodanshok
  • *Never use RAID5/6 with cache-less RAID controllers, unless you really don't care about array's speed.* You can get away with this if you really know what you're doing and have tight control of your IO pattern. If you are doing nothing but sequential IO that is matched to the array's stripe size, you can get away with using cache-less RAID5/6. And cache can't save performance if you do enough random, small-block write operations to a RAID5/6 array, although the value of "enough IO operations" that kills performance can be a huge number for a really good RAID controller. – Andrew Henle Aug 23 '17 at 09:49
  • @AndrewHenle Sure, if you only issue stripe-aligned sequential reads/writes, even a cache-less controller in RAID5/6 mode can give you good results. However, this is a very narrow use pattern (i.e. streaming and backups). For general-purpose workloads, a cache-less controller combined with any parity RAID will be really slow. Some controllers even *require* a power-loss-protected writeback cache to let you create a parity RAID. – shodanshok Aug 23 '17 at 10:52
  • I was thinking more about the admins who wonder why their corporate mail storage 21-drive RAID6 array with a 19-MB-because-bigger-must-be-faster stripe size is slow.... – Andrew Henle Aug 23 '17 at 11:17
  • What do you recommend for sequential read only of big files? Like streaming? – Freedo Mar 08 '21 at 01:52
  • For a sequential-read-centric workload, RAID6 should be the best setup (*if and only if* you don't need any redundancy, you can also consider RAID0). – shodanshok Mar 08 '21 at 14:41
2

It's just a matter of definitions. You can measure IOPS at different levels in the system and you will get different values. For example, suppose you have two mirrored disks and you are writing as fast as you can. The IOPS going to the disks will be twice the number of IOPS a single disk can handle with a similar write load. But the IOPS going into the controller will be equal to the number of IOPS a single disk can handle.

Usually what we care about is how many logical IOPS we can get into the array, and we don't particularly care what's happening at the disk level. In that case, you are correct: the IOPS depend on the RAID level, the number of disks, the performance of the individual disks and, in some cases, the specific characteristics of the operations.
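
To make the mirror example concrete, here is a tiny sketch in Python; the 100 write IOPS per disk is an assumed figure, purely for illustration:

    # Logical vs. per-disk IOPS for a 2-way mirror under a pure write load.
    disk_write_iops = 100   # assumed per-disk write capability
    n_mirrors = 2

    logical_iops = disk_write_iops            # each logical write hits every mirror
    physical_iops = logical_iops * n_mirrors  # total I/Os the disks actually perform

    print(f"logical (array-level) write IOPS: {logical_iops}")
    print(f"physical (disk-level) write IOPS: {physical_iops}")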

David Schwartz