
Does adding heaps of drives to a raid 0 increase performance? I know that two drives in a striped raid will usually be faster than a single drive, but will I notice a difference in performance between, say, 2 drives in a striped raid and 8? Is there a general limit to the number of drives in the raid before you really don't get any more benefit?

A similar question has been asked here:

Does adding more drives to a RAID 10 Array increase performance?

But I'm really asking whether adding many drives to a raid 0 gives improvements over adding just, say, 2 or 4. Does the performance keep increasing?

Peter Cullen

2 Answers


In theory yes, more drives in a raid0 would lead to higher performance because the load is shared over more drives. However, in practice you would be limited by the bandwidth of the raid controller, the CPU and memory performance, and so on. The performance increase would not be linear; that is, 4 disks are not exactly twice as fast as 2 disks.

In any reasonably modern system with a raid controller, or even using software raid with Linux's mdadm, using 8 drives will be faster than using 2, and you should not be held back by the rest of the system's performance. The CPU, raid and/or disk controller and memory should all be able to handle it. You may see increased use of system resources the more drives you add, especially if you use the onboard SATA controller in a softraid combination, but nothing that would really hinder overall usability. If using Linux you may want to use a kernel configured without "preempt", so that server-oriented tasks get preference over interactive responsiveness.

https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO
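
As a rough sketch of what such an array looks like with mdadm (the device names /dev/sdb through /dev/sdi, the md device name and the mount point are just placeholders for this example, adjust for your system):

    # create an 8-drive stripe; --chunk is in KiB, 512 is just an example value
    mdadm --create /dev/md0 --level=0 --raid-devices=8 --chunk=512 /dev/sd[b-i]
    # put a filesystem on it and mount it as scratch space
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt/scratch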

Of course the more drives you add, the higher the chance that one of them fails and your whole raid is destroyed. I would expect a raid0 of 8 drives not to last more than a year or two, if you're lucky. A raid0 of 16 drives would be asking for trouble; at that point I'd consider a raid10 instead, which would still be fast enough and give you less to worry about.

As for how many drives would max out a system's resources, I couldn't say without detailed system specs. I think you'd be limited more by the failure rate if you go over about 16 disks (I'd rather not think about it).

Naturally you'd only use the raid0 for data that can be lost at any time without problems. It works great for things such as a build server, or scratch space for large scientific computations. In fact, those are the scenarios I have often used a raid0 for, and it is a great way to squeeze a bit more life out of a bunch of older, lower-capacity and inexpensive disks that would otherwise be collecting dust. You can even mix sizes, at least with mdadm.

If using mdadm, it may be worth considering just using a raid10, since in certain configurations it can get near the performance of a raid0: the read performance of a raid0, plus write performance already improved over other raid levels (except raid0). You would also get better redundancy than other raid levels, with only a slight speed penalty compared to a raid0. That is close to the best of both worlds, which you don't find often.

https://en.wikipedia.org/wiki/RAID#Non-standard_levels

Linux MD RAID 10 provides a general RAID driver that in its "near" layout defaults to a standard RAID 1 with two drives, and a standard RAID 1+0 with four drives; though, it can include any number of drives, including odd numbers. With its "far" layout, MD RAID 10 can run both striped and mirrored, even with only two drives in f2 layout; this runs mirroring with striped reads, giving the read performance of RAID 0. Regular RAID 1, as provided by Linux software RAID, does not stripe reads, but can perform reads in parallel.
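
A minimal sketch of that "far" layout with mdadm, assuming two placeholder devices /dev/sdb and /dev/sdc:

    # raid10 in "f2" (far, 2 copies) layout on just two drives:
    # reads are striped much like a raid0, writes are mirrored to both drives
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=2 /dev/sdb /dev/sdc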

As suggested in the comments, mixing sizes with mdadm will not give a speed increase if you utilise all disk space as opposed to letting the smallest disk define the size of the array.

Also, seek time will not improve in a raid0 and can even become a bit slower. For an SSD-based raid0 the seek time would be so small (between 0.08 and 0.16 ms, https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics#cite_note-HP_SSD-6) that I expect it wouldn't matter much.
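
If you want to see whether sequential throughput actually scales as you add drives, a quick and crude read test against the array (assuming it lives at /dev/md0) could be:

    # read 8 GiB straight off the array, bypassing the page cache
    dd if=/dev/md0 of=/dev/null bs=1M count=8192 iflag=direct
    # or use hdparm's built-in buffered read timing
    hdparm -t /dev/md0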

aseq
  • If you want to mix sizes, then you cannot apply RAID0, at least to use all the space those disks have. You have to use JBOD, which doesn't increase performance. – Tero Kilkanen May 27 '15 at 23:31
  • You can mix sizes using mdadm, it's very flexible, mdadm even allows you to configure a 3 disk raid10. I wouldn't expect you can mix sizes in raid controllers, those are less flexible, but faster. – aseq May 27 '15 at 23:34
  • I checked this, and if you want RAID0, then the smallest device of the array defines the size of the complete array. That is, if you have 100GB, 200GB and 300 GB drives, you'll get a 300 GB RAID0 array and 100GB and 200GB free space to use for other purposes. In Linear mode, you get the complete capacity of all the devices, but not the parallel performance. – Tero Kilkanen May 27 '15 at 23:46
  • That sounds about right yes. – aseq May 28 '15 at 00:03
  • With rotational media, isn't there also an issue of seek time vs transfer time? Adding more disks spreads the amount being read/written across more platters (each has to do less == faster) but they all still have to perform a seek operation (not reduced by adding more drives). So, depending on the type of operations you're performing (i.e. lots of small reads vs a few large reads), increasing the transfer speed (by adding more drives) could make a small or large difference. – Molomby May 29 '15 at 00:39
  • Yes good point. The speed gain is not linear, i.e. 4 drives is not twice as fast as 2. – aseq May 29 '15 at 01:46

It depends on the workload, but IMHO yes, adding 2 additional disks to an existing 2-disk array should give better overall performance.

You need to realize where the bottlenecks are:

  • CPU - how much data flow the CPU can handle,
  • bus/controller - how much data it can carry,
  • SSD/HDD - how much data it can give/take.

Let's assume a Linux software RAID; then adding two additional disks MAY result in:

  • ~ half the access time for a big enough block of data, which results in:
  • ~ double the IOPS,
  • ~ double the throughput, assuming that the controller's bus is sufficient and the CPU can handle the traffic.

*~ This is never a full 2× boost in the factors above; it's always 10-20% less. The scaling looks more or less linear. Please don't treat this as an authoritative answer, I didn't do any studies about it.
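
As a rough illustration only (the ~150 MB/s per-disk figure here is just an assumed example): 2 disks at ~150 MB/s each would ideally give ~300 MB/s, so perhaps 240-270 MB/s in practice after the 10-20% loss; 4 disks would be ~600 MB/s ideal, so maybe 480-540 MB/s in practice.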

Michal Sokolowski