2

So I am stuck in a corner. I have a storage project that is limited to 24 spindles and requires heavy random writes (the corresponding read side is purely sequential). It needs every bit of space on my drives, ~13TB total in an n-1 RAID-5, and it has to go fast: over 2GB/s sort of fast.

The obvious answer is to use a stripe/concat (RAID-0/1), or better yet a RAID-10 in place of the RAID-5, but that is disallowed for reasons beyond my control. So I am here asking for help in getting a suboptimal configuration to be as good as it can be.

The array is built on direct-attached SAS-2 10K rpm drives, backed by an Areca 18xx-series controller with 4GB of cache. It uses 64k array stripes and a 4K, stripe-aligned XFS file system with 24 allocation groups (to avoid some of the penalty for being RAID-5).
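
For reference, here is a minimal sketch of the file system geometry that layout implies (the mkfs.xfs su/sw options shown are illustrative of the usual way to express that alignment, not necessarily the exact command used):

    # Sketch: XFS stripe-alignment parameters implied by the array geometry above.
    # Assumes 24 drives in RAID-5 (23 data + 1 parity) with a 64 KiB stripe unit.
    STRIPE_UNIT_KIB = 64          # per-disk chunk ("su")
    DRIVES = 24
    DATA_SPINDLES = DRIVES - 1    # RAID-5 loses one spindle's worth to parity

    full_stripe_kib = STRIPE_UNIT_KIB * DATA_SPINDLES

    print(f"stripe unit (su):  {STRIPE_UNIT_KIB} KiB")
    print(f"stripe width (sw): {DATA_SPINDLES} units")
    print(f"full stripe:       {full_stripe_kib} KiB")   # 1472 KiB at 24 drives
    # Illustrative invocation only:
    print(f"mkfs.xfs -d su={STRIPE_UNIT_KIB}k,sw={DATA_SPINDLES},agcount=24 <device>")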

The heart of my question is this: in the same setup with 6 spindles/AGs I see near disk-limited write performance, ~100MB/s per spindle; at 12 spindles that drops to ~80MB/s, and at 24 to ~60MB/s.

I would expect that with distributed parity and matched AGs, performance would scale with the number of spindles, or at least be worse at small spindle counts, but this array is doing the opposite.
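
To put that observation in aggregate terms (a quick sketch using the per-spindle figures above; counting every spindle, including the parity share, is a simplification):

    # Sketch: aggregate write throughput implied by the per-spindle figures above.
    # "Linear" assumes the 6-spindle per-disk rate held constant as spindles were added.
    observed = {6: 100, 12: 80, 24: 60}   # MB/s per spindle, as reported
    IDEAL_PER_SPINDLE = 100               # MB/s, the 6-spindle figure

    for spindles, per_disk in observed.items():
        print(f"{spindles:2d} spindles: ~{spindles * per_disk:4d} MB/s aggregate, "
              f"vs ~{spindles * IDEAL_PER_SPINDLE} MB/s if scaling were linear")
    # 24 spindles: ~1440 MB/s observed vs ~2400 MB/s linear -- short of the 2GB/s target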

What am I missing?

Should RAID-5 performance scale with the number of spindles?

Many thanks for your answers and any ideas, input, or guidance.

--Bill

Edit: Improving RAID performance is the other relevant thread I was able to find; it discusses some of the same issues in its answers, though it still leaves me without an answer on the performance scaling.

Bill N.
  • 123
  • 1
  • 5
  • I would think there's overhead involved due to the calculation of parity... – Bart Silverstrim Apr 10 '12 at 16:28
  • 2
    What you are seeing is that the controller is running into saturation, which is to be expected. Adding a second controller and making use of logical volume management should help in that regard. If you need sequential-access performance, then no more than a dozen disks should be employed per current-generation SAS HBA. – pfo Apr 10 '12 at 17:21
  • Could you also describe your IO pattern? What performance target do you need to hit? You said 2GB/s writes, but for what load? 2GB/s is probably not possible with even a few parallel threads plowing on the same file with that setup. – pfo Apr 10 '12 at 17:37
  • To answer pfo: the write-side IO pattern for this case is as bad as the array layout, and also outside my control. I have 500+ open file handles, each one writing a sequentially growing file (picture a system log file or a network capture). The writes are issued in stripe-sized blocks. Each file is serviced by its own thread, with about 32 threads active in a given time slice (1/2 of the physical cores on the system are dedicated to file writing). The disk controller is on its own IOH and so is not contending with any of the other IO on the box. – Bill N. Apr 10 '12 at 18:09
  • The actual target data rate is 1.8GB/s on average for 12 hours; it will not be a constant rate, but it can sustain over 2.5GB/s for up to 10 min. – Bill N. Apr 10 '12 at 18:16
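
To put rough numbers on the IO pattern described in these comments (a sketch only: the 64 KiB write size is read from the "stripe sized blocks" remark above, and the 4-IO cost per partial-stripe update is the textbook RAID-5 read-modify-write figure):

    # Sketch: back-end IO load implied by the workload described above.
    TARGET_MIBPS = 1.8 * 1024   # treating the 1.8GB/s average as ~1.8 GiB/s
    WRITE_KIB = 64              # assumed write size (one stripe unit)
    RMW_IOS = 4                 # read data, read parity, write data, write parity
    SPINDLES = 24

    front_end_wps = TARGET_MIBPS * 1024 / WRITE_KIB
    worst_case_back_end = front_end_wps * RMW_IOS

    print(f"front-end writes/s needed:            ~{front_end_wps:,.0f}")
    print(f"back-end IOs/s if every write is RMW: ~{worst_case_back_end:,.0f}")
    print(f"per 10K spindle:                      ~{worst_case_back_end / SPINDLES:,.0f}")
    # A 10K SAS drive manages a few hundred random IOs/s, so the controller has
    # to coalesce full-stripe writes in its cache to get anywhere near the target.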

3 Answers

6

ANY RAID 5 is sub-optimal, and a 24-disk R5 array is just beyond stupid. I don't mean to be rude, but most hardware array controllers won't let you create a 24-disk R5 array; think of how much data you may be losing without even knowing it. Also, if you're doing any amount of any type of writing, RAID 5 or 6 isn't the way forward; in fact, adding more spindles is likely to just slow things down.

From both a performance and a reliability perspective you NEED to convert this to RAID 10 as soon as possible; it's really the only way forward, and anything else is just polishing a turd.

Chopper3
  • 100,240
  • 9
  • 106
  • 238
  • 2
    >think of how much data you may be losing without even knowing it. OK, I'll bite... how am I losing data because of the parity setup? That needs an explanation. – Bill N. Apr 10 '12 at 16:44
  • 2
    I thought it was referring to the URE issue with RAID 5. You can have an unrecoverable read error on disk 1; disk 2 fails; you replace disk 2 and begin the rebuild, only to hit the URE on disk 1 and lose the array. – Bart Silverstrim Apr 10 '12 at 16:50
  • 2
    Chopper is right; rebuilds will take forever, and nobody who manages storage in a professional setting should configure RAID-5 with 24 spindles in a single RAID group. – pfo Apr 10 '12 at 17:32
  • I know, but I spent all the goodwill I could, and all the political capital I had, to get a RAID-10 solution to be considered. Now I must make this work as well as it can, long rebuilds and all. – Bill N. Apr 10 '12 at 18:18
  • 1
    Bill, the URE issue isn't about LONG rebuilds - it is about unrecoverable rebuilds. A DISK WILL FAIL. A READ ERROR WILL HAPPEN ON REBUILD. At 24 disks this is inevitable. YOU WILL LOSE ALL YOUR DATA!! DON'T DO IT!!! – Justin Higgins Apr 10 '12 at 18:47
  • In case you aren't familiar with the issue, having 13TB and 24 disks statistically guarantees that an Unrecoverable Read Error WILL OCCUR during a RAID rebuild. If you put that into production, you can consider your array already dead, and your data already lost. http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162 – Bigbio2002 Apr 11 '12 at 16:07
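
For a back-of-the-envelope feel for the URE argument in these comments (a sketch; the error rates are typical datasheet figures, not measurements for these particular drives):

    # Sketch: chance of at least one unrecoverable read error (URE) while
    # reading ~13 TB from the surviving drives during a RAID-5 rebuild.
    import math

    DATA_READ_TB = 13
    bits_read = DATA_READ_TB * 1e12 * 8

    for label, ure_per_bit in [("1e-15/bit (typical enterprise SAS)", 1e-15),
                               ("1e-14/bit (typical consumer SATA)", 1e-14)]:
        # P(at least one URE) = 1 - (1 - p)^bits, approximated as 1 - exp(-p*bits)
        p_fail = 1 - math.exp(-ure_per_bit * bits_read)
        print(f"{label}: ~{p_fail:.0%} chance of a URE during one rebuild")
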
2

What are the different widely used RAID levels and when should I consider them?

RAID-5 at that scale could be problematic (rebuild times, increased chance of array failure). Your random write speed will not be reasonable at all (perhaps the bandwidth of a single disk), nor will it scale with spindles.

ewwhite
  • 194,921
  • 91
  • 434
  • 799
  • 2
    That is not generally true; RAID 5 can beat RAID 10 (especially at large spindle counts) in write performance hands down if the file system block size is correctly aligned to the underlying RAID stride (full-stride writes) and the IO patterns match said stride/block size. – pfo Apr 10 '12 at 17:35
1

Your limitation here is your RAID controller. Random writes like a large cache, and parity-based RAIDs like a fast parity calculator. The more disks you add to a parity-based RAID array (like RAID 5 or 6) on a locally attached controller, the lower the performance you'll see per spindle. Large direct-attached storage (above 8 drives or so) tends to be RAID 10 to avoid this issue. The only reason to use parity-based RAID (5 or 6) is to benefit from a higher ratio of usable space. The downside is that if you lose a drive, the rebuild takes longer the larger the array is.
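
As an illustration of that per-spindle penalty, here is a rough sketch using the standard RAID-5 write accounting (the 64 KiB stripe unit matches the question; everything else is a simplified model):

    # Sketch: the two RAID-5 write paths as the array gets wider.
    # A coalesced full-stripe write costs N disk writes for N-1 data chunks;
    # an isolated partial-stripe update costs ~4 IOs (read-modify-write).
    STRIPE_UNIT_KIB = 64

    for drives in (6, 12, 24):
        data_disks = drives - 1
        full_stripe_kib = STRIPE_UNIT_KIB * data_disks
        full_stripe_efficiency = data_disks / drives   # user data / disk IO
        rmw_efficiency = 1 / 4
        print(f"{drives:2d} drives: full stripe = {full_stripe_kib:4d} KiB, "
              f"{full_stripe_efficiency:.0%} efficient if coalesced, "
              f"{rmw_efficiency:.0%} if read-modify-write")
    # The wider the array, the larger the full stripe the cache must assemble
    # (1472 KiB at 24 drives) before it can skip the read-modify-write path,
    # so more writes fall back to the slow path and per-spindle throughput drops.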

No new hardware

If your hardware is fixed and you need to make this work as well as possible, then your best performance would come from RAID 10; however, you'd lose half the available space. Also, while it seems more resilient at first because you can lose up to half the drives without a failure, you can still lose data if you lose the wrong two drives.

The other option for locally attached storage is ZFS. I don't claim to know its inner workings, but my understanding is that it will bypass the RAID card completely and work on the underlying disks. It might have a lower parity penalty for small writes if you configure it properly.

New hardware

If you have money to throw at this problem, you would be well served to invest in a better RAID controller: something with more cache and a faster processor. If you decide to use parity-based RAID, you would be best served by creating multiple RAID groups, preferably 3 groups of 8 spindles each.
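
As a rough capacity comparison of those options (a sketch; the 600 GB drive size is a guess inferred from ~13 TB usable over 23 data spindles):

    # Sketch: usable capacity of the layouts discussed, assuming 600 GB drives.
    DRIVE_TB = 0.6
    TOTAL_DRIVES = 24

    layouts = {
        "1 x 24-drive RAID-5":  (TOTAL_DRIVES - 1) * DRIVE_TB,
        "3 x 8-drive RAID-5":   3 * (8 - 1) * DRIVE_TB,
        "RAID-10 (12 mirrors)": (TOTAL_DRIVES // 2) * DRIVE_TB,
    }

    for name, usable_tb in layouts.items():
        print(f"{name}: ~{usable_tb:.1f} TB usable")
    # ~13.8 TB, ~12.6 TB and ~7.2 TB respectively: the 3 x 8 split gives up only
    # two spindles of capacity while shrinking each rebuild domain to 8 drives.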

Even better than this would be getting some external storage: something that has a storage controller which handles all the caching and parity calculation, and simply allows access to the storage via SAS or fibre channel.

Basil
  • 8,811
  • 3
  • 37
  • 73