6

I have this discussion with developers quite often. The context is an application running on Linux with a moderate amount of disk I/O. The servers are HP ProLiant DL3x0 G6 systems with four equally-sized 15k rpm disks behind a P410 controller with 512MB of battery- or flash-backed cache. There are two schools of thought here, and I'd like some feedback...

1) I'm of the mind that it makes sense to create one array containing all four disks in RAID 10 (1+0) and partition as necessary. This gives the greatest headroom for growth, leverages the higher spindle count, and provides good fault tolerance without degrading performance.

2) The developers think it's better to have multiple RAID 1 pairs: one for the OS and one for the application data, citing that the spindle separation would reduce resource contention. However, this halves the number of drives behind any one volume, and in this case the OS doesn't really do much other than regular system logging.

Additionally, the battery-backed RAID cache and substantial RAM seem to negate the impact of disk latency...
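For reference, this is roughly how the two layouts would be carved out with hpacucli; the slot number and drive addresses below are assumptions, so adjust them to whatever ctrl all show config reports on the box:

    # See what the controller actually has attached
    hpacucli ctrl all show config

    # Option 1: one RAID 1+0 logical drive across all four disks
    hpacucli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2,1I:1:3,1I:1:4 raid=1+0

    # Option 2: two separate RAID 1 pairs (OS and application data)
    hpacucli ctrl slot=0 create type=ld drives=1I:1:1,1I:1:2 raid=1
    hpacucli ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1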

What are your thoughts?

ewwhite
  • What are the applications? Something like SQL would be different than just file/print, etc. – Dave M Jan 06 '11 at 16:42
  • It's financial trading software, so network-heavy, with the main writes to the data partition being application logging. The database size with logs would never be more than 8GB. – ewwhite Jan 06 '11 at 16:53
  • 1
    I go with your arguments: why have 2 spindles sitting there doing nothing much when you could be using them to serve data faster, and do it without any loss of resiliency. Bah! Developers! :-) – AndyM Sep 09 '11 at 12:08

3 Answers

7

My thoughts are that performance talks and bullspit walks.

Since you're discussing recreating the array anyway, why not build it both ways, run a test load on each, and graph the results?
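Something along these lines with bonnie++ would do it; the mount point, file size and user below are placeholders, and the file size should be roughly twice the installed RAM so the page cache doesn't hide the disks:

    # Repeat against each candidate layout and compare the CSV output
    # -d: directory on the array under test
    # -s: total file size in MB (24GB here, assuming ~12GB RAM)
    # -n 0: skip the small-file creation phase; -q: emit only the CSV line
    bonnie++ -q -d /mnt/test -s 24576 -n 0 -u nobody -m raid10 | bon_csv2html > raid10.html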

Edit:

If, as you say in your comment, real life shows that performance doesn't depend on the underlying RAID level (and you actually have the numbers to back that up), then I would recommend going with the level that will continue to give you the greatest performance as you scale, which is almost always going to be RAID 10.

If you're not going to expand, then it literally doesn't matter, since you apparently get the same performance from either option, and you're painting the bikeshed.

Matt Simmons
  • 1
    +1 a real test would be the best way to find out – Dave M Jan 06 '11 at 16:44
  • A bonnie++ test will show the greater sustained read and write throughput on the RAID 10 setup. The real-life patterns of access by the application show that it doesn't matter either way, since the app's I/O needs fall well below the capabilities of the system, plus caching and RAM effects... so I feel like the notion of separating the OS from data spindles in a 4-disk setup is a carryover from older times and slower systems. Now, if we were talking 10 disks, the separation makes more sense for other reasons. – ewwhite Jan 06 '11 at 16:44
  • My .02: if the measurements (e.g. the iostat sketch below these comments) show that there is no contention under a range of conditions from normal to peak, then mitigating a theoretical resource contention is worth about as much as bullspit, as Matt points out. ;) So, unless there are other reasons for splitting things out... – damorg Jan 06 '11 at 17:07
  • Can you think of a situation where it DOES make sense to split this number of drives into two pairs? – ewwhite Jan 06 '11 at 17:34
  • 1
    I can't think of a compelling technical reason to split 4 disks into 2 mirrors (esp. just 2 mirrors and no spares). That said, there can always be reasons: a build standard you can't change, a requirement to physically isolate app/data from system disks, or maybe even some customers (or devs) who want it the way they want it, where it's not possible or worth it to change their minds. – damorg Jan 07 '11 at 02:33
  • +1 for your answer and +9 for the "painting the bikeshed" reference; unfortunately an old thread, yet still impressively current. – Marcelo Scofano Diniz Jan 13 '21 at 19:39
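The measurement sketch referred to in the comments above: watch per-device latency, queue depth and utilization under normal and peak load. The interval is arbitrary, and the logical drive may appear as /dev/sda or /dev/cciss/c0d0 depending on the driver in use.

    # Extended device statistics every 5 seconds during a representative workload;
    # if %util stays low and await stays near cache latency, there is no
    # contention for the spindle-separation argument to fix.
    iostat -xm 5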
1

Since nobody has presented any real-world data regarding this question, I would like to point to an interesting article covering this exact question!

RAID 1+0 or double RAID 1

From the article:

"As much as we love raw sequential throughput, it’s almost worthless for most database applications and mail servers. Those applications have to deal with hundreds or even thousands of tiny requests per second that are accessed at random on the hard drives. The drives literally spend more than 95% of the time seeking data rather than actually reading or writing data, which means that even infinite sequential throughput would solve only 5% of the performance problem at best."

GEMI
  • It's an interesting notion. I feel that the argument falls apart when the data set grows beyond the capacity of a single (enterprise) drive and necessitates striping. What if the working set of data is 1.5 terabytes? But otherwise, it's still an interesting observation. – ewwhite Sep 08 '11 at 08:37
  • I really don't know much when it comes to large data sets, but at least for an MSSQL server this is not applicable, since you can split a large database across several disks fairly easily. – GEMI Sep 08 '11 at 09:11
0

Food for thought: if you decide to go with the four-disk RAID 10, then at any point in the future the developers can point the finger back at you when a performance issue arises. If you go with the two RAID 1 pairs and a performance issue arises, you might still have the option of changing over to a four-disk RAID 10, depending on how complex that conversion would be.
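How complex that conversion would be depends on the controller, firmware and cache module; before assuming an online RAID 1 to RAID 1+0 transformation is even possible, the first step would just be to see what the P410 reports (the slot number here is an assumption):

    # Current arrays, logical drives and spare assignments
    hpacucli ctrl slot=0 show config detail

    # Controller, cache and battery status; array transformations generally
    # require the battery/flash-backed write cache to be present and enabled
    hpacucli ctrl slot=0 show status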

jftuga