
I am rather new to databases. I was reading up on RAID storage, and it seems that the consensus is that:

RAID 10 is best in terms of performance and redundancy

However, at the end of the article, the author went on to state:

Independent pairs of RAID 1 are superior to RAID 10, provided the application knows how to evenly distribute data across multiple volumes

The author, George Ou, did an analysis to prove why independent pairs of RAID 1 are superior to RAID 10, though I can't really understand it, as I am quite new to databases.

However, another author, Robin, did his own analysis and rebutted George Ou's analysis.

I am confused by all these analyses, which are totally out of my depth.

These are my questions:

  1. Is it really true that independent pairs of RAID 1 are superior to RAID 10, provided the application knows how to evenly distribute data across multiple volumes?

  2. In practice, is it easy to create an application that can evenly distribute data across multiple volumes, and how is it done?

  3. Can someone provide a simplified explanation of the above two points?

longneck
Computernerd
  • @ewwhite This question is not about RAID levels. It is about a "RAID" vs. an "Application" using multiple drives itself. I did not find an answer in http://serverfault.com/questions/339128/what-are-the-different-widely-used-raid-levels-and-when-should-i-consider-them and wonder why this question is marked as "duplicate". – Veniamin Dec 29 '13 at 17:52
  • I vote to reopen. This is not an "explain RAID levels to me" question; it is more detailed and specific. It should be allowed to stand. – TomTom Dec 29 '13 at 18:01

2 Answers

  1. Yes, because the failure of a RAID pair only means part of the data is gone. In a RAID 10, a RAID pair failure means that all the data is unreadable. That said, access to the data on independent pairs is also slower.

  2. Not necessarily. Basically, you must do that in the application, and how complex it is depends on the application; that is a programming question, not an admin question. It may be easy or hard; it depends entirely on the application. In many cases, losing part of the data means stopping everything anyway and reloading a backup.

  3. How much simpler can it get? (I consider that as easy as it gets for an admin; if you are not at that level, the question should be re-asked on superuser.com.)
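A minimal sketch of the failure-domain argument in point 1, assuming N drives arranged as N/2 two-drive mirror pairs and data spread evenly across the independent pairs (the precondition from the question). The function `fraction_lost` is hypothetical and for illustration only; it ignores rebuild times and failure probabilities and only counts how much data a whole-pair failure makes unreadable.

```python
def fraction_lost(num_pairs: int, failed_pairs: int, layout: str) -> float:
    """Fraction of data unreadable after `failed_pairs` mirror pairs lose both drives.

    Assumes data is spread evenly across the pairs.
    """
    if not 0 <= failed_pairs <= num_pairs:
        raise ValueError("failed_pairs must be between 0 and num_pairs")
    if layout == "raid10":
        # Every stripe spans every pair, so any dead pair breaks every stripe.
        return 1.0 if failed_pairs > 0 else 0.0
    if layout == "independent_raid1":
        # Each pair holds its own 1/num_pairs share; only those shares are lost.
        return failed_pairs / num_pairs
    raise ValueError(f"unknown layout: {layout}")


for layout in ("raid10", "independent_raid1"):
    print(layout, fraction_lost(num_pairs=4, failed_pairs=1, layout=layout))
```

With four pairs, losing one pair costs a quarter of the data on independent RAID 1 volumes but all of it on RAID 10. As point 2 notes, whether a partial loss is actually better depends on whether the application can keep running without that quarter.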

longneck
TomTom
  • +1 Great answer... I'd add to #2 that programming intelligent distribution of data is *usually* incredibly hard. There might be applications where that's easy, but for most database applications you're going to be pandering to particular use cases rather than the general case, if you're lucky. – Chris S Dec 29 '13 at 17:59
  • Yes. If you store images, for example (on a social site, say), then you can just have numbered "buckets" and store the bucket number in the DB, with each bucket going to a file system. But that is a very special case. – TomTom Dec 29 '13 at 18:00
  • The other problem is load distribution. If you have a RAID 10, then the whole I/O budget is available regardless of how the data and requests are distributed. Split them, and you may end up with a lot of the requests hitting the same RAID 1, slowing them down. – TomTom Dec 30 '13 at 06:44
  • Used to work with some "make lots of drives and use tablespaces a lot" DBAs. Put all my stuff on one big RAID 10 under Linux/pgsql and stomped their Oracle DB server (similar hardware) right into the ground. – Scott Marlowe Jan 16 '14 at 23:39
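A minimal sketch of TomTom's "numbered buckets" idea together with his load-distribution caveat, assuming four independent RAID 1 volumes and string item IDs. `bucket_for`, `pair_for`, `NUM_BUCKETS` and the request mix are all hypothetical; hashing with SHA-256 is just one way to spread items evenly and is not something the comments specify.

```python
import hashlib
from collections import Counter

NUM_PAIRS = 4      # hypothetical: four independent RAID 1 volumes
NUM_BUCKETS = 64   # hypothetical: more buckets than pairs, so buckets can be moved later


def bucket_for(item_id: str) -> int:
    """Stable bucket number for an item; this is the value stored in the DB row."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def pair_for(bucket: int) -> int:
    """Which RAID 1 volume (e.g. which mount point) holds a given bucket."""
    return bucket % NUM_PAIRS


def count_per_pair(item_ids):
    """Count how many items (or requests) land on each RAID 1 volume."""
    return Counter(pair_for(bucket_for(item_id)) for item_id in item_ids)


# Placement: 10,000 stored items spread roughly evenly over the pairs.
stored = [f"img-{i}" for i in range(10_000)]
placement = count_per_pair(stored)

# Access: 8,000 requests for one popular item plus 2,000 requests spread over others.
requests = ["img-0"] * 8_000 + [f"img-{i}" for i in range(2_000)]
load = count_per_pair(requests)
hottest_share = max(load.values()) / len(requests)

print("items per pair:   ", dict(sorted(placement.items())))
print("requests per pair:", dict(sorted(load.items())))
print(f"busiest pair handles {hottest_share:.0%} of requests")
```

Placement comes out roughly even, yet one popular item still sends at least 80% of this request mix to a single pair, because every read of it must go to the volume that holds it. On RAID 10 the same data would be striped in chunks across all pairs, which is the point of the load-distribution comment; how much that helps for small files that fit inside one chunk is not modelled here.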

An application distributing data over drives can, in principle, always be superior to RAID, simply because it has more information on which to base its optimization decisions. MS SQL is a good example, since it shows better performance on bare drives than on the same drives combined into a RAID 0 (in your case, RAID 10 is a RAID 0 combination of RAID 1 pairs).
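To make "RAID 10 is a RAID 0 combination of RAID 1" concrete, here is a minimal sketch of how a striped-mirror layout could map logical blocks to mirror pairs, assuming a simple round-robin stripe with a fixed chunk size. `raid10_locate`, the four pairs and the 128-block chunk are hypothetical parameters; real implementations (for example Linux md with its different raid10 layouts) may place data differently.

```python
def raid10_locate(logical_block: int, num_pairs: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block to (mirror pair, block offset on that pair).

    RAID 0 layer: consecutive chunks are dealt round-robin across the pairs.
    RAID 1 layer: both drives of the chosen pair hold an identical copy,
    so the offset is the same on either drive.
    """
    chunk, within_chunk = divmod(logical_block, chunk_blocks)
    pair = chunk % num_pairs          # which mirror pair gets this chunk
    stripe_row = chunk // num_pairs   # how many full stripes precede it
    return pair, stripe_row * chunk_blocks + within_chunk


# Hypothetical array: 8 drives = 4 mirror pairs, 128-block chunks.
for block in (0, 130, 512):
    print(block, "->", raid10_locate(block, num_pairs=4, chunk_blocks=128))
```

Here the RAID layer alone decides which pair a block lands on, blindly rotating chunks across all pairs. With independent RAID 1 volumes, the application makes that choice itself, which is where its extra knowledge about the data can come in.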

But in practice it is not so clear-cut:

First, George Ou tested motherboard-integrated RAID. With a dedicated, more powerful RAID card, the results may differ considerably.

Second, with RAID 10 you can use ALL your disks. If the disks are managed by an application, then you need to leave something for the system. Say, in an 8-drive server you may need to choose between an 8-drive RAID 10 and a RAID 1 for the system plus three RAID 1 pairs for the application. In this case, RAID 10 may produce better results.
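A back-of-the-envelope sketch of the 8-drive trade-off described in the second point, assuming two-drive mirrors, identical drives, and one pair reserved for the system in the split layout. `compare_layouts` and the 2 TB drive size are hypothetical, and the sketch only counts mirror pairs, not measured throughput.

```python
def compare_layouts(num_drives: int, drive_tb: float, system_pairs: int = 1):
    """Compare one RAID 10 over all drives with 'RAID 1 for system + RAID 1 pairs for the app'."""
    total_pairs = num_drives // 2  # mirroring halves raw capacity in both layouts

    # Option A: a single RAID 10; system and application share every pair.
    raid10 = {
        "usable_tb": total_pairs * drive_tb,
        "pairs_serving_app_io": total_pairs,
    }

    # Option B: dedicate pairs to the system, give the rest to the application.
    app_pairs = total_pairs - system_pairs
    split = {
        "usable_tb": total_pairs * drive_tb,  # includes the system pair
        "pairs_serving_app_io": app_pairs,
        "app_usable_tb": app_pairs * drive_tb,
    }
    return raid10, split


raid10, split = compare_layouts(num_drives=8, drive_tb=2.0)
print("RAID 10:", raid10)
print("split:  ", split)
```

Both layouts keep half the raw capacity, but in the split layout the application's I/O can only use three of the four pairs, while on RAID 10 it shares all four pairs with the system's I/O.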

And third, RAID 10 gives you flexibility that is often preferable to a small performance gain. Volume management, space reallocation, snapshotting, block-level replication and other useful features are what you sacrifice when an application manages the drives itself.

Veniamin