Should I use "Raid 5 + spare" or "Raid 6"?

12

1

What is "Raid 5 + Spare" (excerpt from User Manual, Sect 4.17.2, P.54):

RAID5+Spare: RAID 5+Spare is a RAID 5 array in which one disk is used as spare to rebuild the system as soon as a disk fails (Fig. 79). At least four disks are required. If one physical disk fails, the data remains available because it is read from the parity blocks. Data from a failed disk is rebuilt onto the hot spare disk. When a failed disk is replaced, the replacement becomes the new hot spare. No data is lost in the case of a single disk failure, but if a second disk fails before the system can rebuild data to the hot spare, all data in the array will be lost.
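The "read from the parity blocks" step in the excerpt can be illustrated with plain XOR parity, which is how single-parity RAID is usually described. This is only a minimal sketch: the block contents and the `xor_blocks` helper are made up for illustration and say nothing about how this particular controller lays out data.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])  # True
```

With only one parity block per stripe, a second missing block leaves the XOR equation with two unknowns, which is why a second failure before the rebuild finishes is fatal.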


What is "Raid 6" (excerpt from User Manual, Sect 4.17.2, P.54):

RAID6: In RAID 6, data is striped across all disks (minimum of four) and a two parity blocks for each data block (p and q in Fig. 80) is written on the same stripe. If one physical disk fails, the data from the failed disk can be rebuilt onto a replacement disk. This Raid mode can support up to two disk failures with no data loss. RAID 6 provides for faster rebuilding of data from a failed disk.


Both "Raid 5 + spare" and "Raid 6" are SO similar ... I can't tell the difference.

When would "Raid 5 + Spare" be optimal?

And when would "Raid 6" be optimal?

The manual dumbs down the different RAID modes with five-star ratings. "Raid 5 + Spare" only gets 4 stars but "Raid 6" gets 5 stars. If I were to blindly trust the manual I would conclude that "Raid 6" is always better. Is "Raid 6" always better?

Trevor Boyd Smith

Posted 2010-11-19T22:24:11.450

Reputation: 2 093

1ServerFault has a good discussion on this. – Brian – 2010-11-19T22:36:10.317

1Whatever you end up doing, only raid with a raid controller, not with the on-board soft controller that comes with your mobo. If your mobo goes out, you are asking for trouble. – sound2man – 2010-11-20T00:05:19.893

The raid is being done by a hardware controller (lol, I have heard too many things against software raid controllers). – Trevor Boyd Smith – 2010-11-20T18:17:16.777

Answers

17

In short:

  • If safety is your main concern then go with RAID6 as it can survive any two drives failing at the same time. If a drive fails in an R5+spare arrangement you are not safe from another failure until the spare has been brought up to speed which could take quite some time with large drives (and it is not unheard of for a drive that has been powered down for ages, such as your spare, to fail to spin up when finally called upon).

  • If performance is king, go with 5+spare as the write performance will be better when the array is not in a degraded state - though the performance difference between R5 and R6 is significantly smaller than the difference between R5 and other solutions if you have a good controller (i.e. one that makes a partial block write operation "two/three concurrent reads then parity calc then two/three concurrent writes" most of the time rather than "read-then-read(-then-read)-then-parity-calc-then-write-then-write(-then-write)", which is what some very cheap controllers and software RAID may do).
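The partial block write described above can be counted out roughly as follows. This is a simplified model, not a benchmark: `small_write_cost` is a hypothetical helper, and it assumes a read-modify-write of one data block plus each parity block, with "rounds" meaning batches of disk accesses that have to wait for each other.

```python
def small_write_cost(parity_blocks, concurrent):
    """Count disk operations for one partial-stripe (read-modify-write) update.

    parity_blocks: 1 for RAID 5, 2 for RAID 6.
    concurrent: True if the controller issues the reads together and the
                writes together; False if it does them one after another.
    """
    reads = 1 + parity_blocks    # old data block + old parity block(s)
    writes = 1 + parity_blocks   # new data block + new parity block(s)
    # Concurrent: one round of reads, one round of writes.
    # Serial: every read and every write is its own round.
    io_rounds = 2 if concurrent else reads + writes
    return reads, writes, io_rounds


for name, parity_blocks in (("RAID 5", 1), ("RAID 6", 2)):
    for concurrent in (True, False):
        reads, writes, rounds = small_write_cost(parity_blocks, concurrent)
        mode = "concurrent" if concurrent else "serial"
        print(f"{name} ({mode}): {reads} reads, {writes} writes, {rounds} rounds")
```

Under this model a controller that overlaps its I/O pays the same number of rounds for R5 and R6, while a serial one pays for every extra parity block, which is the gap the bullet is pointing at.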

Edit: I missed a potentially important point first time around:

  • If power consumption is a concern, then R5+spare will have an extra advantage if your controller keeps the spare drive powered down until needed.

David Spillett

Posted 2010-11-19T22:24:11.450

Reputation: 22 424

I'd be curious to know when, if ever, the power draw of a single extra drive is really going to be a "concern" in comparison to everything else in the data center / server room / etc – warren – 2018-01-04T18:18:45.207

A single drive in a single machine, probably not. But in colo where you get X-amps-per-rack and pay a lot for any excess (or excess is simply not permitted - sometimes if you go over you go dark), it could be noticeable. Power "consumed" is a double whammy too: it is converted to noise and heat and you end up needing more power to move the heat away. And for a whole cage or larger set of kit the total draw of an extra drive per compute unit soon adds up to something a sufficiently picky accountant might notice. – David Spillett – 2018-01-05T11:27:21.910

Most well written/concise. (States the obvious pros/cons in the first two words of each bullet point... very very good). – Trevor Boyd Smith – 2010-11-21T02:58:06.727

7

RAID 5 + hot spare:

  • on equal controller hardware, better performance than RAID 6
  • you can't lose 2 disks at the same time. When you lose a disk, there's a rebuild time (with the hot spare) in which you have no redundancy. Anything which fails in this time creates a complete loss (short of sending everything to a good data rescue firm and paying really $$$$)

RAID 6:

  • worse performance than RAID 5 (depending on the controller, it can range from very noticeable to virtually no difference)
  • you can lose 2 disks at the same time

For any RAID 5 or 6 you have to be careful to use disks which are not from the same production run. It can happen (I've seen it!) that after a single failure, the next disk(s) fail during the rebuild due to the increased stress. Disks from the same run have the exact same firmware and probably very similar physical properties.

Edit: What to choose

(This also depends on the performance requirements of the server and the tolerable risk.)

If the server's environment is pretty nice for hardware (colo, climate-controlled etc.), you'll be OK with RAID5 + hot spare.

If the environment makes it more likely that more than one disk fails within a short time (vibrations, humidity, dirt), then go for RAID 6.

Always also have an adequate backup and test recovery.

Edit 2: Decent RAID controllers have scrubbing, which periodically verifies all sectors.

knitti

Posted 2010-11-19T22:24:11.450

Reputation: 871

+1 for "have an adequate backup and test recovery". That's the *FIRST* thing everyone should have before they start worrying about RAID levels. – warren – 2018-01-04T18:19:31.583

3

RAID5 uses one parity block per stripe. RAID6 has to calculate the Reed-Solomon error correction and write two parity blocks per stripe vs. one for RAID5. RAID5 is used for intense database applications where storage is huge because of the cost of RAID10. RAID5 leaves 67% to 94% of the raw disk capacity usable (3 to 16 disks), where RAID10 leaves 50% (much higher storage costs). While RAID6 has lower read latency by a very small amount due to rotational latency, RAID6 is between 25 and 31% slower on writes due to the calculation of error correction and the additional writing of the second parity block.

Using the mean time between failures (MTBF) for the drives, the probability of two drives failing one right after another or at the same time is about (0.1% x 0.1%) x 12, or 0.001 x 0.001 x 12; if you have 1000 drives running then you will average losing ~1.2 drives per year. Two drives will fail one right after the other about every 8.3 years. Now, because drive failures do not follow a Poisson distribution due to the heavy load on the drives during a rebuild, a failure of a second drive is more likely to occur during this period, and the distribution is closer to a Gamma distribution with slightly higher values after a failure occurs.
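For a feel of how the rebuild window enters into this, here is a rough sketch of the independent-failure baseline. It does not reproduce the figures above: the annual failure rate, drive count and rebuild time are made-up inputs, and `second_failure_probability` is a hypothetical helper that assumes a constant failure rate with no extra rebuild stress. By the argument in the previous paragraph, real arrays should do somewhat worse than this.

```python
import math

HOURS_PER_YEAR = 8760


def second_failure_probability(surviving_drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving drive fails before the rebuild ends.

    Assumes independent drives with a constant hazard rate, i.e. it ignores
    the extra load a rebuild puts on the remaining drives.
    """
    # Convert the annual failure rate into a constant hourly hazard rate.
    hazard_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    return 1.0 - math.exp(-surviving_drives * hazard_per_hour * rebuild_hours)


# Assumed inputs: 3 surviving drives, 3% annual failure rate, 24 hour rebuild.
p = second_failure_probability(surviving_drives=3, annual_failure_rate=0.03, rebuild_hours=24)
print(f"Baseline chance of a second failure during rebuild: {p:.4%}")
```

The number scales with both the rebuild time and the number of surviving drives, so large, slow-to-rebuild arrays are where the RAID5 exposure grows.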

The bottom line is that RAID5 performs better than RAID6 on writes, and for DB applications it is far better. For a mostly read application such as a web server, it makes no difference and you should use RAID6. The cost benefits of using RAID5 over RAID10 are huge for large storage. If you can afford the overhead, use RAID10 for highly disk-intensive applications. RAID10 will always perform better.

The biggest bottom line missed is RAID is NOT backup, but a way to limit downtime by providing redundancy. If the data is critical, you should be backing it up (and testing your recovery process).

If one RAID array of 10 2TB SAS drives fails, recovery will cost thousands of dollars and take weeks, if it can even be done.

All RAID arrays eventually fail!

Dr. Bombilious

Posted 2010-11-19T22:24:11.450

Reputation: 31

1

Speaking strictly from a data integrity viewpoint, yes. You can safely lose any two drives, although it is a rare occurrence to lose two together short of severe physical trauma to the system.

Financially, not quite as much. The hot spare can be powered down until needed, which means that it doesn't use power and incurs no wear.

And as always, RAID is not a replacement for a proper off-site backup plan.

Ignacio Vazquez-Abrams

Posted 2010-11-19T22:24:11.450

Reputation: 100 516

1

Have you considered 10? If you have enough disks for raid 6, you've got enough to do a 10 volume. In most cases 10 is both faster and more redundant (at the cost of some disk space).

Joel Coehoorn

Posted 2010-11-19T22:24:11.450

Reputation: 26 787

10 only supports 4 disks, so raid 10 is not an option IMO. – Trevor Boyd Smith – 2010-11-20T18:15:40.913

1@Trevor Raid 10 supports any even number of disks >= 4. If you can do raid 6, you can do raid 10. – Joel Coehoorn – 2010-11-20T22:45:52.700

1

These answers seem incorrect because they are based on theoretical drive performance ONLY. Consider: if you have a RAID controller with 1 GB of cache, then a write (under normal load, not some massive abnormal high-load scenario) is immediate from the perspective of the user or application. It goes to memory, and the 'actual' writing then occurs at the speed of the drive.

However, reading cannot be 'faked' (sped up with a cache) unless the same data has recently or habitually already been loaded. Raid 6 is better for reads and is more tolerant (two drive failures versus one). Raid 5 is slower at writing and really slow when rebuilding.

So, while RAID 5 would be slow in actual writing, it will be hidden with a good raid controller - where the write occurs in memory from the perspective of the user/application. However, Raid 5 is slower reading than raid 6 and that will not be improved with a controller unless the data has already been loaded or an algorithm keeps a record of repeated reads. In real life - the raid 6 wins.

In conclusion, Raid 5 writing is slow but hidden with a good controller, and that makes raid 5 and 6 basically the same in 'perceived' write performance (there are some exceptions). However, Raid 6 reads faster, and controllers won't likely help to improve read performance in a real-life workload. Now add that Raid 6 can take two failures and Raid 5 + 1 only one, and it gets easy to choose Raid 6 as the better option; don't forget that rebuilding on Raid 5 is really slow too. I have also learned that all Raid 6 drives are in use (and thus tested right away), and drives that fail tend to fail very quickly. Once an array is up for more than 30 days, it tends to last for years. A hot spare is untested and may actually fail right when it's needed.

Trevor

Posted 2010-11-19T22:24:11.450

Reputation: 19

0

These are the facts of the case, and they are undisputed (by anyone who knows what they are talking about):

  1. RAID5+hotspare is, literally, the worst possible RAID choice you can choose.
  2. RAID10 should be the default choice if you care about your data (meaning you depend on it, for example, to keep your business going).

If you consider all possible RAID options, there is no case in which RAID5+hotspare is the best choice, primarily because if you have RAID5+hotspare, then it means you have 4 drives, and with 4 drives you can do RAID6, or even better, you can do RAID10.

With 4 drives you get the same usable storage out of all choices (R5+HS, R6, R10).
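The capacity claim is easy to check by counting drives. A small sketch, assuming equal-sized drives and the usual layouts (one parity plus one idle spare for R5+HS, two parity for R6, mirrored pairs for R10); `usable_drives` is a hypothetical helper, not anything from a controller's tooling.

```python
def usable_drives(layout, total_drives):
    """Number of drives' worth of usable space, assuming equal-sized drives."""
    if total_drives < 4:
        raise ValueError("all three layouts need at least 4 drives here")
    if layout == "raid5+spare":
        return total_drives - 2      # one drive of parity, one idle hot spare
    if layout == "raid6":
        return total_drives - 2      # two drives of parity
    if layout == "raid10":
        if total_drives % 2:
            raise ValueError("RAID10 needs an even number of drives")
        return total_drives // 2     # every drive is mirrored
    raise ValueError(f"unknown layout: {layout}")


for n in (4, 8):
    for layout in ("raid5+spare", "raid6", "raid10"):
        print(f"{n} drives, {layout}: {usable_drives(layout, n)} usable")
```

At 4 drives all three come out at 2 drives of usable space; with more drives, R5+HS and R6 stay level with each other while R10 falls behind.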

If your goal is performance, then RAID10 will be superior to RAID5 and RAID6.

If your goal is safety, RAID6 or RAID10 are superior to RAID5 with or without a hotspare. It's debatable which one is safer (6 vs 10). RAID6 can sustain 2 drive failures, but because of unrecoverable read errors (UREs), it's also possible that a single drive failure in a RAID6 will kill the entire array.

RAID10, because it is not parity-based, does not have the same problem with UREs. If a parity RAID (R5, etc) loses a drive, and then encounters a URE, the entire array is lost. With RAID1 or RAID10, if a drive is lost, and then a URE is encountered on the mirror disk, only the unreadable sector is lost.
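To get a sense of how likely a URE is during a rebuild, here is a back-of-the-envelope sketch. It assumes the commonly quoted spec-sheet figure of one unrecoverable error per 10^14 bits read and independent bit errors; both are assumptions, not measurements, and `probability_of_ure` is a hypothetical helper. It only estimates the chance of hitting a URE, not what the controller does when it happens.

```python
import math


def probability_of_ure(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading bytes_read.

    ure_per_bit is an assumed spec-sheet rate; bit errors are treated as
    independent, which real drives do not strictly obey.
    """
    bits = bytes_read * 8
    # 1 - (1 - rate) ** bits, computed with log1p/expm1 to stay accurate
    # for very small rates.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12
# Rebuilding a 4-drive parity array of 2 TB drives reads the 3 survivors in full.
parity_rebuild = probability_of_ure(3 * 2 * TB)
# Rebuilding one half of a mirror reads only the single surviving partner.
mirror_rebuild = probability_of_ure(1 * 2 * TB)

print(f"Parity rebuild (6 TB read): {parity_rebuild:.1%}")
print(f"Mirror rebuild (2 TB read): {mirror_rebuild:.1%}")
```

A parity rebuild has to read every surviving drive, so its exposure grows with the size of the array, while a mirror rebuild only reads one drive.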

See here for a detailed explanation of why RAID5 is the worst possible choice. Also see here for a list of reasonable RAID choices by number of drives. Notice that in no case is RAID5 the best choice (regardless of hotspare).

user1594322

Posted 2010-11-19T22:24:11.450

Reputation: 109

2I disagree. RAID5 has its uses (e.g. when a budget is tight and you really need disk space). And since RAID does not replace a backup, surviving one disk failure is plenty to tide you over till 5 PM, at which point people leave the office and you do emergency maintenance. – Hennes – 2013-09-23T22:11:22.093

There is a difference between "the best choice" and "the best choice you can afford". RAID5 is never the best choice, ever. People come here to get the best answer, and people should leave here knowing that RAID5 is always less than the best. It's mathematically provable that in some cases RAID0 is more reliable than RAID5. That's how scary RAID5 is. In many cases, the RAID5 may not make it to 5 PM. There is a big difference between theory and the real world when it comes to RAID5. See here – user1594322 – 2013-09-25T00:26:10.100

It's not at all clear to me why you say that a RAID10 does not have the same problem with UREs. With a four-drive RAID10 setup, if you lose one drive and suffer a URE on its corresponding mirror, you're equally hosed. – ChrisInEdmonton – 2013-10-31T01:37:32.730

If RAID10 has a failed drive, and then has a URE on the surviving drive, you only lose the unreadable sector, not the entire array. Updated the answer. – user1594322 – 2013-11-01T02:56:46.110