
Just curious: I have 6 x 1TB 7200RPM nearline SAS drives for my new server. I can configure them either as RAID5 + 1 hot spare or as RAID6.

What should I choose?

Raptor
  • I speculate that 5 years from now, many people will be asking "what's RAID?" It's going to be heading out the door very soon, but it's still relevant today. – hookenz Apr 28 '14 at 20:44
  • So because many people state opinions instead of facts, we close the topic and don't let the facts get discussed? That seems wrong. There is hard math showing why RAID 6 is better for safety and capacity, and how speed is affected. No opinion involved. – Scott Alan Miller Mar 04 '17 at 12:34

10 Answers


You have disadvantages and advantages with each approach; it depends on why you're using RAID. Most people use it for availability: they don't want a drive to die and end up having to take their system or server down. For that, you don't use RAID 5. I learned it the hard way, and I hammer this point home in every RAID-related question I get into on SF.

Why? Because as drives get larger, the odds of hitting a URE, an unrecoverable read error, grow. We had it happen, and it isn't what you want to discover in the middle of a rebuild. Scenario: a RAID system with 3 drives. We got an alarm on our Dell, with a hardware PERC card, that drive C had died. We ordered a new drive and swapped it out, no problem. In the middle of the rebuild, the rebuild died.

According to the diagnostics, there was a "bad spot" on drive B. The drive had been silently failing on that spot repeatedly, and now that the array was rebuilding the data, it couldn't read it. No matter how many times we ran the repair, even directly from the controller, it reported everything as fixed each time, yet the rebuild still failed. So we had one dead drive and one drive that couldn't read from a spot; we ended up replacing 2 drives and restoring from backup.

Lesson: RAID isn't a backup, and RAID 5 is no longer an availability option for larger drives.

If you're looking to increase speed or increase storage sizes, then you can balance that into your decision. You need to frame the decision in terms of your needs and goals, not in terms of "I need RAID, which do I use?"
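For the curious, the URE math behind this warning can be sketched quickly. This is a back-of-the-envelope model only, assuming the question's 6 x 1TB drives (five survivors read end-to-end during a rebuild) and a 10^-14 bit error rate, a common datasheet figure for consumer-class disks:

```python
import math

def p_ure_during_rebuild(surviving_drives, drive_bytes, ber):
    """Probability of hitting at least one unrecoverable read error
    while reading every surviving drive end-to-end during a rebuild."""
    bits = surviving_drives * drive_bytes * 8
    # (1 - ber)**bits underflows for huge exponents; compute it via logs.
    return 1 - math.exp(bits * math.log1p(-ber))

# 6 x 1 TB drives, one dead: five drives must be read back in full.
p = p_ure_during_rebuild(5, 1e12, 1e-14)
print(f"{p:.0%}")  # roughly a one-in-three chance the rebuild hits a URE
```

With RAID 5 a single URE during the rebuild is fatal to the rebuild; with RAID 6 the second parity can still cover it.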

Bart Silverstrim
  • Drives meant for RAID operations typically fail their error correction very quickly and report the failure to the controller. The idea is to avoid situations like you describe above -- the bad block gets remapped and the data is recovered from parity, rather than the disk getting marked as bad because it "hangs" for 45 seconds trying to recover that block. Western Digital even has a utility to change the error recovery behavior of their disks, though the utility isn't "officially" released. – chris Mar 02 '10 at 13:15
  • I could not follow the argument. Are you saying use RAID 6, or RAID 5? – Martin Mar 02 '10 at 13:25
  • @Martin: I'm commenting on the concern that certain types of disk failures are dealt with differently depending on the target design of the disk itself. Bart described a multi-disk failure as a result of a "bad spot" on a disk, which under some circumstances won't (and shouldn't) cause a full array rebuild. In the process of the array rebuild, another disk failed, resulting in a bad day. – chris Mar 02 '10 at 13:29
  • @chris - Sorry for the confusion - I was not commenting on your comment. (It was not there when I posted.) My question was directed at Bart: I could not figure out if he was saying RAID 5 is a must-have, or whether RAID-5 is not enough and only RAID 6 would do. – Martin Mar 02 '10 at 14:40
  • @Martin: RAID 5 is bad for using large disks today. Avoid it if you are looking for high availability (keeping servers running). – Bart Silverstrim Mar 02 '10 at 14:43
  • @Chris: I don't know what drives you're thinking of, but this is a known issue: having bad spots that *aren't* reported to the controller, just as a matter of design. And our failure wasn't a 45-second hang. The drive consistently reported a bad block, the controller tried fixing it and said it was fixed, and it still failed to rebuild, and there was never an error reported in the logs until we initiated the rebuild. Here's an article with information: http://www.tomshardware.com/news/RAID-5-Doomed-2009,6525.html – Bart Silverstrim Mar 02 '10 at 14:46
  • Possibly better info: http://blogs.zdnet.com/storage/?p=162 http://subnetmask255x4.wordpress.com/2008/10/28/sata-unrecoverable-errors-and-how-that-impacts-raid/ http://blogs.techrepublic.com.com/datacenter/?p=1752 http://blogs.zdnet.com/storage/?p=483 – Bart Silverstrim Mar 02 '10 at 14:51
  • There's other data out there on the issue if you google for URE RAID or some variation thereof. I'm vocal because it happened to me on a Dell server with a hardware PERC controller supposedly doing everything "right". Fortunately we also had a backup that can rebuild from bare metal. There was no indication that the second drive had a bad spot -- not with chkdsk, not with the OpenManage diagnostics, nothing -- until it was time to rebuild that array. – Bart Silverstrim Mar 02 '10 at 14:53
  • Again, the OP needs to consider why he's using the RAID...availability of the server, speed of data access, more storage...and factor that into types. But I'd avoid RAID 5 with today's storage sizes. – Bart Silverstrim Mar 02 '10 at 14:54
  • @bart -- Thankfully I've never run into this sort of situation. What I can say is what I *think* is supposed to happen with a RAID and a disk with a bad block -- in a RAID / SAN environment the disk should report the error to the controller, and the controller gets the data from elsewhere, tells the disk to mark the block as bad, and moves on. The world changes when you suddenly have *only* bad blocks, but that isn't exactly what you described here. Also, many enterprise SANs will scrub disks while the array is idle. I'm not sure a PERC is enterprise or just "prosumer". – chris Mar 02 '10 at 17:03
  • @Chris - I don't know what the differentiation would be anymore, but PERCs are hardware-based, with onboard cache, integrated management tools, hot swap, etc., and while Dell isn't necessarily charging ten grand for the servers, they're fairly popular in enterprise environments. The drives are supposed to report errors; that's how you get error messages and alerts. But UREs are something that happens with drives, and they don't seem to be aware of it until doing something like... oh... rebuilding an array. Then suddenly "Hey! There's a problem here too!" – Bart Silverstrim Mar 02 '10 at 17:24
  • @Chris - I had it happen to me in a rebuild situation. Then I found articles saying that manufacturer fault tolerances, density of drives, etc. just increase the odds of it happening. So as drives get bigger you need redundant redundancy, and apparently even RAID 6 will have trouble as drive manufacturing tolerates more errors and storage gets denser, from what some articles say. – Bart Silverstrim Mar 02 '10 at 17:28
  • -1, sorry. Interesting discussion, and I couldn't agree more on the "RAID is no backup" thing, but it doesn't answer the original question. – Massimo Jul 28 '12 at 14:28
  • Is this true in 2019 regarding SSD? – hans Apr 16 '19 at 09:28

Use RAID6. Read "Why RAID 6 stops working in 2019" by Robin Harris on ZDNet.

Cristian Ciupitu
tore-
  • And what after 2019? – Mat May 24 '17 at 15:11
  • @Mat Hi from the year in which RAID 6 stops working. ;-) This really should have been in the article, but you'll have to look into [erasure coding](https://en.wikipedia.org/wiki/Erasure_code), typically implemented with polynomial oversampling over [GF(256)](https://en.wikipedia.org/wiki/Finite_Field). This means (almost) arbitrary redundancy levels can be used, e.g. 9+3 or 12+4. The main disadvantage is that it's quite a bit more computationally expensive than the RAID 6 or even RAID 5 parities. – Arne Vogel Jul 16 '19 at 11:56
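To make the GF(256) idea above concrete, here is a minimal, illustrative sketch of dual parity. The polynomial 0x11d and generator 2 match the common RAID 6 construction, but this is a toy: real implementations use lookup tables and handle all the erasure cases, not just the one shown.

```python
def gf_mul(a, b):
    """Multiply in GF(256) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return p

def gf_pow(a, n):
    r = 1
    while n:
        if n & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        n >>= 1
    return r

def gf_inv(a):
    return gf_pow(a, 254)  # a^255 == 1 for a != 0, so a^254 is 1/a

def pq_parity(chunks):
    """P is plain XOR parity (as in RAID 5); Q weights chunk i by 2^i."""
    p = [0] * len(chunks[0])
    q = [0] * len(chunks[0])
    for i, chunk in enumerate(chunks):
        g = gf_pow(2, i)
        for j, byte in enumerate(chunk):
            p[j] ^= byte
            q[j] ^= gf_mul(g, byte)
    return p, q

# Four data chunks; lose chunk 2 *and* the P parity (a double failure
# RAID 5 cannot survive), then rebuild chunk 2 from Q alone.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
p, q = pq_parity(data)
partial = [0] * 3
for i, chunk in enumerate(data):
    if i == 2:
        continue  # chunk 2 is the one we "lost"
    for j, byte in enumerate(chunk):
        partial[j] ^= gf_mul(gf_pow(2, i), byte)
# Q ^ partial leaves 2^2 * d2[j]; divide by 2^2 to recover the chunk.
rebuilt = [gf_mul(q[j] ^ partial[j], gf_inv(gf_pow(2, 2))) for j in range(3)]
print(rebuilt)  # [7, 8, 9]
```

General erasure codes extend this to arbitrary k data + m parity chunks, which is the extra computational cost the comment mentions.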

RAID 5 with a hot spare (really a warm spare) should never exist. RAID 6 is always a better use of the same drive count.

http://www.smbitjournal.com/2012/07/hot-spare-or-a-hot-mess/

There is no space/capacity or cost advantage to the RAID 5 solution (only a small performance advantage), while RAID 6 does a ton to mitigate things like URE risk.
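The capacity claim is easy to check with simple arithmetic, using the 1TB drives from the question:

```python
drives, size_tb = 6, 1.0

# RAID 5 across five drives plus one hot spare vs RAID 6 across all six:
raid5_plus_spare = (5 - 1) * size_tb       # one drive of parity; spare sits idle
raid6            = (drives - 2) * size_tb  # two drives of parity, no idle spare

print(raid5_plus_spare, raid6)  # 4.0 4.0 -- identical usable capacity
```

Same usable space either way; the difference is that RAID 6 keeps both redundancy drives actively protecting data instead of one of them idling as a spare.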

  • RAID-6 is substantially slower than RAID-5 + a hot spare. RAID-6 is not always better. – Stefan Lasiewski Apr 28 '14 at 17:40
  • RAID-5 (or, for that matter, RAID-6) with a spare can make perfect sense from a drive-count perspective if you have more than one stripe that can share a single spare. That allows resilvering to begin immediately when a drive fails, and the failed drive can be replaced ASAP but *when convenient*, instead of scrambling to replace it before there is another failure. – user Nov 21 '16 at 12:12
  • RAID 6 is faster on reads, slower on writes, but only when random; sequentially, it is faster on both. And the safety difference is epic: a full order of magnitude. Add in a write cache, like any business-class hardware RAID has, and the random-write penalty of RAID 6 drops dramatically. It's very rare that R6 is slower than R5, as it has the extra spindle. – Scott Alan Miller Mar 04 '17 at 12:29
  • Immediate resilvering is actually the danger, because it is the resilver operation that puts the array in the most danger. R6 with a hot spare is totally different from R5 with a hot spare. R5 with a hot spare should never exist, because you can use that same drive in R6 and gain speed and reliability. No need to resilver; the protection is already in place without the URE risk of the resilver operation. – Scott Alan Miller Mar 04 '17 at 12:30

This one's easy: do you want more available disk space, or the ability to survive more disks failing? It's that simple.

So I'll make some wild assumptions: that you don't care about performance, as they're 7.2k drives, and that you do care about available space, as they're 1TB disks. You don't mention what type of data you want to store, but I'm going to assume it's either just video files or a combination of video and audio. If I'm right, then presumably you would struggle to replace the data? In that case I'd choose neither 5 nor 6 but go with R10. Yes, you lose 1TB over R6 and 2TB over R5, but it'll be faster and can survive up to three disks going pop. If I'm wrong and you can quickly recover your data, then you may as well go R5 so you get the most available space.

Chopper3
  • “yes you lose 1TB over R6 and 2TB over R5 but it'll be faster and can survive three disks going pop.” – That's if you're lucky; in the worst case you'll not survive more than 1 drive failure (if the 2nd is in the same RAID 1 pair as the first). If I'm not mistaken, for 1, 2, 3, and 4 drive failures, the ‘naïve’ cumulative probability of having to resort to backup is 0, 1/5, 3/5, 1, respectively (the number of zero terms being the number of drive failures that can be safely tolerated). With RAID 6, it's 0, 0, 1, 1, which is somewhat easier to work with, plus you don't lose that extra terabyte. – James Haigh Mar 06 '15 at 08:46
  • You're right James, things have changed in the last *FIVE YEARS* (OMG that was a long time ago - wow!) and now I only recommend R6/60 and R1/10. – Chopper3 Mar 06 '15 at 09:11
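James's cumulative probabilities check out by exhaustive enumeration. This is a quick sketch assuming a 6-drive RAID 10 with three fixed mirror pairs and uniformly random drive failures:

```python
from fractions import Fraction
from itertools import combinations

# Three mirrored pairs: drive 2k is mirrored with drive 2k+1.
pairs = [(0, 1), (2, 3), (4, 5)]

def p_array_lost(failures):
    """Exact probability that a 6-drive RAID 10 is lost after `failures`
    uniformly random drive failures, by enumerating every failure set."""
    total = dead = 0
    for failed in combinations(range(6), failures):
        total += 1
        # The array dies iff both halves of some mirror pair have failed.
        if any(a in failed and b in failed for a, b in pairs):
            dead += 1
    return Fraction(dead, total)

print([p_array_lost(k) for k in range(1, 5)])
# [Fraction(0, 1), Fraction(1, 5), Fraction(3, 5), Fraction(1, 1)]
```

That reproduces the 0, 1/5, 3/5, 1 sequence from the comment above.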

While performance is a concern, it is not as much of a concern as you might think. Newer RAID hardware is going to be as fast writing 2 parity stripes as it is writing 1.

Also, SAS drives usually have a lower bit error rate (BER), which translates to a lower unrecoverable read error (URE) rate -- usually an order of magnitude (10x) lower. However, if the SAS drive is the same model as a SATA drive, you may not see an improvement.
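To put numbers on that order-of-magnitude difference, here is a quick sketch comparing the chance of a URE-free rebuild at two common datasheet BER classes (10^-14 vs 10^-15; actual specs vary by model, so treat these as illustrative):

```python
import math

def p_clean_read(total_bytes, ber):
    """Probability of reading `total_bytes` with no URE at bit error rate `ber`."""
    bits = total_bytes * 8
    return math.exp(bits * math.log1p(-ber))  # (1 - ber) ** bits, via logs

rebuild_bytes = 5 * 1e12  # five surviving 1 TB drives read in full
for ber in (1e-14, 1e-15):
    print(f"BER {ber:g}: {p_clean_read(rebuild_bytes, ber):.1%} chance of a clean rebuild")
```

Roughly 67% vs 96% for this array size: one order of magnitude in BER is the difference between a coin-flip-ish rebuild and a near-certain one.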

Finally, as to the question of RAID-5 and UREs and how that can just ruin your day, I wrote an article on it some time ago at: http://subnetmask255x4.wordpress.com/2008/10/28/sata-unrecoverable-errors-and-how-that-impacts-raid/ in which I cover some of these questions. Basically, RAID-6 on good hardware (not software RAID) should show equivalent performance to RAID-5. If you have poor hardware, or are using software RAID, then you will see a performance hit.

As always, backups are your saving grace. Do not forget to implement a backup policy, AND verify that your backups can actually be restored.

  • Wow, I think I was all over the dang place with that one. What I was trying to say is that given your drives, if you have a good hardware RAID controller, then RAID 6 has no downsides over RAID 5 with a hot spare (the +1). However, if you do not have a good hardware RAID controller, then you will see slower reads and writes, with writes being a bit slower than the reads. If you are using software RAID only, then you will see slower reads and much slower writes. Good hardware - RAID 6; poor hardware - RAID 6 with a performance hit; software only - RAID 5 (+1) and pray a lot. –  Mar 02 '10 at 21:29

RAID6. That way, if two disks fail in relatively quick succession, so the array hasn't finished rebuilding from the first failure (or 2 disks fail to spin up after a power outage), you've not lost all your data.

xenny

Lots of good advice here, particularly from Bart and Chopper3.

The only thing I would add is: test your workload under a failure condition. People usually set up RAID-5/RAID-6 to buy availability (i.e. the server isn't going down due to a disk failure). Unfortunately, for some workloads -- especially write-heavy ones -- you may find that the performance hit in degraded mode is severe enough that you aren't buying much.

If your testing works out, great -- just don't forget to devise and test a backup strategy.

duffbeer703

I think RAID 5 vs. 6 (and vs. 10) comes down to performance and how much you trust the brand(s) of drives you use. We primarily use HP servers and HP storage and have had very few disk failures, so I'm happy with RAID5+hotspare or even just RAID5 on less critical systems.

In theory, RAID6 saves you from a second drive failure while the first failed drive is being rebuilt, but you trade that off against the increased computation needed to generate 2 different parity stripes. We haven't used RAID6, so I've never looked at specs on how fast a RAID6 array can be rebuilt vs. RAID5, but I assume 6 will take longer since it has to do 2 parity calculations, not just one.

If I were going to move away from RAID5, I'd probably go with 10 (or 01) to get away from parity calculations altogether. With 6 drives, you could do that, although you net only half the capacity.

Ward - Reinstate Monica

RAID6 has more overhead, so RAID5 as such will be faster on the same number of drives. On the other hand, you might lose that advantage once a disk dies: while the RAID5 rebuilds, you are at risk of losing the array if another drive dies during the rebuild.

In your case, you're running RAID5 vs RAID6 on a different number of active spindles, so because of the extra spindle RAID6 might be faster.

dyasny

Good answers, except that no one has mentioned write performance. RAID 6 usually costs you a slight performance hit on writes compared to RAID 5, since there are two parity stripes to maintain.

Nonetheless, most find it worthwhile to go with RAID 6.
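That write hit can be sketched with the standard small-random-write penalty model: RAID 5 turns one host write into 4 disk I/Os (read data, read parity, write data, write parity), RAID 6 into 6. The 75 IOPS per-drive figure below is an illustrative assumption for a 7200 RPM drive, not a measured number:

```python
def random_write_iops(drives, iops_per_drive, penalty):
    """Effective small-random-write IOPS under the classic parity penalty model."""
    return drives * iops_per_drive / penalty

drive_iops = 75  # assumed rough figure for a 7200 RPM nearline drive
print(random_write_iops(5, drive_iops, 4))  # RAID 5 on 5 drives (spare idle): 93.75
print(random_write_iops(6, drive_iops, 6))  # RAID 6 on all 6 drives: 75.0
```

So the RAID 5 + spare layout does come out somewhat ahead on raw random writes, though a controller write cache narrows the gap considerably in practice.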

kmarsh