2

When setting up a server, I was told it needs to have really good fault tolerance. What RAID array would give me the best fault tolerance?

  • Please avoid asking subjective questions (https://serverfault.com/help/dont-ask), as you have done here. Please refer to the possible duplicate that HBrujin pointed out (though I noticed it makes no mention of btrfs), or reword your question to follow this guide: https://serverfault.com/help/how-to-ask . Someone should also mention something like a virtual SAN, as these can survive the complete death of whole servers if set up right. – BeowulfNode42 Apr 15 '19 at 09:29
  • There's no "one weird trick to protecting your data". You need to define the risk you're prepared to take vs. the budget you can spend to mitigate that risk. For a start, if I was tasked to make a *server* fault tolerant, I'd reject that and ask about the requirements for the *service*, then design fault tolerance from *that* point forwards. That might give me a completely different design around HA/clustering or containers or whatever to meet the requirements, vs. simply making the best I can of an individual server. – Rob Moir Apr 15 '19 at 12:56

2 Answers

3

In the classical hardware RAID world, RAID1 and RAID6 are the most reliable RAID levels.

In the more advanced software RAID world (MDRAID and ZFS), you can use 3-way mirroring or even a triple-parity scheme (ZFS only).

From a reliability standpoint, correctly configured ZFS pools are probably the state of the art.

shodanshok
1

RAID-6 is, in my opinion, the best choice (at least, the best affordable one) if fault tolerance is the most desired property, with RAID-5 being second best.

It would seem like RAID-1 (or 10 if more speed is desired and money isn't an issue) might be the go-to solution, but I wouldn't recommend that.

RAID levels 1 and 5 have in common that exactly one disk may fail and no bad things will happen. Also, apart from complete disk failure, the array is resilient to single sectors becoming unreadable (as long as the corresponding sectors on the other disks remain readable). With RAID-1, in theory even as many as 50% of the disks in the array could fail and no bad things would happen, as long as it's strictly the "correct disks" that are failing. That is, never any two disks with the same index, meaning never both copies of the same data.

In principle, you could also do RAID-1 with two, three, or ten mirror copies if you like, but monetary constraints usually forbid that. After all, throwing two dozen extra disks at the problem doesn't exactly make the approach "inexpensive" (though the word "inexpensive" in RAID refers to the single disks within the array, and nobody would want to use a cheap disk in a RAID anyway).

RAID-10 is somewhat inferior insofar as it is basically RAID-0 stacked on top of two (or maybe three) instances of RAID-1. Although each of these mirrors is fault-tolerant within its limits, if any single one of them fails, the whole array fails.
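To make the "correct disks" point concrete, here is a minimal sketch that enumerates every possible combination of failed disks in a RAID-10 and counts how many of them the array survives. It assumes two-way mirrors with disks numbered so that (0, 1), (2, 3), ... form the mirror pairs; the function names are made up for illustration and do not come from any RAID tool.

```python
from itertools import combinations


def raid10_survives(failed, pairs):
    """True if no mirror pair has lost both of its disks."""
    return not any(a in failed and b in failed for a, b in pairs)


def raid10_survival_fraction(num_pairs, num_failed):
    """Fraction of all num_failed-disk failure combinations that a RAID-10
    built from num_pairs two-way mirrors survives (hypothetical model)."""
    pairs = [(2 * i, 2 * i + 1) for i in range(num_pairs)]
    disks = range(2 * num_pairs)
    combos = list(combinations(disks, num_failed))
    survived = sum(raid10_survives(set(c), pairs) for c in combos)
    return survived / len(combos)


if __name__ == "__main__":
    # 8 disks arranged as 4 mirror pairs
    for k in range(1, 6):
        print(f"{k} failed disk(s): survives "
              f"{raid10_survival_fraction(4, k):.1%} of combinations")
```

In this model a single failure is always survived, while a second failure is fatal whenever it happens to hit the partner of the first failed disk. RAID-6, by comparison, survives every two-disk combination.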

RAID-5 is cheap (only one extra disk needed) and has actually been sufficient for most people because hey, when do two disks die at the same time? Never happens! Well, sadly it can happen, and it does happen. Also, it can happen that a sector becomes unreadable. Yeah, that never happens, it's sooooooo unlikely, right.

Unfortunately, when you need to re-sync after a failure, you must read every sector on all remaining disks. With modern disk capacities, that is a huge number. A huge number multiplied by unlikely-probably-never-happens will, unfortunately, result in a probability that is not at all negligible. It can happen that after one disk has failed, a sector goes bad. It can happen that a second disk (which has the same number of power-on hours) fails, especially when put under a 16-18-hour stress test during the resync.
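The "huge number multiplied by unlikely" argument can be put into rough numbers. The sketch below assumes an unrecoverable read error rate of 1 per 10^14 bits read, a figure commonly quoted on consumer drive datasheets, and it assumes that errors are independent. Both are simplifying assumptions rather than measurements, and the function name is hypothetical.

```python
import math


def p_read_error_during_rebuild(disk_tb, remaining_disks, ure_per_bit=1e-14):
    """Probability of hitting at least one unrecoverable read error while
    reading every bit of all remaining disks once.

    ure_per_bit is an ASSUMED datasheet-style error rate; real drives vary.
    """
    bits_to_read = disk_tb * 1e12 * 8 * remaining_disks
    # 1 - (1 - p)^bits, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # RAID-5 of four 4 TB disks: after one failure, three disks must be read
    p = p_read_error_during_rebuild(disk_tb=4, remaining_disks=3)
    print(f"Chance of at least one read error during rebuild: {p:.0%}")
```

Under these assumptions the result for a four-disk array of 4 TB drives comes out well above one half, which is why a degraded RAID-5 rebuild is not as safe as it sounds. With RAID-6, a single bad sector during the rebuild can still be reconstructed from the second parity.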

RAID-6 is the same as RAID-5 except that it can withstand two disks failing simultaneously. It does not matter which disks fail; there is no worst case. Any two disks go down, and you're still good to go.
So when the first disk fails, it's not yet time for cold sweat. You are still good to go, and you still have redundancy in place. That is sooooooo much better compared to RAID-5, and it comes at the price of only one more extra disk.
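The trade-off between extra disks and guaranteed fault tolerance discussed above can be summarized in a few lines. This is a simplified model: it counts only the number of failures that are survived no matter which disks fail, it assumes RAID-10 is built from two-way mirrors, and the function name is invented for this example.

```python
def raid_profile(level, disks):
    """Return (data_disks, guaranteed_failures_survived) for a simplified
    model of the given RAID level with the given total number of disks."""
    if level == "RAID-1":
        if disks < 2:
            raise ValueError("RAID-1 needs at least 2 disks")
        return 1, disks - 1          # every disk holds a full copy
    if level == "RAID-5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return disks - 1, 1          # one disk's worth of parity
    if level == "RAID-6":
        if disks < 4:
            raise ValueError("RAID-6 needs at least 4 disks")
        return disks - 2, 2          # two disks' worth of parity
    if level == "RAID-10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 (two-way mirrors) needs an even number >= 4")
        return disks // 2, 1         # a second failure may hit the same pair
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    for level in ("RAID-5", "RAID-6", "RAID-10"):
        data, safe = raid_profile(level, 6)
        print(f"{level} on 6 disks: {data} disks of capacity, "
              f"any {safe} failure(s) survived")
```

On six disks, RAID-6 gives up one more disk of capacity than RAID-5 and in return doubles the number of failures that are always survived, while RAID-10 gives up three disks and still only guarantees one.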

Damon
  • If performance is not important, RAID-6 can be considered, but running RAID-6 on HDDs for random write operations is a bad choice. RAID-10 is not as cost-efficient, but it gives the best redundancy and performance. – batistuta09 Apr 15 '19 at 12:22