How does parity work on a RAID-5 array?

I'm looking to build a nice little RAID array for dedicated backups. I'd like to have about 2-4TB of space available, as I have this nasty little habit of digitizing everything. Thus, I need a lot of storage and a lot of redundancy in case of drive failure. I'll also essentially be backing up 2-3 computers' /home folders using one of the "Time Machine" clones for Linux. This array will be accessible over my local network via SSH.

I'm having difficulty understanding how RAID-5 achieves parity and how many drives are actually required. One would assume that it needs 5 drives, but I could be wrong. Most of the diagrams I've seen have only confused me further. This is how I think RAID-5 works; please correct me, as I'm sure I'm not grasping it properly:

/---STORAGE---\    /---PARITY----\
|   DRIVE_1   |    |   DRIVE_4   |
|   DRIVE_2   |----|     ...     |
|   DRIVE_3   |    |             |
\-------------/    \-------------/

It seems that drives 1-3 appear and work as a single, massive drive (capacity * number_of_drives) and the parity drive(s) back up those drives. What seems strange to me is that diagrams usually show 3+ storage drives to only 1 or 2 parity drives. Say we're running four 1TB drives in a RAID-5 array, 3 for storage and 1 for parity: we have 3TB of actual storage, but only 1TB of parity!?

I know I'm missing something here; can someone help me out? Also, for my use case, which would be better, RAID-5 or RAID-6? Fault tolerance is my highest priority at this point. Since the array will only be used over my home network, speed isn't hugely critical.

Naftuli Kay

Posted 2011-05-23T21:59:44.453

Reputation: 8 389

Answers

It just XORs the corresponding bits from each drive. If you lose any one drive, you can rebuild the missing data.

For background:

A B (A XOR B)
0 0    0
1 1    0
0 1    1
1 0    1

Treat the last column as the parity drive: it holds the XOR of the other columns. As long as you only lose one drive, you can figure out what you lost, because XORing the surviving columns reproduces the missing one.
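A minimal Python sketch of the table above, treating each drive as a single bit (a real array does the same thing over whole blocks). The function names `parity_bit` and `recover_bit` are made up for illustration:

```python
from functools import reduce
from operator import xor

def parity_bit(bits):
    """Parity for one bit position across the data drives: XOR of all of them."""
    return reduce(xor, bits, 0)

def recover_bit(surviving_bits, parity):
    """Rebuild the single missing data bit by XORing the survivors with the parity bit."""
    return reduce(xor, surviving_bits, parity)

# Walk every row of the truth table (drives A and B, parity column A XOR B).
for a in (0, 1):
    for b in (0, 1):
        p = parity_bit([a, b])
        assert recover_bit([b], p) == a  # lose drive A, rebuild it from B + parity
        assert recover_bit([a], p) == b  # lose drive B, rebuild it from A + parity
        print(a, b, p)
```

The same two functions work unchanged for any number of data drives, which is why adding drives never requires more than one parity bit per position.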

In RAID-5 the parity blocks are distributed across all the drives rather than kept on one (a dedicated parity drive is RAID-4), but the concept is the same.

So for RAID-5, no matter how many drives there are, you only need one drive's worth of space for parity, equal to the capacity of the smallest drive in the array.

For personal use, RAID-5 is probably best, as its parity computation is much simpler than RAID-6's.

RAID-6 is more complicated: it uses Galois field arithmetic to compute its second parity block, which makes parity computation more taxing. In exchange, you can lose two drives instead of one, but if you rebuild your array as soon as you get a single failure, you should be fine sticking with RAID-5.
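To give a feel for what "Galois fields" means here, below is a simplified Python sketch of the P and Q parity scheme, following the convention used by Linux md's RAID-6 (arithmetic in GF(2^8) with polynomial 0x11D and generator 2). It is not any implementation's actual code: real code uses lookup tables and works on whole blocks, and recovering two lost *data* drives needs a little more algebra that is not shown. It only computes P/Q for one byte per drive and rebuilds one lost data byte from Q alone:

```python
from typing import List, Optional, Tuple

FIELD_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1
GENERATOR = 2

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8); addition in this field is plain XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= FIELD_POLY  # reduce back into 8 bits
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    """Inverse of a non-zero byte: a^254, because a^255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)

def pq_parity(data: List[int]) -> Tuple[int, int]:
    """P is the RAID-5 style XOR; Q weights drive i's byte by GENERATOR^i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(GENERATOR, i), d)
    return p, q

def rebuild_from_q(data: List[Optional[int]], q: int) -> int:
    """Recover the one missing data byte (None) from Q alone,
    e.g. when a data drive and the P drive have both failed."""
    missing = data.index(None)
    partial = 0
    for i, d in enumerate(data):
        if d is not None:
            partial ^= gf_mul(gf_pow(GENERATOR, i), d)
    # q ^ partial leaves GENERATOR^missing * D_missing; divide that factor back out.
    return gf_mul(q ^ partial, gf_inv(gf_pow(GENERATOR, missing)))

stripe = [0x12, 0xAB, 0x7F, 0x00]   # one byte from each of four data drives
p, q = pq_parity(stripe)
damaged = [0x12, 0xAB, None, 0x00]  # drive 3 lost, and assume P is gone too
print(rebuild_from_q(damaged, q) == stripe[2])  # True
```

The extra multiplications per byte are the "taxing" part compared with RAID-5's single XOR.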

Matt

Posted 2011-05-23T21:59:44.453

Reputation: 5 109

It was easy to "feel" that you already had (drives - 1)/drives of your information even without the parity on a single drive failure, but the explanation here makes the reason obvious. If you have n-1 drives' worth of bits from your XOR equation, XORing those n-1 bits together and comparing the result with your parity bit will always tell you whether the "lost" bit was on or off. Nicely done. (Understanding RAID 6, heaven help me.) – ruffin – 2014-10-03T19:19:12.193

If the parity is just an XOR of the two other disks, how do you know which of the two disks was corrupted? Wouldn't a bit flip on either disk result in a bit flip in the parity? – Jay Sullivan – 2015-01-11T21:19:47.157

Hi, I'm a little confused about situations like line 4 (1,1,0 = 0). If you have (1,1,?) = 0, ? could be 1 or 0 and the XOR would still be correct. What am I missing? – MarkD – 2019-10-03T12:22:32.737

@MarkD Don't think of it as XOR, think of it as "even or odd number of 1s". (1,1,0 = 0), (1,1,1 = 1). – The Guy with The Hat – 2019-12-11T19:18:45.130

"If you have (1,1,?) = 0, ? could be 1 or 0 and the XOR would still be correct. What am I missing?" If you have a XOR b XOR c, you first compute a XOR b and then XOR the result with c. Think of it as [ ( a XOR b ) XOR c ]. – Vinny – 2020-02-27T00:33:48.987

So for a=1,b=1,c=0, you have [ ( 1 XOR 1 ) XOR 0 ] = [ ( 0 ) XOR 0 ] = 0
But for a=1,b=1,c=1, you have [ ( 1 XOR 1 ) XOR 1 ] = [ ( 0 ) XOR 1 ] = 1 – Vinny – 2020-02-27T00:39:55.840
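The two comments above describe the same operation. A small Python check (function names are illustrative) shows that chaining XOR left to right gives exactly "is the number of 1s odd?", which is why (1,1,?) = 0 forces ? to be 0:

```python
from functools import reduce
from itertools import product
from operator import xor

def xor_chain(bits):
    """Evaluate ((b0 XOR b1) XOR b2) ... left to right."""
    return reduce(xor, bits)

def odd_number_of_ones(bits):
    """The 'even or odd number of 1s' view: 1 if the count of 1s is odd, else 0."""
    return sum(bits) % 2

print(xor_chain((1, 1, 0)), xor_chain((1, 1, 1)))  # 0 1

# The two views agree for every combination of, say, four drives' bits.
assert all(xor_chain(bits) == odd_number_of_ones(bits)
           for bits in product((0, 1), repeat=4))
```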

Excellent answer. I was thinking on too large a scale, at the level of whole hard disks rather than individual bits. So does RAID-5 use a dedicated drive for parity, or does it use all drives for parity? I'm confused about that. – Naftuli Kay – 2011-05-23T22:44:40.220

I believe the modern approach is to distribute the parity diagonally across all the drives. This spreads parity I/O over every drive, so multiple requests can be sent in parallel to different drives instead of queuing on a single parity drive, but don't quote me on that. – Matt – 2011-05-23T22:55:02.920

Is there a mathematical formula I can use to determine the capacity given x drives and y GB available on each drive? – Naftuli Kay – 2011-05-23T22:59:01.487

Yeah, it's (smallest drive size) * (number of drives in array - 1) – Matt – 2011-05-23T23:01:20.650
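The formula from the comment above as a small Python function (the name `usable_capacity` and the `parity_drives` parameter are illustrative). It assumes RAID-5 gives up one drive's worth and RAID-6 two, and it ignores filesystem overhead:

```python
def usable_capacity(drive_sizes_tb, parity_drives=1):
    """Usable space = smallest drive * (number of drives - parity drives).

    parity_drives=1 models RAID-5, parity_drives=2 models RAID-6.
    Filesystem overhead is ignored.
    """
    if len(drive_sizes_tb) <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return min(drive_sizes_tb) * (len(drive_sizes_tb) - parity_drives)

print(usable_capacity([2, 2, 2, 2]))                   # RAID-5, four 2TB drives -> 6
print(usable_capacity([2, 2, 2, 2], parity_drives=2))  # RAID-6, same drives -> 4
print(usable_capacity([1, 2, 2]))                      # limited by the 1TB drive -> 2
```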

Here's what I think is a better diagram showing how parity works in RAID4 and RAID5:

RAID4

Disk1  Disk2  Disk3  Disk4
----------------------------
data1  data1  data1  parity1
data2  data2  data2  parity2
data3  data3  data3  parity3
data4  data4  data4  parity4

RAID5

Disk1   Disk2   Disk3   Disk4
----------------------------
parity1 data1   data1   data1   
data2   parity2 data2   data2  
data3   data3   parity3 data3
data4   data4   data4   parity4
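A short Python sketch that generates both diagrams above (the function `raid_layout` is hypothetical). The rotation it uses simply reproduces the drawing, with parity moving one disk to the right per stripe; real implementations pick their own rotation order (Linux md, for example, offers several named layouts), so treat this as an illustration only:

```python
def raid_layout(num_disks, num_stripes, rotate_parity):
    """Return one row of labels per stripe, like the diagrams above.

    rotate_parity=False: RAID-4 style, parity always on the last disk.
    rotate_parity=True: RAID-5 style as drawn, parity shifts one disk
    to the right on each stripe and wraps around.
    """
    rows = []
    for stripe in range(1, num_stripes + 1):
        if rotate_parity:
            parity_disk = (stripe - 1) % num_disks
        else:
            parity_disk = num_disks - 1
        row = []
        for disk in range(num_disks):
            label = "parity" if disk == parity_disk else "data"
            row.append(f"{label}{stripe}")
        rows.append(row)
    return rows

for row in raid_layout(4, 4, rotate_parity=True):
    print("  ".join(f"{cell:<8}" for cell in row))
```

Because every disk holds some parity under RAID-5, no single disk has to absorb every parity update.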

camster342

Posted 2011-05-23T21:59:44.453

Reputation: 1 691

Or, have a look at this SVG on Wikipedia: https://en.wikipedia.org/wiki/Standard_RAID_levels#/media/File:RAID_5.svg – Giuseppe Crinò – 2019-07-02T10:19:35.877

I would recommend reading this Wikipedia article on RAID 5 and RAID 6:

http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_parity_handling

RAID 5 writes one parity block in each stripe, so for stripe A of a 4-disk array it writes the parity block on the 4th disk, with data on disks 1, 2 and 3.

For stripe B, the parity block is on disk 3, with data on disks 1, 2 and 4, and so on.

If, say, disk 4 fails, its data for stripe B can be recovered, since you know the data on disks 1 and 2 and have the parity block on disk 3.

If stripe B had a parity of "0", disk 1 holds data "1" and disk 2 holds data "0", then disk 4 must have held "1" (because 1 XOR 0 XOR 1 = 0), so the replacement disk is written with data = "1".
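A small Python simulation of the rebuild described above, using the stripe layout from this answer (stripe A: parity on disk 4, stripe B: parity on disk 3). Block contents are random bytes and the helper names are hypothetical; the point is that rebuilding is the same XOR whether the lost block held data or parity:

```python
import os

BLOCK_SIZE = 8  # tiny illustrative block size; real arrays use far larger chunks

def xor_blocks(blocks):
    """XOR several equal-length byte blocks together."""
    out = bytearray(BLOCK_SIZE)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def build_array(num_disks, parity_positions):
    """disks[d][s] is disk d's block in stripe s; parity sits on disk parity_positions[s]."""
    disks = [[None] * len(parity_positions) for _ in range(num_disks)]
    for s, p in enumerate(parity_positions):
        data = []
        for d in range(num_disks):
            if d != p:
                disks[d][s] = os.urandom(BLOCK_SIZE)
                data.append(disks[d][s])
        disks[p][s] = xor_blocks(data)
    return disks

def rebuild_disk(disks, failed):
    """Recreate every block of the failed disk by XORing the survivors in each stripe."""
    return [xor_blocks([disks[d][s] for d in range(len(disks)) if d != failed])
            for s in range(len(disks[0]))]

# 0-based indices: stripe A parity on disk index 3, stripe B parity on disk index 2.
disks = build_array(4, parity_positions=[3, 2])
print(rebuild_disk(disks, failed=3) == disks[3])  # True
```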

The whole disk can be recreated this way. RAID 6 extends this by having two parity blocks per stripe.

Regarding space, with RAID 5 you only ever lose one disk's worth of space to parity, as it writes only one parity block per stripe. With RAID 6 you lose two disks' worth, but you can also lose two disks rather than the one you can lose with RAID 5 ;)

The Wikipedia article explains this better!

markfknight

Posted 2011-05-23T21:59:44.453

Reputation: 356

RAID 5 uses one drive's worth of space for parity, regardless of how many data drives there are in the array. This means that it becomes more efficient, in terms of usable space, the more drives are added.

Parity is achieved by doing an XOR operation across the same block on each drive; the contents of the parity block are set so that all drives XOR to zero. This also means that RAID 5 is limited by the smallest drive in the array.
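A minimal Python sketch of the "all drives XOR to zero" invariant. It also shows one consequence: when a single data block changes, the parity can be adjusted from just the old parity, old data and new data (new parity = old parity XOR old data XOR new data). This read-modify-write shortcut is a common technique, but whether a particular implementation uses it is an assumption here; the helper names are illustrative:

```python
from functools import reduce

def xor_bytes(*blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(blocks):
    """True if every byte position XORs to zero across all drives (data + parity)."""
    return all(b == 0 for b in xor_bytes(*blocks))

def update_parity(old_parity, old_data, new_data):
    """Adjust parity for one changed data block without reading the other drives."""
    return xor_bytes(old_parity, old_data, new_data)

data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_bytes(*data)
print(stripe_is_consistent(data + [parity]))  # True

new_block = b"\xaa\xbb"
parity = update_parity(parity, data[1], new_block)
data[1] = new_block
print(stripe_is_consistent(data + [parity]))  # True
```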

RAID 6 is similar except that two simultaneous drive failures can be tolerated. This is useful because the process of "resilvering" an array after a single drive failure may be stressful enough to cause a second drive to fail.

sblair

Posted 2011-05-23T21:59:44.453

Reputation: 12 231

So that essentially means that I can have four 2TB drives and have 6TB of effective, redundant storage? – Naftuli Kay – 2011-05-23T22:37:46.050

@TK Kocheran With RAID 5, yes. Note that the effective storage will be a bit less due to the file system. For example, my NAS with 4 2TB drives in RAID-Z1 (ZFS's version of RAID 5) has a usable space of 5.18TB. – sblair – 2011-05-23T23:12:16.217

Well yes, of course :) Always happens that way. Next question is what filesystem to use... – Naftuli Kay – 2011-05-23T23:29:50.513

If fault tolerance is your goal, RAID-6 provides enough redundancy to lose two drives. RAID-5 will only tolerate a single drive failure.

Nate

Posted 2011-05-23T21:59:44.453

Reputation: 729

What's the ratio of drives to parity (total storage) for RAID-6? drive_size * (drive_count - 2)? – Naftuli Kay – 2011-05-23T23:19:18.437

As well as fault tolerance for a second drive going bad before you can replace the first, there is one other situation RAID6 is great for, and I've come across it more than once: a drive goes bad in a RAID array, so a new drive is ordered. Some random guy who knows nothing about RAID arrays goes into the server room with the new drive in hand, mixes up the numbering, and ejects the wrong drive from the array for replacement. Under RAID5, your array is screwed right there. RAID6 means you can still recover. – camster342 – 2011-05-25T05:51:38.963