They are correct: such an arrangement would survive the failure of any two drives.
The 'N=2' setup ensures that the first two of the replicated blocks are on equivalent sectors of different disks.
The 'F=2' setup ensures that the second two of the replicated blocks do not share disks with the N=2 replicated blocks, so long as the array has at least N×F drives (here, 4).
As for the capacity math, the article gets it wrong (see below). Let's take a look at a few examples:
A 4-drive 2x2 array
- There are a total of 4 drives in the array
- There are 2 'near' replicas
- There are 2 'far' replicas
- The 'stripe width' is therefore 4 blocks.
Each block is therefore replicated four times. Capacity is the size of a single drive.
A 5-drive 2x2 array
- There are a total of 5 drives in the array
- There are 2 'near' replicas
- There are 2 'far' replicas
- The 'stripe width' is therefore 4 blocks.
As with the 4-drive array, each block is replicated four times. The extra drive adds one more drive's worth of raw space, but that space is still shared among four copies of every block. Capacity is therefore 1.25 times the size of a single drive.
Put a different way:
Given:
- N = Number of Drives
- R = Number of Replicated blocks per stripe
- S = Size of Drives
Capacity = (S * N) / R
4-Drive Array:
N = 4
R = 4
Capacity = (S * 4) / 4 = S * 1
5-Drive Array:
N = 5
R = 4
Capacity = (S * 5) / 4 = S * 1.25
The 'any two can fail' condition only exists if N ≧ R. In fact, with R=4, any three can fail. Again, only if N ≧ R.
I must point out a math error in the article itself. To quote:
The capacity of this particular four-drive “near” and “far” RAID configuration is the following.
Capacity = (n/2) * capacity of single disk
This is incorrect. The 2 in that formula is supposed to be the number of replicated blocks, which in a 2x2 setup is 4. This is clearly demonstrated in the diagram, where the "A1" block shows up four times. The author gets it right for the 3-drive examples, as those formulas show division by 3.
[Example is for a two-replica RAID on three disks]
Capacity = 2/3 * capacity of single disk
This is further supported by the man-page for md:
Finally, it is possible to have an array with both 'near' and 'far'
copies. If an array is configured with 2 near copies and 2 far copies,
then there will be a total of 4 copies of each block, each on a different
drive. This is an artifact of the implementation and is unlikely
to be of real value.
So a 2-by-2 RAID setup will have four copies of each block. Therefore, a four-drive implementation of a 2-by-2 RAID will have the capacity of a single drive.
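To make the four-drive arithmetic concrete, here is a minimal Python sketch. It assumes usable capacity is simply total raw space divided by the number of copies of each block, with the copy count taken as near copies times far copies, as the man page excerpt describes. The function names are hypothetical and are not part of mdadm or the md driver. The sketch also ignores any rounding to chunk boundaries.

```python
def copies_per_block(near, far):
    """Total copies of each block for a given near/far setting (assumed near * far)."""
    return near * far


def usable_capacity(num_drives, drive_size, near, far):
    """Raw space divided by the number of copies of each block.

    Hypothetical helper for illustration only. Assumes equally sized
    drives and at least as many drives as copies.
    """
    copies = copies_per_block(near, far)
    if num_drives < copies:
        raise ValueError("need at least as many drives as copies")
    return num_drives * drive_size / copies


def article_capacity(num_drives, drive_size):
    """The article's formula for the same array: (n / 2) * single-disk capacity."""
    return (num_drives / 2) * drive_size


# Four 1 TB drives in a 2-near, 2-far setup.
print(usable_capacity(4, 1.0, near=2, far=2))  # 1.0 (one drive's worth)
print(article_capacity(4, 1.0))                # 2.0 (double the real figure)
```

The two results differ by exactly the factor of two that comes from counting two copies instead of four.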
The author's thesis that a near/far RAID setup will provide additional protection above and beyond normal RAID 10 is weak. The protection does not come from the near/far setup; it comes from there being more than two replicas of the data.
RAID configs where data is replicated R times can tolerate up to R-1 disk failures. More failures can be tolerated so long as the extra failed devices are in already-failed replication-sets. This is why a mirror pair (R=2) of RAID0 devices can tolerate a single drive failure. If R equals the number of drives (N) you can have all but one fail and still maintain service.
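The R-1 rule can be checked by brute force. The Python sketch below uses a deliberately simplified placement, in which each block's R copies land on R consecutive drives. That placement is an assumption made for illustration, not md's actual near/far layout algorithm, and all function names are hypothetical. It then tries every combination of failed drives to find how many failures are always survivable.

```python
from itertools import combinations


def place_copies(num_drives, copies, num_blocks):
    """Simplified, hypothetical placement: block b goes on `copies` consecutive drives."""
    if copies > num_drives:
        raise ValueError("need at least as many drives as copies")
    return [
        {(b * copies + c) % num_drives for c in range(copies)}
        for b in range(num_blocks)
    ]


def survives(layout, failed):
    """True if every block still has at least one copy on a working drive."""
    return all(drives - failed for drives in layout)


def max_guaranteed_failures(num_drives, copies):
    """Largest k such that every possible set of k failed drives is survivable."""
    layout = place_copies(num_drives, copies, num_blocks=num_drives)
    for k in range(1, num_drives + 1):
        for failed in combinations(range(num_drives), k):
            if not survives(layout, set(failed)):
                return k - 1
    return num_drives  # unreachable: losing every drive always loses data


print(max_guaranteed_failures(4, 4))  # 3: four copies on four drives
print(max_guaranteed_failures(5, 4))  # 3: four copies on five drives
print(max_guaranteed_failures(4, 2))  # 1: two copies on four drives

# With two copies, some double failures are still survivable and some are not.
two_copy_layout = place_copies(4, 2, num_blocks=4)
print(survives(two_copy_layout, {0, 2}))  # True: one drive lost from each pair
print(survives(two_copy_layout, {0, 1}))  # False: both copies of a block lost
```

In this model the guaranteed tolerance depends only on the number of copies, not on whether those copies are labelled near or far.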