I know how most of the various RAID levels work, but I stumbled on the recommended raid10,f2 mode while researching Linux software RAID. I don't really understand how it works on 2 or 3 disks. Could someone explain it to me, or point me to a really good article that explains it?
4 Answers
Actually I think Wikipedia explains it better than the actual docs. Here's the text from the article.
The Linux kernel software RAID driver (called md, for "multiple device") can be used to build a classic RAID 1+0 array, but also (since version 2.6.9) as a single level with some interesting extensions. The standard "near" layout, where each chunk is repeated n times in a k-way stripe array, is equivalent to the standard RAID-10 arrangement, but it does not require that n divide k. For example an n2 layout on 2, 3 and 4 drives would look like:
2 drives    3 drives      4 drives
--------    ----------    --------------
A1  A1      A1  A1  A2    A1  A1  A2  A2
A2  A2      A2  A3  A3    A3  A3  A4  A4
A3  A3      A4  A4  A5    A5  A5  A6  A6
A4  A4      A5  A6  A6    A7  A7  A8  A8
..  ..      ..  ..  ..    ..  ..  ..  ..
The 4-drive example is identical to a standard RAID-1+0 array, while the 3-drive example is a software implementation of RAID-1E. The 2-drive example is equivalent to RAID 1. The driver also supports a "far" layout where all the drives are divided into f sections. All the chunks are repeated in each section but offset by one device. For example, f2 layouts on 2- and 3-drive arrays would look like:
2 drives    3 drives
--------    ------------
A1  A2      A1  A2  A3
A3  A4      A4  A5  A6
A5  A6      A7  A8  A9
..  ..      ..  ..  ..
A2  A1      A3  A1  A2
A4  A3      A6  A4  A5
A6  A5      A9  A7  A8
..  ..      ..  ..  ..
This is designed for striping performance of a mirrored array; sequential reads can be striped, as in RAID-0, random reads are somewhat faster (maybe 10-20% due to using the faster outer sectors of the disks, and smaller average seek times), and sequential and random writes offer about equal performance to other mirrored raids. The layout performs well for systems where reads are more frequent than writes, which is a very common situation on many systems. The first 1/f of each drive is a standard RAID-0 array. Thus you can get striping performance on a mirrored set of only 2 drives. The near and far options can both be used at the same time. The chunks in each section are offset by n device(s). For example, an n2 f2 layout stores 2×2 = 4 copies of each sector, so requires at least 4 drives:
4 drives              5 drives
--------------        -------------------
A1  A1  A2  A2        A1  A1  A2  A2  A3
A3  A3  A4  A4        A3  A4  A4  A5  A5
A5  A5  A6  A6        A6  A6  A7  A7  A8
A7  A7  A8  A8        A8  A9  A9  A10 A10
..  ..  ..  ..        ..  ..  ..  ..  ..
A2  A2  A1  A1        A2  A3  A1  A1  A2
A4  A4  A3  A3        A5  A5  A3  A4  A4
A6  A6  A5  A5        A7  A8  A6  A6  A7
A8  A8  A7  A7        A10 A10 A8  A9  A9
..  ..  ..  ..        ..  ..  ..  ..  ..
As of Linux 2.6.18 the driver also supports an offset layout where each stripe is repeated o times. For example, o2 layouts on 2- and 3-drive arrays are laid out as:
2 drives    3 drives
--------    ----------
A1  A2      A1  A2  A3
A2  A1      A3  A1  A2
A3  A4      A4  A5  A6
A4  A3      A6  A4  A5
A5  A6      A7  A8  A9
A6  A5      A9  A7  A8
..  ..      ..  ..  ..
Note: k is the number of drives, n#, f# and o# are parameters in the mdadm --layout option. Linux can also create other standard RAID configurations using the md driver (0, 1, 4, 5, 6).
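For reference, these layouts are selected with mdadm's --layout option when creating the array. Here is a minimal sketch; the device and array names (/dev/sdb, /dev/md0, etc.) are placeholders, not anything from the quoted text:

# 4-disk RAID10 with the "far 2" layout and 512K chunks (example device names)
mdadm --create /dev/md0 --level=10 --raid-devices=4 --layout=f2 --chunk=512 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

# 2-disk f2 array: a mirrored pair that still gets RAID0-like sequential reads
mdadm --create /dev/md1 --level=10 --raid-devices=2 --layout=f2 /dev/sdf /dev/sdg

# confirm which layout the array actually uses
mdadm --detail /dev/md0 | grep -i layout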
From what I read, an f2 RAID10 array keeps at least 2 copies of each block, and they are stored far away from each other.
Here are the relevant sections from the man pages.
-p, --layout= This option configures the fine details of data layout for raid5 and raid10 arrays
...
Finally, the layout options for RAID10 are one of 'n', 'o' or 'f' followed by a small number. The default is 'n2'.
n signals 'near' copies. Multiple copies of one data block are at similar offsets in different devices.
o signals 'offset' copies. Rather than the chunks being duplicated within a stripe, whole stripes are duplicated but are rotated by one device so duplicate blocks are on different devices. Thus subsequent copies of a block are in the next drive, and are one chunk further down.
f signals 'far' copies (multiple copies have very different offsets). See md(4) for more detail about 'near' and 'far'.
RAID10 provides a combination of RAID1 and RAID0, and is sometimes known as RAID1+0. Every datablock is duplicated some number of times, and the resulting collection of datablocks are distributed over multiple drives. When configuring a RAID10 array it is necessary to specify the number of replicas of each data block that are required (this will normally be 2) and whether the replicas should be 'near', 'offset' or 'far'. (Note that the 'offset' layout is only available from 2.6.18).
When 'near' replicas are chosen, the multiple copies of a given chunk are laid out consecutively across the stripes of the array, so the two copies of a datablock will likely be at the same offset on two adjacent devices.
When 'far' replicas are chosen, the multiple copies of a given chunk are laid out quite distant from each other. The first copy of all data blocks will be striped across the early part of all drives in RAID0 fashion, and then the next copy of all blocks will be striped across a later section of all drives, always ensuring that all copies of any given block are on different drives.
The 'far' arrangement can give sequential read performance equal to that of a RAID0 array, but at the cost of degraded write performance.
When 'offset' replicas are chosen, the multiple copies of a given chunk are laid out on consecutive drives and at consecutive offsets. Effectively each stripe is duplicated and the copies are offset by one device. This should give similar read characteristics to 'far' if a suitably large chunk size is used, but without as much seeking for writes.
It should be noted that the number of devices in a RAID10 array need not be a multiple of the number of replicas of each data block; however, there must be at least as many devices as replicas.
If, for example, an array is created with 5 devices and 2 replicas, then space equivalent to 2.5 of the devices will be available, and every block will be stored on two different devices.
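To illustrate that last point, a rough sketch of the 5-device, 2-replica case (device names are placeholders):

# 5 devices, 2 'far' replicas: usable space is 5/2 = 2.5 devices' worth
mdadm --create /dev/md0 --level=10 --raid-devices=5 --layout=f2 /dev/sd[b-f]

# 'Array Size' here should be roughly 2.5x the size of one member device
mdadm --detail /dev/md0 | grep -E 'Array Size|Layout'
cat /proc/mdstat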
- sounds like offset is the way to go... – xenoterracide May 06 '10 at 11:45
- I think Wikipedia's more visual explanation is clearer... so I've posted it here. – xenoterracide May 11 '10 at 04:39
That's interesting and well explained. However, plain RAID1, at least on Linux software RAID, is also able to sustain multiple readers in parallel at very good performance:
Data is read from any one device. The driver attempts to distribute read requests across all devices to maximise performance.
[ ... ] In theory, having an N-disk RAID1 will allow N sequential threads to read from all disks. (man 4 md, RAID1 section)
It looks like RAID10, in its near layout, is better suited to this behaviour (accelerating multi-threaded I/O rather than single-threaded I/O as RAID0 does), with n2f2 on 4 disks being similar to RAID1 on 4 disks.
The n2 layout with 4 disks will do both: double the read performance for a single thread, and quadruple the read performance for two threads (if the Linux md RAID10 scheduler is well implemented, one thread should read from one pair, and the other from the other pair).
It all depends on what you need! I haven't done benchmarks yet.
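If someone does want to benchmark this, here is a rough sketch with fio, comparing one sequential reader against two (the mount point /mnt/md0 and the job names are assumptions):

# one sequential reader
fio --name=seq1 --directory=/mnt/md0 --rw=read --bs=1M --size=4G \
    --ioengine=libaio --direct=1 --numjobs=1 --group_reporting

# two sequential readers in parallel; compare the aggregate bandwidth to the run above
fio --name=seq2 --directory=/mnt/md0 --rw=read --bs=1M --size=4G \
    --ioengine=libaio --direct=1 --numjobs=2 --group_reporting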
First of all, mdadm R10 is a special mode; it is not R0(R1,R1,R1...). f2 means 2 'far' copies for redundancy.
Both answers are good; I want to make an addition with some benchmark results, which I could not fit in the comments section...
I tested with an Intel X79 C200 series chipset SATA controller (2x 6 Gbps, 4x 3 Gbps), 64 GB RAM, and a Xeon 2680.
Using this fio benchmark line:
sudo fio --refill_buffers --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=1024K --iodepth=32 --rw=read --size=5G --filename=testfile --timeout=60
replace read with write and you'll have a write test ...
Results for 6x 1TB Seagate Barracuda, chunk=512k (read/write, MB/sec):
SingleDisk          189/183
R0                 1132/1078
R10 n2 c=32k        911/350   (w parts)
R10 n2              940/311   (w parts)
R10 n2              981/262   (w parts, bs=10M)
R10 f2             1032/264   (w parts)
R0(R1+R1+R1)        578/385   (w parts)
R1(R0+R0)           550/300   (w/o parts) !!! RETEST needed
R0(R5+R5)           686/236   (w/o parts) !!! RETEST needed
Results for 8x Western Digital Gold 18TB datacenter disks, chunk=512k (read/write, MB/sec):
R0 c=512k          1334/1313
R10 f2 c=512k      1316/283
Note:
(w parts)   --> converted and joined ext4 partitions, used sdX1
(w/o parts) --> used the raw disk as sdX
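For anyone who wants to reproduce the nested R0(R1+R1+R1) versus native R10 f2 setups above, a rough sketch (partition names are placeholders; adjust to your disks):

# native mdadm RAID10, far-2 layout, 512K chunks, 6 disks
mdadm --create /dev/md0 --level=10 --raid-devices=6 --layout=f2 --chunk=512 /dev/sd[b-g]1

# nested R0(R1+R1+R1): three mirror pairs with a RAID0 striped across them
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sdf1 /dev/sdg1
mdadm --create /dev/md4 --level=0 --raid-devices=3 --chunk=512 /dev/md1 /dev/md2 /dev/md3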