12

I am about to replace an old hardware RAID5 array with a Linux software RAID1 array. I was talking to a friend and he claimed that RAID5 was more robust than RAID1.

His claim was that with RAID5, on read the parity data was read to make sure that all the drives were returning the correct data. He further claimed that on RAID1 errors occurring on a drive will go unnoticed because no such checking is done with RAID1.

I can see how this could be true, but I can also see that it all depends on how the RAID systems in question are implemented. Surely a RAID5 system doesn't have to read and check the parity data on every read, and a RAID1 system could just as easily read from all drives on every read to check that they hold the same data, thereby achieving the same level of robustness (with a corresponding loss of performance).

So the question is: what do RAID5/RAID1 systems in the real world actually do? Do RAID5 systems check the parity data on reads? Are there RAID1 systems that read from all drives and compare the data on read?

andynormancx

11 Answers

22

RAID-5 is a fault-tolerance solution, not a data-integrity solution.

Remember that RAID stands for Redundant Array of Inexpensive Disks. Disks are the atomic unit of redundancy -- RAID doesn't really care about data. You buy solutions that employ filesystems like WAFL or ZFS to address data redundancy and integrity.

The RAID controller (hardware or software) does not verify the parity of blocks at read time. This is a major risk of running RAID-5 -- if you encounter a partial media failure on a drive (a situation where a bad block isn't marked "bad"), you are now in a situation where your data have been silently corrupted.
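A minimal sketch (in Python, with made-up strip contents) of why that corruption stays silent: the normal RAID-5 read path returns the requested data strip directly and never consults the parity strip, so a strip that went bad without the drive flagging it is returned as-is.

```python
from functools import reduce

# Hypothetical 3-data-disk + parity RAID-5 stripe, one strip per disk.
data = [b"AAAA", b"BBBB", b"CCCC"]

def xor_parity(strips):
    """Parity strip is the byte-wise XOR of all data strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

parity = xor_parity(data)

def raid5_read(strip_index):
    """Typical read path: fetch the strip from one disk only -- the
    parity strip is never read, so nothing cross-checks the data."""
    return data[strip_index]

# Simulate silent corruption on disk 1: the read still "succeeds".
data[1] = b"BBXB"
assert raid5_read(1) == b"BBXB"    # corrupted data goes unnoticed
assert xor_parity(data) != parity  # a full-stripe check WOULD have caught it
```

The second assertion is the point: the information needed to detect the corruption exists on disk, but the ordinary read path never looks at it.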

Sun's RAID-Z/ZFS actually provides end-to-end data integrity, and I suspect other filesystems and RAID systems will provide this feature in the future as the number of cores available on CPUs continues to increase.

If you're using RAID-5, you're being cheap, in my opinion. RAID 1 performs better, offers greater protection, and doesn't impact production when a drive fails -- for a marginal cost difference.

duffbeer703
6

I believe the answer depends on the controller/software. For example, it is quite common for mirroring systems to read only one disc of a pair, and they can therefore deliver wrong data without noticing. Worse, if your results depend on that bad data and are subsequently written back, the write goes to both discs and the corruption is then faithfully mirrored on both.

From the PDF, under SATAssure(tm) Plus:

"Revolutionary SATAssure technology delivers enterprise-class data protection and reliability using large capacity, inexpensive SATA disk drives. SATAssure operates on all read operations, ensuring data integrity and automatically corrects problems in real-time – all without the performance or capacity penalty found in traditional storage systems. Reduce drive RMAs with a new ability to power-cycle individual drives. "

It is interesting that some manufacturers make a fuss about the fact that they always compute parity; this leads me to think it is relatively uncommon on hardware controllers. It is also of note that systems such as ZFS and WAFL (NetApp) do parity calculations on every read.

James
  • That link looks interesting, but does it actually explicitly say anywhere on that page or brochure that they recompute the parity on all reads? – andynormancx Jul 29 '09 at 12:09
  • I added a quote from the PDF. Note that the S2A boxes are quite high end. – James Jul 29 '09 at 12:36
3

With RAID-5, parity is generally only read on array rebuild, not on general read. This is so reads can be more random and faster (since you don't have to read and calculate parity for an entire stripe every time you want 1K of data from the array).
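As a sketch of what "parity is read on rebuild" means in practice: with single-parity XOR, any one missing strip can be reconstructed by XOR-ing together every surviving strip in the stripe (the strip values below are illustrative).

```python
from functools import reduce

def xor_strips(strips):
    """Byte-wise XOR across a list of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

# Hypothetical 4-drive stripe: three data strips plus their XOR parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = xor_strips(data)

# Drive 1 dies; rebuild its strip from everything that survived.
survivors = [data[0], data[2], parity]
rebuilt = xor_strips(survivors)
assert rebuilt == data[1]  # XOR of the survivors recovers the lost strip
```

This is also why a rebuild must read every surviving drive in full: each reconstructed strip needs the corresponding strip from every other member of the array.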

With RAID-1, reads are generally stepped across drives whenever possible to give increased read performance. As you noted, if the RAID subsystem tried to read both drives and they differed, the subsystem would have no way of knowing which drive was wrong.

Most RAID subsystems depend on the drive to inform the controller or computer when it is going bad.

So is RAID-5 "more robust"? The answer is, it depends. RAID-5 gives you more usable storage for a given number of disks than RAID-1 does, although to get usable storage beyond one disk's worth, RAID-1 needs to be combined with RAID-0, either as a stripe across RAID-1 pairs (1+0) or a mirror of two RAID-0 stripes (0+1).

(I prefer the former, since a single drive failure takes out a single RAID-1 element, meaning that only a single drive needs rebuilding. With the latter, a single drive failure kills a RAID-0 element, meaning that HALF the disks will be involved in the rebuild when the drive is replaced.)

This also leads to discussions of "phantom writes", where a write is reported as successful by the drive electronics, but for whatever reason the write never makes it to the disk. This does happen. Consider that for a RAID-5 array, when you have a drive failure the array MUST read ALL sectors on ALL surviving drives PERFECTLY in order to recover. NetApp claims that the large size of drives plus the large size of raid groups means that in some cases your chances of failing during a rebuild can be as bad as one in ten. Thus, they are recommending that large disks in large RAID groups use dual-parity (which I think is related to RAID-6).
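The "one in ten" figure is plausible back-of-the-envelope arithmetic: with the commonly quoted unrecoverable-read-error (URE) rate of 1 in 10^14 bits for consumer-class drives, reading every sector of several large surviving drives has a non-trivial chance of hitting at least one URE. The drive count and sizes below are illustrative, not NetApp's actual numbers.

```python
# Probability of at least one unrecoverable read error (URE) during a
# RAID-5 rebuild that must read every bit of every surviving drive.
ure_rate = 1e-14        # errors per bit read; typical consumer SATA spec
drive_bytes = 1e12      # 1 TB per drive (illustrative)
surviving_drives = 6    # illustrative RAID group size, minus the dead drive

bits_read = drive_bytes * 8 * surviving_drives
p_fail = 1 - (1 - ure_rate) ** bits_read
print(f"chance of the rebuild hitting a URE: {p_fail:.0%}")
```

Even these modest assumptions put the failure chance well above "one in ten", which is why large drives in large RAID groups push vendors toward dual parity.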

I learned this at a NetApp technical discussion given by a couple of their engineers.

David Mackintosh
  • I wouldn't use the term "more effective"... "more capacity" would be more appropriate. In my mind, a solution which makes it more likely that I will lose my data is not more effective. – duffbeer703 Jul 29 '09 at 15:47
  • Everything is a cost-value tradeoff. RAID-5 is more storage-cost-effective, while RAID-1 or RAID-1+0(0+1) is more robustness-effective. – David Mackintosh Jul 29 '09 at 15:50
  • The other reason to prefer 1+0 over 0+1 is that 1+0 can survive 4 of the 6 possible "a second drive fails before the first failed drive is replaced and the array rebuilt" scenarios, where 0+1 can survive only 2 of the 6. Though 0+1 can survive a controller failure on one arm (where 1+0 can't), this is a lot more rare than drive failure (even multiple drive failure). – David Spillett Jul 29 '09 at 21:21
  • RAID-DP (NetApp's dual parity implementation) is a RAID-6. RAID-6 (unlike RAID-5) is defined functionally as a RAID which can survive two disk failures. RAID-DP differs from typical RAID-6 in that it doesn't distribute the parity -- WAFL doesn't write random writes in place, so distributing parity doesn't provide any benefit. – Captain Segfault Aug 04 '09 at 21:20
3

No common RAID implementation checks parity on normal data access; I've never seen one. Some RAID5 implementations do read the parity blocks during streaming reads, but only to avoid unnecessary seeking (it's cheaper to read and throw away every nth block than to make the drive seek over it). RAID1 implementations can't check, because for performance they spread reads across the disks rather than reading the same data from both (in the vast majority of implementations, anyway; a handful let you pick, which can be useful if one disk is much slower than the other and the load is not write-intensive).

Some do check with a background 'scrub'. In that case RAID6 wins, as it can recover the data, while RAID5 and RAID1 are in the same situation: you can identify the error but not fix it. (This is not strictly true: the drive can detect a bad CRC, return an error, and let you rewrite the block from parity, and that happens quite commonly.)

If you want data integrity, store a hash with every block (or record, or however the data is divided up) at the application layer. Sybase and Oracle do this (at the page level, I believe), and I've seen it save a gigantic database on many occasions (e.g. a controller starts returning bad data, Sybase crashes with a clear error, and therefore no writes were done while the database was running on failing hardware in an inconsistent state).
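A minimal sketch of that application-layer idea, using a toy in-memory "page store" (the class and its layout are hypothetical, not Sybase's or Oracle's actual format): keep a digest alongside every page on write, and refuse to trust a page whose digest no longer matches on read.

```python
import hashlib

class ChecksummedStore:
    """Toy page store that keeps a SHA-256 digest next to every page,
    in the spirit of the per-page checksums described above."""

    def __init__(self):
        self.pages = {}  # page_no -> (data, digest)

    def write(self, page_no, data):
        self.pages[page_no] = (data, hashlib.sha256(data).digest())

    def read(self, page_no):
        data, digest = self.pages[page_no]
        if hashlib.sha256(data).digest() != digest:
            raise IOError(f"page {page_no}: checksum mismatch (silent corruption)")
        return data

store = ChecksummedStore()
store.write(0, b"hello")
assert store.read(0) == b"hello"

# Simulate a controller flipping bits underneath us: the read now
# fails loudly instead of silently returning garbage.
bad = bytearray(store.pages[0][0])
bad[0] ^= 0xFF
store.pages[0] = (bytes(bad), store.pages[0][1])
caught = False
try:
    store.read(0)
except IOError:
    caught = True
assert caught
```

Failing loudly is the whole value: the database stops before it can write conclusions derived from bad data back to disk.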

The only filesystem solution and the only RAID solution that does this for you is ZFS.

carlito
0

Is your friend talking about the parity bit that is involved in some RAID levels, or the checksum of the data written to disk?

If they're on about parity, then RAID1 does not have a parity bit -- you have two copies of the same data. There should be a checksum performed by the disk to ensure that what was written to disk matches what came down the wire.

RAID5 does have a parity bit. This means you can lose a disk in your RAID set and continue as if nothing happened. Still, there should be a checksum performed on the data written to disk to ensure it matches what came down the wire.

In this instance, checksums are totally independent of whatever RAID may or may not be performed with a bunch of disks.

Edited to add: You mentioned moving from hardware RAID to software RAID. My preference is always hardware RAID over software RAID. If you can purchase the hardware required to give you the RAID level you want to implement, I'd suggest you go for that. This lets all the parity calculations be performed by the RAID card rather than the host, thereby freeing up resources on the host. There are no doubt other benefits, but they escape me at the moment.

Ben Quick
  • He was talking about parity. He was claiming that on RAID5 the parity information was retrieved on a read and compared to the data coming from the other disks to check that there were no read errors. – andynormancx Jul 29 '09 at 10:35
  • 5
    I disagree with your recommendation of hardware RAID over software in all cases. With modern hardware, software RAID can be just as fast as hardware if your server has plenty of spare CPU (which mine will always have). Also, hardware RAID has some downsides, the main one being in a recovery situation you need a matching RAID card. With software RAID you can pull a drive out of a machine, stick it in another one and away you go without getting a new RAID card that exactly matches the old one. – andynormancx Jul 29 '09 at 10:42
  • I was trying to get to the bottom of what context he was talking about. As I understand it, checksums are used upon disk write (regardless of RAID). Parity is used for putting chunks of data on different disks, and to rebuild an array in the event of a disk failure. Parity isn't used on all RAID levels. – Ben Quick Jul 29 '09 at 10:44
  • 1
    Please see http://en.wikipedia.org/wiki/RAID#Operating_system_based_.28.22software_RAID.22.29 for a discussion of software vs. hardware RAID. Obviously, the implementation you choose depends upon your environment and your requirements. My preference is hardware RAID over software RAID. – Ben Quick Jul 29 '09 at 10:46
  • I understand, forget I ever mentioned checksums. I have updated the question to clear up the confusion. The question is all about whether RAID5 systems typically check the parity data on reads. – andynormancx Jul 29 '09 at 10:47
  • I'm well aware of the pro and cons of hardware/software RAID and I'm happy with my choice of software RAID for this implementation. Thanks anyway. – andynormancx Jul 29 '09 at 10:49
0

I am about to replace an old hardware RAID5 array with a Linux software RAID1 array. I was talking to a friend and he claimed that RAID5 was more robust than RAID1.

That would depend on the raid implementation type (hw/sw), the disks, the raid controller if any, and it's features.

His claim was that with RAID5, on read the parity data was read to make sure that all the drives were returning the correct data. He further claimed that on RAID1 errors occurring on a drive will go unnoticed because no such checking is done with RAID1.

It does make some slight sense, but not really. :) What happens is this: if wrong data is written, on a mirror it will be sent to both drives, and on RAID5 parity for it will be generated and spread across the drives. Data read/write checking is done by the disk and controller firmware, and has nothing to do with RAID levels.

So the question is, what do RAID5/RAID1 systems in the real world actually do? Do RAID5 systems check the parity data on reads? Are there RAID1 systems that read from all drives and compare the data on read?

As I said, the checks aren't part of the RAID algorithm, although some controllers might have something additional implemented.

The robustness of the array comes down to the quality of the drives (2.5" drives tend to live longer than 3.5" due to lower rotational-vibration rates; in my experience, NEVER buy Maxtor SCSI/SAS drives -- they have horrible firmware glitches), the environment (temperature and humidity control), the controller itself (does it have a BBU? is the firmware up to date? is it real RAID or fakeraid?), the number of PSUs in the server, the UPS quality, etc.

dyasny
  • I'm afraid you haven't answered the question, which is very precisely about what actual real implementations of RAID5 do with respect to checking the parity data on a read. – andynormancx Jul 29 '09 at 11:13
0

I don't know this, but it seems unlikely to me that it does. Remember that in order to check the parity, it would have to read the block from all drives in your RAID set and then do the math to determine correctness, whereas if it doesn't check, it just does the read off one drive.

Also, if your read is for less than one block, a parity-checked read would have to expand it to a full block, whereas a regular read wouldn't. (Assuming, of course, that the RAID block is bigger than the disks' blocks. I think reads from disk have to be of full blocks; if not, my point is even more valid.)

So, from my point of view, yes, it could do that, but if it did, it would be inefficient, and I doubt that any are implemented that way.

Again, though, I have no personal knowledge of actual implementations.

wfaulk
0

Do RAID5 systems check the parity data on reads?

It doesn't really make sense to. What do you do when you find a parity mismatch? (How do you know which block is wrong?)

For random reads checking parity would be expensive. Normally you could service a random read by just looking at a single disk, but if you want to check parity you'd need to read all disks on each read. (That might still make sense if there were anything you could do about it!)

Note that RAID-1 has this problem too -- which makes sense when you look at a RAID-1 as a two disk RAID-5.
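A tiny sketch of the locate problem (with hypothetical one-byte strips): if exactly one strip is wrong, the XOR of the whole stripe is non-zero, which tells you *that* something is wrong but is consistent with *any* single strip being the corrupt one.

```python
from functools import reduce

def syndrome(strips):
    """XOR of all strips including parity; all-zero means consistent."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

data = [b"\x01", b"\x02", b"\x04"]
parity = syndrome(data)            # parity makes the full stripe XOR to zero
stripe = data + [parity]
assert syndrome(stripe) == b"\x00" # consistent stripe

# Corrupt one strip: the syndrome goes non-zero...
stripe[1] = b"\x0a"
s = syndrome(stripe)
assert s != b"\x00"

# ...but XOR-ing the syndrome into ANY strip makes the stripe consistent
# again, so single parity alone cannot tell you which strip was bad.
for i in range(len(stripe)):
    candidate = list(stripe)
    candidate[i] = bytes(a ^ b for a, b in zip(candidate[i], s))
    assert syndrome(candidate) == b"\x00"
```

This is exactly the detect-but-can't-locate limitation shared by RAID-5 and RAID-1; a second, independent parity (as in RAID-6) is what adds the information needed to pin down the bad strip.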

0

I've been thinking a bit about the claim that RAID-1 should be faster on reads than RAID-5, since it reads across both drives at once.

Now, since parity is not read on RAID-5 unless the array needs a rebuild, it actually equals a RAID-0 array in terms of reading, am I correct?

RAID-0 is generally regarded as being the fastest level (although it should be named "AID", since there's no redundancy whatsoever). :-D

Speaking of Linux software RAID, a simple test - using hdparm - confirms this theory: my RAID-5 arrays always show a higher read speed than my RAID-1 arrays.

BUT: A degraded array performs much slower than a normally running array, it seems! I've just tested this with Fedora 9, running on 4 WD 1 TB drives with different RAID levels. Here are the results:

  • Degraded RAID-5: read speed 43 MB/sec
  • Normal RAID-5: read speed 240 MB/sec (!)
  • RAID-1: read speed 88 MB/sec

Since the allowed number of lost disks is the same in RAID-1 and RAID-5 (namely one), I think RAID-5 should outperform RAID-1 in every aspect, giving more capacity relative to the number of disks in the array with the same fault tolerance. This leads to the conclusion that RAID-6 outperforms every other RAID level, since it's as fast as RAID-0 on normal reads (no parity read from the two parity disks) and still fault-tolerant after the loss of an array member. ;-)

  • Some interesting stuff, but you've repeated the RAID1 fallacy that I hear all the time. RAID1 does _not_ have to mean that it can only survive a single disk failure. You don't have to have just two disks in your RAID1 array. For example, if you have a RAID1 array with 3 disks, it will survive two disk failures, and read performance should also increase (_if_ your RAID system is reading from multiple drives on a read). – andynormancx Sep 05 '09 at 10:19
-1

Personally, I think that the ultimate test of a RAID system is how well it can withstand failure. In this case, both RAID5 and RAID1 can handle single drive failures, but neither will survive any more than that.

As for your question on the parity bit, I would think it depends on the RAID drivers. Parity will definitely be read during reconstruction, but in normal use it would not make much sense to do so, as bandwidth would be wasted on it.

sybreon
  • I'm afraid your answer amounts to "I don't know whether any RAID5 implementations check the parity on a read" and so doesn't answer the question. – andynormancx Jul 29 '09 at 12:06
  • It is a valid answer because *nobody* will know for certain except the person who actually wrote the driver. – sybreon Jul 29 '09 at 13:45
-2

AFAIK (I am no 24/7 storage pro), the controller always checks what is written to and read from the disks. I.e. with RAID 1 you have slightly worse writes than on a single disk, but your reads are a little faster (it has to write a file to two disks, but can read one part from disk one and the other part from disk two).

Maybe you can disable data checking for a RAID level, but what would be the point? All RAID levels (except 0) are there to give you data redundancy, so why hamper yourself?

With RAID 5 you need at least 3 disks and can use N-1 disks for data. With RAID 1 you always need an even number of disks and can use N/2 disks for data.

So in bigger arrays, RAID 5 gives you more storage while RAID 1 gives you more redundancy.
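A quick sketch of that capacity comparison (pure arithmetic, assuming equal-size disks; the RAID 1 figure follows the answer's mirror-pairs assumption):

```python
def raid5_usable(disks, size_tb):
    """RAID 5: one disk's worth of capacity goes to parity."""
    assert disks >= 3
    return (disks - 1) * size_tb

def raid1_usable(disks, size_tb):
    """RAID 1 mirror pairs (striped together beyond one pair): half
    the raw capacity is usable."""
    assert disks >= 2 and disks % 2 == 0
    return disks // 2 * size_tb

# For six 1 TB disks:
assert raid5_usable(6, 1) == 5  # 5 TB usable
assert raid1_usable(6, 1) == 3  # 3 TB usable
```

The gap widens as the array grows, which is the "more storage" half of the trade-off above.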

If by more robust you mean which offers more redundancy, then it is RAID 1.

Depending on the array size, you also have to consider rebuild times in case of an error (how many disks there are, how big each disk is, what kind of RAID it is (soft-, fake-, or hardware), which level, etc.).

So it is not really possible to say that one RAID is more robust than another (though RAID 6 is arguably always more robust than RAID 5, at the cost of storage space).

mrt181
  • 1
    I'm afraid this doesn't really answer my question, my question is very precisely about whether or not typical RAID5 implementations check the parity data on read. And for the record RAID1 does not need an even number of disks. You can quite happily have a RAID1 array with more than two disks in, thus increasing redundancy while reducing write speed. – andynormancx Jul 29 '09 at 11:00
  • 1
  • "The controller always checks what is written to and read from the disks." This is not the case: the controller could read from both discs, but some controllers return the first data they get. – James Jul 29 '09 at 11:07
  • I think it's possible to say that RAID1 needs an even number of disks, and that a third mirror is something other than RAID1. So few implementations support a third mirror that the terminology has never standardized. – carlito Aug 30 '09 at 20:41