Does RAID 1 protect against corruption?

14

5

Does RAID 1 protect against data corruption? For example, let's say that I am keeping all of my important files on a NAS that uses 2 disks in a RAID 1. If one hard drive has some kind of internal problem and the data becomes corrupted, does the RAID recognize this automatically and correct it using data from the other, good disk?

Could it even know which copy is the good one?

Does RAID 5 protect against corruption?

I know that RAID is not a backup solution. I am trying to figure out how to make sure that I am not backing up corrupt data!

Shaun

Posted 2010-02-24T02:48:34.647

Answers

13

RAID-1 protects against the complete failure of one of the two drives. If the drive is not marked as failed, then its contents are assumed to be accurate. But if, for whatever reason, one of the two drives was returning inconsistent data, then that error would not be detected by the RAID system, and the application would get bad data.

Many controllers have a verification process that runs periodically, but the purpose of this is to test for disk failure, not data integrity. Hard drives implement their own data integrity tests and checksums which they use to spot bad sectors, but the algorithm is designed to be fast and compact, not thorough, so errors can leak through.

While data corruption is the exception rather than the rule, it's also not unheard of. A member of the ZFS team, for example, reported in an interview that their high-end RAID-5 device was serving them corrupt data, which they spotted only because ZFS implements checksums at the filesystem level.

tylerl

Posted 2010-02-24T02:48:34.647

Reputation: 2 064

5

It depends on where the corruption stems from. If a drive in a RAID 1 mirror is screwy and is writing nonsense, then the RAID mirror will degrade, the good drive will be in use, and you'll have the good files. In the case of RAID 5 this is done with 2 data drives and a parity drive (in simplest form), and if one of the 3 drives is failing to write proper files then it will fail out and you'll be left with either 2 data drives or 1 data drive and a parity drive.

Now let's look at what happens if the corruption is caused by a virus or a bug in a program. In RAID 1 and RAID 5 no drive will be taken out of service, because the drives are writing properly. Nothing has failed. However, files will be destroyed because the virus or bug is writing junk, and it will write it to both your drives in a RAID 1 mirror, and to all 3 of your drives in a RAID 5 system.

That is why RAID is not backup. It protects against the most likely failure, which is a disk failure, but it doesn't account for a lot of other scenarios.

Joshua Levitsky

Posted 2010-02-24T02:48:34.647

Reputation: 102

4 +1 "This is why RAID is not a backup" God knows how many times I've heard "I'm ok, got my backup covered with a RAID" – Urda – 2010-02-24T03:13:11.663

2 How can the RAID distinguish between which data is good and which is bad? – None – 2010-02-24T04:13:31.597

1 Shaun... if your data is eaten by a virus or accidentally deleted, the RAID can never distinguish it as good or bad. All the RAID is in charge of (in a RAID 1) is making sure that both disks are equal. If a sector fails a checksum, the RAID controller compensates to repair it, or triggers a rebuild. In a RAID 5, if a sector fails a parity check, a rebuild is triggered. RAID protects against physical drive failure and the resulting data loss. It cannot protect against data lost to program faults or viruses. – Urda – 2010-02-24T14:47:17.420

2 I have to downvote this. RAID1 does not do checksumming, it only protects against a complete drive failure. If one drive starts returning garbage, it has no way to tell which one is right, and will happily return garbage data. RAID5 I'm not sure of, because of the parity checks. This is exactly why filesystems like ZFS and BTRFS were invented, so that you get a 'data-aware' RAID-like system, which can correct garbage data appropriately using checksums to verify blocks of data. – Alex – 2015-08-27T11:45:00.933

6 Your characterization of RAID 5 is inaccurate. There is no separate parity drive, instead parity is distributed across all drives. You end up with a total available space of n-1, but there isn't a drive dedicated to parity. – MDMarra – 2010-06-04T03:38:57.630

1 @Urda, there is no checksum to be checked on RAID 1, correct? In that case, if the RAID system detects that a block contains different data on each disk, wouldn't it have to guess which one is right, possibly risking filesystem-level corruption during the rebuild? – Renan – 2013-04-26T06:32:12.920

5

As others have noted, a RAID 1 system has no way to tell which of two sectors is bad.

Higher-end RAID systems run a scrub operation in the background to compare both copies and flag differences. Better yet is a system that reads both blocks from the drives each time and compares them at read time. Resolving those differences, however, is impossible for the RAID controller.
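To make the "detect but not resolve" point concrete, here is a minimal sketch in Python. It is a toy model, not how any real controller is implemented: the two byte strings stand in for the two mirrored copies of one block, and the function names are made up for illustration. The second function assumes a checksum of the block was stored somewhere else at write time, which plain RAID 1 does not do.

```python
import hashlib
from typing import Optional, Tuple


def read_mirror(copy_a: bytes, copy_b: bytes) -> Tuple[bytes, bool]:
    """Plain mirror: return the first copy plus a flag saying whether both agree.

    A mismatch can be detected, but nothing here says which copy is correct.
    """
    return copy_a, copy_a == copy_b


def read_with_checksum(copy_a: bytes, copy_b: bytes,
                       expected_sha256: str) -> Optional[bytes]:
    """Return whichever copy matches the stored digest, or None if neither does.

    Assumption: expected_sha256 was recorded separately when the block was written.
    """
    for copy in (copy_a, copy_b):
        if hashlib.sha256(copy).hexdigest() == expected_sha256:
            return copy
    return None


good = b"important data"
bad = b"important dbta"            # one silently corrupted byte
digest = hashlib.sha256(good).hexdigest()

data, consistent = read_mirror(bad, good)          # mirror alone: mismatch, no verdict
recovered = read_with_checksum(bad, good, digest)  # checksum picks the good copy
```

With only the two copies, the best the mirror can report is "these differ". The independent checksum is what turns that into "this one is right", which is the extra information a checksumming filesystem carries.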

On Linux systems using md software RAID (managed with mdadm), a scrub can be initiated through the "sync_action" file:

md arrays can be scrubbed by writing either check or repair to the file md/sync_action in the sysfs directory for the device.

Requesting a scrub will cause md to read every block on every device in the array, and check that the data is consistent. For RAID1 and RAID10, this means checking that the copies are identical. For RAID4, RAID5, RAID6 this means checking that the parity block is (or blocks are) correct.
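As a rough sketch of what that looks like from a script, the Python below writes the action to `sync_action` and reads back a mismatch counter. The `/sys/block/<device>/md` location and the `mismatch_cnt` file are assumptions based on the usual Linux md sysfs layout, and `md0` in the usage note is only an example device name; check your own system before relying on either. Writing to sysfs requires root.

```python
from pathlib import Path


def md_dir(device: str, sysfs_root: str = "/sys/block") -> Path:
    """Directory holding the md attributes for a device (assumed layout)."""
    return Path(sysfs_root) / device / "md"


def request_scrub(device: str, action: str = "check",
                  sysfs_root: str = "/sys/block") -> None:
    """Ask md to scrub the array by writing 'check' or 'repair' to sync_action."""
    if action not in ("check", "repair"):
        raise ValueError("action must be 'check' or 'repair'")
    (md_dir(device, sysfs_root) / "sync_action").write_text(action + "\n")


def mismatch_count(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read the mismatch counter that md reports after a scrub (assumed file name)."""
    return int((md_dir(device, sysfs_root) / "mismatch_cnt").read_text().strip())
```

Typical use would be `request_scrub("md0")`, waiting for the scrub to finish, then `mismatch_count("md0")`. A non-zero count on RAID 1 only says the copies differ somewhere; it does not say which copy is the good one.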

RAID 1 is all about protecting against sudden total drive failure. Look elsewhere for protection against corruption. Beyond that, RAID 1 offers no "history", so it can't recover from human or software error. Look to checksumming filesystems like ZFS, or a history-preserving filesystem like Hammer, for protection against corruption.

Bryce

Posted 2010-02-24T02:48:34.647

Reputation: 2 038

3

In practice, yes. The vast majority of hard drive failures are all-or-nothing. Either (a) the cable is unplugged or the drive microcontroller has failed, so the RAID controller gets no response at all -- obvious failed drive. Or (b) the cable and drive microcontroller are good, but when the drive tries to read a sector, its internal microcontroller detects data corruption because the internal ECC check failed, and repeated attempts to read that sector (in case it's a temporary read glitch) eventually time out, so the RAID controller gets a polite "sorry" response -- obvious failed drive. Either way, it is obvious to the RAID-1 or RAID-5 controller that the drive has failed.

In principle, no. If something has gone so badly wrong that a hard drive is writing nonsense, and yet somehow working well enough to write the correct internal ECC code for that nonsense, then RAID-1 can't tell which drive is correct. The RAID-1 system will likely overwrite the good data with the corrupt data on a resync. RAID-5 is no better. The "RAID-5 write hole" (a power failure in the middle of an active write) is one particular rare but not impossible case.
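A toy illustration of why parity does not help here, assuming a single stripe with two data blocks and one XOR parity block (real RAID-5 rotates parity across drives and works on much larger chunks; the names below are invented for the example). Parity can rebuild a block that is known to be missing, but when a block is silently wrong it can only show that the stripe is inconsistent.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# Known failure: the drive holding d1 is gone, so rebuild it from the survivors.
rebuilt_d1 = xor_blocks(d0, parity)

# Silent corruption: d0 comes back wrong, but the drive reports success.
corrupt_d0 = b"AAAZ"
stripe_consistent = xor_blocks(corrupt_d0, d1) == parity

# The mismatch does not say which of the three blocks is the bad one.
# Rebuilding d1 from the corrupt block would just spread the damage.
wrong_d1 = xor_blocks(corrupt_d0, parity)
```

The first case is the one RAID is designed for. In the second, the array knows something is off but has three suspects and no independent evidence.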

As far as I know, the only way to avoid such corruption is to use end-to-end checksums in addition to file mirroring, either automatically as part of the file system (ZFS or Btrfs) or periodically or manually (recalculating rsync checksums, simple file verification, Parchive file sets, etc.); ideally with a cryptographic hash such as SHA-256.
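For the manual route, a minimal sketch of a SHA-256 manifest in Python follows. The function names and the idea of keeping the manifest as a plain dict are illustrative assumptions, not a description of how rsync, SFV or Parchive work internally. The idea is to record digests while the files are known to be good, then re-verify before each backup.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_of(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def changed_files(root: str, manifest: Dict[str, str]) -> List[str]:
    """List manifest entries that are missing or whose digest no longer matches."""
    base = Path(root)
    return [
        rel for rel, digest in manifest.items()
        if not (base / rel).is_file() or sha256_of(base / rel) != digest
    ]
```

A mismatch only says the bytes changed since the manifest was built. For files that are never supposed to change (photos, archives) that points to corruption; for files you edit, the manifest has to be refreshed after each legitimate change.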

David Cary

Posted 2010-02-24T02:48:34.647

Reputation: 773

– Mick – 2014-04-02T10:24:58.493