How does RAID 1 determine whether a disk is corrupted?

4

2

I've built a RAID 1 array of 2 disks, A and B.

That means that every bit on A is equal to the corresponding bit on B. If one disk fails, I can safely retrieve my data from the other disk. But then I started wondering: How true is this?

Let's say a bit that should be 1 reads 0 on A, but 1 on B. How would the RAID controller be able to tell which one is corrupted and which one is not? Is this based on what the so-called "S.M.A.R.T." technology reports, and is that really worth anything, or would I be just as well off with a non-RAID solution?

I can see why this is not a problem on RAID 5, so I'm planning to upgrade.

Einar

Posted 2012-02-12T22:11:45.707

Reputation: 193

1

Possible duplicate of Does RAID 1 protect against corruption?

– Ƭᴇcʜιᴇ007 – 2016-01-13T16:33:36.897

I'm not posting this as an answer as I don't know if I'm 100% correct, but I believe that for the circumstance you are describing to happen, the disks would have had to be written to independently with different data, which isn't what happens in a RAID 1 setup. It might occur in the case of a fault with the RAID controller, but even then it seems unlikely. – chunkyb2002 – 2012-02-12T22:40:06.217

RAID IS NOT A BACKUP!! There are problems with RAID5 as well. http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

– Zoredache – 2012-02-12T22:50:41.727

Related: What exactly does a RAID 1 resync do?

– Ƭᴇcʜιᴇ007 – 2012-02-12T23:29:38.527

Answers

5

Neither RAID 1 nor RAID 5 protects against the sort of problem you are describing. They are mainly meant to protect against the hardware failure of a single drive (and, therefore, to reduce system downtime). With RAID 5, the parity information is not consulted on normal reads; it is only used once the failure of a drive is detected.
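
To illustrate the difference between a known drive failure and a silent bit flip, here is a minimal Python sketch. It assumes a toy stripe of two data blocks plus one byte-wise XOR parity block; the names `xor_blocks` and `rebuild` are made up for this example and are not any real controller's API.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks (toy RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks):
    """Reconstruct the single missing block from all surviving blocks."""
    return xor_blocks(*surviving_blocks)

# Hypothetical stripe: two data blocks and their parity.
d0 = b"hello world!"
d1 = b"RAID example"
parity = xor_blocks(d0, d1)

# Case 1: drive 1 is KNOWN to be dead, so we rebuild it from the rest.
rebuilt_d1 = rebuild([d0, parity])

# Case 2: one bit on drive 0 flips silently and nobody reports a failure.
corrupt_d0 = bytes([d0[0] ^ 0x01]) + d0[1:]

# A parity check shows the stripe is inconsistent...
stripe_consistent = xor_blocks(corrupt_d0, d1) == parity

# ...but parity alone does not say which of the three blocks is wrong.
# Rebuilding d1 from the corrupt d0 would silently produce bad data.
wrong_d1 = rebuild([corrupt_d0, parity])
```

In the first case the array knows which block to recompute. In the second case the stripe is merely inconsistent, and something other than the parity has to say which member is at fault.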

Although quite rare, bits can seemingly randomly change state due to a variety of causes - it's called bit rot. To protect against bit rot you can:

  1. Add further redundancy, e.g., by using RAID 6, combined with regular data integrity checks.
  2. Use a file system which actively checks for data integrity, such as ZFS. By using ZFS with RAID-Z1 (single-drive redundancy), when reading any bit that randomly "flipped", the error will be detected because the calculated checksum does not match the stored checksum. Then, where possible, ZFS will automatically correct the error using parity information.
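
The checksum idea in the second option can be sketched roughly as follows. This is only an illustration of the principle, not how ZFS is actually implemented: it assumes two mirrored copies of a block, uses SHA-256 from Python's `hashlib` as a stand-in checksum, and the function name `read_with_self_heal` is hypothetical.

```python
import hashlib

def checksum(block):
    """Stand-in block checksum (assumption: SHA-256)."""
    return hashlib.sha256(block).digest()

def read_with_self_heal(copies, stored_checksum):
    """Return the first copy matching the stored checksum and
    overwrite any copies that do not match it."""
    good = None
    for candidate in copies:
        if checksum(candidate) == stored_checksum:
            good = candidate
            break
    if good is None:
        # Every copy is damaged; redundancy cannot help here.
        raise IOError("no copy matches the stored checksum")
    for i, candidate in enumerate(copies):
        if candidate != good:
            copies[i] = good  # "repair" the bad copy in place
    return good

# The checksum is stored separately from the data when it is written.
original = b"important data"
stored = checksum(original)

# Later, copy 0 has suffered a silent flip ('a' became 'A').
copies = [b"importAnt data", original]
data = read_with_self_heal(copies, stored)
```

The separately stored checksum is what breaks the tie: with it, the reader knows which copy is correct instead of only knowing that the copies differ.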

It's worth pointing out that hard drives do have built-in data redundancy to partly mitigate bit rot.

sblair

Posted 2012-02-12T22:11:45.707

Reputation: 12 231

RAID5 should actually be able to detect the sort of problem described. So long as all the errors were on a single drive, a 'validation' check often offered by RAID controllers would pick it up and be able to reconstruct the original data accurately. – ChrisInEdmonton – 2012-02-25T02:37:32.167

3

@Chris This is true, but it doesn't do this actively. Bit rot creeps in, and then one drive fails. Usually the rebuild would go smoothly after replacing the drive, but then you start to encounter failures during the rebuild because there were some unreadable bits. Since the array is in a degraded state, it cannot repair these errors. This is the reason things like ZFS actively scrub data using the parity checks, to resolve these errors as they occur. – AaronLS – 2012-06-28T22:36:47.237

Last time I used a hardware RAID5 card, the management software was automatically configured to run a validation check once per week. I see my current NAS box, running RAID5, does not perform such a validation, though. – ChrisInEdmonton – 2012-06-29T01:53:25.947

9

RAID1 is not a backup solution at all. What RAID1 does is protect you from a single-drive failure. That's all. Well, okay, it also improves your read speeds a little. But it's not a backup solution. If you delete a file, it's deleted from both drives. If you format your RAID1, both drives are formatted. If your files are infected with a virus, you can't recover. That's why RAID1 is not a backup solution.

To answer your other question, if the data is mismatched on the drives, there's no way to tell which is correct. However, the odds of this are perhaps not as high as you may think. See, for example, Wikipedia's section on error handling on modern hard drives.
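
As a small illustration of that point, here is a Python sketch of a mirror comparison. It assumes two in-memory byte strings standing in for the disks and a tiny block size; `find_mismatches` is a hypothetical helper, not a real RAID tool.

```python
def find_mismatches(disk_a, disk_b, block_size=4):
    """Return the indices of blocks that differ between two mirrors."""
    length = min(len(disk_a), len(disk_b))
    mismatches = []
    for offset in range(0, length, block_size):
        block_a = disk_a[offset:offset + block_size]
        block_b = disk_b[offset:offset + block_size]
        if block_a != block_b:
            mismatches.append(offset // block_size)
    return mismatches

# Hypothetical mirror contents; one byte differs in block 1.
disk_a = b"AAAABBBBCCCC"
disk_b = b"AAAABxBBCCCC"
mismatches = find_mismatches(disk_a, disk_b)
```

The comparison finds that block 1 differs, but nothing in either copy indicates whether `BBBB` or `BxBB` is the data that was originally written.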

It's not impossible to add further error detection and error correction, but that is not typically done at the level of the RAID controller. Some file systems, such as ZFS, provide additional protection for your data integrity.

ChrisInEdmonton

Posted 2012-02-12T22:11:45.707

Reputation: 8 110