
RAID 1 and RAID 5 (and their nested variants 10 and 50) achieve data redundancy through mirroring and through parity, respectively. This allows a RAID array to keep serving data when a sector on a disk (or a whole disk) becomes unreadable. RAID 6 (or 60) adds a second, independent parity block per stripe to tolerate double faults.
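
For reference, the parity redundancy boils down to a simple XOR across each stripe; here is a minimal illustrative sketch (not tied to any particular controller's implementation):

```python
# Minimal sketch of RAID-5-style XOR parity over a stripe of equally sized
# data blocks. Purely illustrative; block contents are made up.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# A stripe of three data blocks plus one parity block (a 4-disk RAID 5).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# If one data block is lost (unreadable disk), XOR-ing the survivors with
# the parity block reconstructs it exactly.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[lost_index]
```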

But how can a RAID array deal with data which is perfectly readable, but simply inconsistent?

If some error occurs such that, for example, data in a stripe is changed on one disk but the change is not propagated to the other one(s), the whole stripe becomes inconsistent. If in a mirrored set one disk says "this bit is 0" while the other disk says "this bit is 1", how can a RAID controller know which one is right? The same reasoning applies to a RAID 5 stripe, with the added complexity that you can't easily tell which sector in the stripe is actually wrong. And does RAID 6 mitigate this issue with its double parity, or can it still have trouble recovering from data corruption when the data is readable but wrong somewhere, especially as RAID 6 arrays tend to have lots of disks?
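
To make the RAID 5 part concrete, here is a tiny sketch (illustrative only, blocks modelled as small integers) showing that a plain XOR parity mismatch is detectable but not locatable:

```python
from functools import reduce

# Model each block of a stripe as a small integer; XOR plays the role of
# the parity block.
data = [0b1010, 0b0110, 0b1100]
parity = reduce(lambda a, b: a ^ b, data)

# Silently flip a bit in block 1: the stripe no longer XORs to the parity,
# so the corruption is *detectable*...
data[1] ^= 0b0001
assert reduce(lambda a, b: a ^ b, data) != parity

# ...but not *locatable*: recomputing any one block from the other blocks
# plus the parity yields a stripe that checks out, even though only one of
# those "repairs" restores the original data.
for i in range(len(data)):
    candidate = list(data)
    candidate[i] = parity
    for j, blk in enumerate(data):
        if j != i:
            candidate[i] ^= blk
    assert reduce(lambda a, b: a ^ b, candidate) == parity  # all look valid
```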

This could theoretically be solved with checksums, to establish which copy of the data (or of the parity) is the correct one; but does any RAID controller actually implement this kind of checksum (which would of course take up additional space)? Or does it need to be handled at the OS level, where some filesystems (ZFS, for example) checksum their contents? And if that is the case, how can they tell the RAID controller "the data in sector X on disk Y in stripe Z is wrong", when the general approach of a RAID controller is to abstract the underlying storage layer away from the OS as much as possible?
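
Conceptually, the checksum idea would look something like the sketch below (this is roughly what ZFS-style filesystems do per block; it is not any real controller's on-disk format, and the names are made up):

```python
import hashlib

# Conceptual sketch: store a hash of each block alongside it, and on a
# mirror mismatch trust the copy whose hash still verifies.

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

original = b"important data"
stored_sum = checksum(original)      # kept with the block's metadata

copy_a = original                    # disk A returns the good copy
copy_b = b"important dbta"           # disk B returns a copy with a flipped bit

def pick_good_copy(copies, expected_sum):
    """Return the first copy whose checksum matches, or None if all are bad."""
    for copy in copies:
        if checksum(copy) == expected_sum:
            return copy
    return None

good = pick_good_copy([copy_b, copy_a], stored_sum)
assert good == original   # the corrupted mirror half is identified and ignored
```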

Massimo
  • This is what the "Patrol Read" or a background consistency check is for. – ewwhite May 24 '16 at 23:49
  • That's useful for early detection of *bad blocks* and moving data somewhere else before an actual error occurs. But it still has to deal with *readable but inconsistent* data. Take my RAID-1 example: if a block on a disk is readable and says "0", while the same block on the other disk is *also* readable and says "1", how can the controller know which one is right? – Massimo May 25 '16 at 00:03
  • Since RAID 1 offers no parity, the system will have a very hard time detecting and correcting the issue. You would probably have to pull the drives and read them individually to get the corrupted file. – Brian D. Nov 25 '16 at 17:06
  • Easy solution - use ZFS – Patrick Jul 25 '19 at 23:40

2 Answers

RAID VOLUMES WITH PARITY STRIPE

On the Areca controllers we use (and on most modern hardware RAID controllers), the controller can detect during a consistency check whether the corruption is in the parity data, in the physical data on disk, or in both. Most controllers accomplish this with simple checksum bits on both the parity data and the data-on-disk.

If the parity data is corrupted, the controller will notice the issue when you run a consistency check, re-read the physical disk for the correct bits, and rewrite the parity stripe. Users will see no problems, because they are reading data-on-disk when opening files. Resaving anything that causes the corrupted parity stripe to be rewritten will also fix the issue.

If the opposite occurs and a bit flips in your actual data-on-disk, the controller will look at the parity stripe during a consistency check to see whether it has changed. In this case the controller will overwrite the data on the disk to match the parity data, which it can confirm is unchanged/good. Until a consistency check runs and corrects the error, users will get a CRC error or a corrupted file, depending on what the data is.
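
A rough model of this decision logic, assuming the controller keeps a small checksum for each data sector and for the parity sector (hypothetical structure and names, not Areca's actual firmware):

```python
import zlib

def crc(block: bytes) -> int:
    return zlib.crc32(block)

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

def consistency_check(data_blocks, data_crcs, parity, parity_crc):
    """Return a repaired (data_blocks, parity) pair, or raise if unrecoverable."""
    bad_data = [i for i, blk in enumerate(data_blocks) if crc(blk) != data_crcs[i]]
    parity_bad = crc(parity) != parity_crc

    if parity_bad and not bad_data:
        # Parity is the corrupted side: recompute it from the verified data.
        return data_blocks, xor_blocks(data_blocks)
    if len(bad_data) == 1 and not parity_bad:
        # One data sector is corrupted: rebuild it from the others plus parity.
        i = bad_data[0]
        others = [blk for j, blk in enumerate(data_blocks) if j != i]
        repaired = list(data_blocks)
        repaired[i] = xor_blocks(others + [parity])
        return repaired, parity
    if not bad_data and not parity_bad:
        return data_blocks, parity   # stripe is clean
    raise RuntimeError("multiple sectors failed their checksums; restore from backup")
```

In practice something like this runs over every stripe during the scheduled consistency check (or Patrol Read) pass, which is why running those checks regularly matters.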

Since the parity data for a given piece of data-on-disk is never stored on the same drive as that data, a single drive failure shouldn't cause any data corruption issues (or a double drive failure for RAID 6, and so on).

Consistency checks keep your data as accurate as possible; if you let corrupted data sit on your volume long enough, it can get written into the parity data, meaning the file is corrupted for good and will need to be restored from a backup. If a drive is in a pre-fail state and is showing errors during consistency checks, replace it immediately instead of waiting for the controller to mark it as failed. We run consistency checks daily on smaller volumes and weekly on larger ones.

RAID VOLUMES WITHOUT PARITY STRIPE (E.G. RAID 1)

The hard drive controller/firmware may be able to correct the issue. If it can't, the RAID controller will have a very hard time fixing it; in that case you would probably have to read the drives individually to recover the data.
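
If it comes to that, one crude approach is to image both halves of the mirror and diff them block by block; a minimal sketch (the file names and block size below are made up for illustration):

```python
# Compare two mirror images and report which blocks disagree, so they can
# be inspected by hand or checked against a backup.

BLOCK_SIZE = 4096

def mismatched_blocks(image_a: str, image_b: str):
    """Yield byte offsets where the two mirror images differ."""
    with open(image_a, "rb") as fa, open(image_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(BLOCK_SIZE)
            b = fb.read(BLOCK_SIZE)
            if not a and not b:
                break
            if a != b:
                yield offset
            offset += BLOCK_SIZE

# Example usage (hypothetical image files taken with dd or similar):
# for off in mismatched_blocks("diskA.img", "diskB.img"):
#     print(f"mirror halves differ at byte offset {off}")
```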

GENERALLY SPEAKING

Run consistency checks at the interval recommended by your RAID card manufacturer. If you are really worried about corruption, you can also stack a resilient file system on top of the RAID volume. Modern resilient file systems can correct many of these data integrity issues, and stacking a resilient FS over RAID 6 would give you excellent data uptime without corruption. And even with two simultaneous drive failures you would still have FS-level parity data available to avoid presenting corrupted data to the user.

Brian D.

You are effectively describing the situation where one disk writes (or reads back) an error. The RAID controller has no practical way to protect against this (write-and-read-back verification, for example, would kill your performance). It has to rely on the disks being able to detect this kind of error and either remap the block or drop out of the volume, causing a degradation of the RAID.

If you think about the single-disk situation, the only protection against inconsistent writes (or reads) is the disk itself. RAID builds upon that, but does not introduce an additional safeguard.

N.B. I know from experience that XFS reacts quite sensibly to erroneous disks in an array. So at least my non-low-end controllers and the OS did recognize, but not protect against, that inconsistency (a disk known to be faulty was forcefully added to a volume).

Michael