
While planning my RAID setup on a Synology Disk Station I've done a lot of reading about the various RAID types; this was a particularly good read: RAID levels and the importance of URE (Unrecoverable Read Error).

However, one thing remains unclear to me:

Let's have two scenarios:

  1. A RAID 1 array of 2 drives
  2. A RAID 5 array of 3 drives

The same assumptions for both scenarios:

  • There are 100,000 files on the RAID array
  • One drive fails (and needs replacement)
  • One bad sector (URE) is hit while rebuilding the array

What happens? Does the RAID rebuild with 99,999 files doing fine and 1 file lost? Or am I going to lose all 100,000 files?

If the answer depends on the filesystem type, let's assume the filesystem is BTRFS or ZFS.

Braiam
adamsfamily
  • The logical answer is: it depends. RAID 1 is a direct copy of another drive. RAID 5 requires at least 3 drives to work, whereas RAID 1 only needs 2, but either way you are losing capacity. It also depends on what the error is. In the case of ZFS, there may be a better chance of getting a correct file back. However, RAID will never be a substitute for taking backups. – djdomi Jun 27 '21 at 14:45
  • You may want to distinguish these failure modes: 1. a sector is unreadable and unwriteable; 2. a sector is unreadable, but it can be overwritten, and then it is readable again. – pts Jun 28 '21 at 09:00
  • *What happens? Does the RAID rebuild with 99,999 files doing fine and 1 file lost? Or am I going to lose all 100,000 files?* Either one might happen. That's why you have backups. RAID is not a backup! Just because your files are on a RAID array doesn't make them safe. If someone runs `rm -f -r /all/my/important/files`, they're ***gone*** - from every disk in the RAID array. The only thing RAID does is improve the availability of your data. – Andrew Henle Jun 28 '21 at 10:34
  • @AndrewHenle Can you please elaborate on the 'Either one might happen' part? Thanks – adamsfamily Jun 28 '21 at 13:41
  • You're assuming the read error occurs only in file data. It can happen in filesystem metadata, too. Depending on your filesystem, it's possible that can cause loss of everything stored in the filesystem. Never rely on RAID for data security. All it does is protect your ability to *access* your data against a few types of disk failure. – Andrew Henle Jun 28 '21 at 15:41
  • RAID 5 is absolutely useless for large (i.e. contemporary) consumer-grade drives - just *don't use it*. End of story. By pure statistics, the overwhelming probability is that you *will* hit at least one URE during a rebuild (see the back-of-the-envelope calculation after these comments), so it's doomed to fail badly in most cases. Even in enterprise situations, where drives are generally 10x less likely to develop read errors, it's still a dubious solution. RAID 6 still works, so if you want parity RAID use that instead. – J... Jun 28 '21 at 16:42
  • Dupe : [If a RAID5 system experiences a URE during rebuild, is all the data lost?](https://serverfault.com/q/937547/221656) – J... Jun 28 '21 at 16:45
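To put rough numbers on the URE concern raised above: the following is purely an illustrative back-of-the-envelope calculation, assuming a hypothetical 4-drive RAID 5 built from 12 TB disks and the commonly quoted consumer URE spec of one unrecoverable error per 10^14 bits read (real drives often do better than their spec sheet).

```sh
# Rebuilding after one failed disk means reading the 3 surviving 12 TB disks in full:
#   3 × 12 TB ≈ 2.9 × 10^14 bits
# With a URE rate of 1 per 10^14 bits, the chance of finishing with no URE is about
#   (1 - 10^-14)^(2.9e14) ≈ e^-2.9 ≈ 0.06
python3 -c 'print(1 - (1 - 1e-14)**(3 * 12e12 * 8))'    # ≈ 0.94
# i.e. roughly a 94% chance of hitting at least one URE somewhere during the rebuild.
```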

1 Answer


The short answer is that it depends.

In the situation you describe (a faulty disk plus some unreadable sectors on another disk), some enterprise RAID controllers will nuke the entire array on the grounds that its integrity is compromised, and so the only safe action is to restore from backup.

Some other controllers (most notably from LSI) will instead puncture the array, marking some LBAs as unreadable but continuing with the rebuild. If the unreadable LBAs land on free space, effectively no real data is lost, so this is the best-case scenario. If they affect already-written data, some information (hopefully of little value) is inevitably lost.

Linux MDADM is very versatile, with the latest versions having a dedicated "remap area" for such a punctured array. Moreover, one can always use dd or ddrescue to first copy the drive with unreadable sectors to a new disk, and then use that disk to re-assemble the array (with some data loss, of course).
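As a rough sketch of that ddrescue approach (device names, partition numbers and the array layout below are hypothetical; adapt them to your setup):

```sh
# 1. Clone the drive that has unreadable sectors onto a fresh disk, skipping and
#    logging the bad areas. /dev/sdb = failing source, /dev/sdc = new target,
#    sdb.map = ddrescue's progress / bad-block map.
ddrescue -f -n  /dev/sdb /dev/sdc /root/sdb.map   # quick pass, skip scraping
ddrescue -f -r3 /dev/sdb /dev/sdc /root/sdb.map   # retry the bad areas a few times

# 2. Assemble the degraded array from the surviving member and the clone
#    (assuming the md members are partition 1 on each disk), then add a blank
#    disk to replace the drive that failed outright and let the rebuild run.
mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1
mdadm --manage /dev/md0 --add /dev/sde1
```

Whatever fell inside the areas recorded as unreadable in the map file is the data you have actually lost.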

BTRFS and ZFS, by virtue of being much more tightly integrated with the block allocation layer, can detect whether the lost data sit on empty or on allocated space, with detailed reporting of the affected files.
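For example, this is roughly how you would ask each filesystem which files were hit (the pool name and mount point below are placeholders):

```sh
# ZFS: scrub the pool, then list the files with unrecoverable errors.
zpool scrub tank
zpool status -v tank      # "Permanent errors have been detected in the following files:"

# Btrfs: scrub the filesystem and check the scrub stats and kernel log
# for the affected inodes/paths.
btrfs scrub start /mnt/data
btrfs scrub status /mnt/data
dmesg | grep -i 'btrfs.*error'
```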

psmears
shodanshok
  • I once had to get a crucial file back from a six-disc RAID-0 array, with two failed drives, under Solstice Disk Suite. I found that `ufsdump` would still read the data, but stop each time it got to a block it couldn't read, and ask if it should continue. `yes | ufsdump` gave me a datastream I could pipe into `ufsrestore`, and since my crucial file was much smaller than the RAID stripe size I figured I had a ~5/6 chance of getting my file back. Which I did, leading to great rejoicing among the developers - ah, good times! – MadHatter Jun 28 '21 at 11:35
  • The problem with punctures is that you have no easy way of knowing whether important data has been damaged or not without doing a full integrity check, and you can't do that unless you have a good backup to verify it against, and if you've got a good backup to verify it against, then there's no good reason to do an integrity check when you could just restore from the backup and be done with it. Saves a lot of checking. That's why the enterprise controllers consider the whole thing toast for a puncture. – J... Jun 28 '21 at 12:19
  • @shodanshok Would it make sense to implement a RAID with 2 redundant copies (3 drives containing exactly the same data), so that if one drive dies and the other two have a few bad sectors it's statistically almost impossible that those bad sectors would overlap, and therefore the reliability of such a setup would be 99.99999%+? – adamsfamily Jun 28 '21 at 14:12
  • Yes, it's called RAID 6. – Anton Tykhyy Jun 28 '21 at 16:19
  • @J... a punctured array is not a good situation for sure. However you *can* detect which data are affected: as the punctured block is effectively unreadable, you can simply read/dump your important data and check for any copy error. If you don't see any such errors, your data is OK. Alternatively, one can read the entire array, check the LBA of the first read error and, from there, identify the affected file. This is convoluted work indeed... – shodanshok Jun 28 '21 at 16:27
  • @adamsfamily what you describe is 3-way RAID1. It is perfectly doable both with Linux MDRAID and ZFS (see the sketch after these comments), but not all hardware controllers support it due to the very high space penalty (only 33% of the space is user-available). If dealing with parity RAID, you need RAID6 for double redundancy. – shodanshok Jun 28 '21 at 16:32
  • @shodanshok Yes, it is a lot of work. Easier to just restore from the backups. The risk of not doing this is that you miss the data loss due to the puncture and then your bad data ends up displacing the good data in your backup chain. Unless you have no backups at all (seriously?!), there's rarely a good reason to resort to such low level data recovery techniques. – J... Jun 28 '21 at 16:36
  • @J... as with many things, it depends: an unreadable block encountered during a backup should immediately alert the sysadmin, because the backup itself would be incomplete (rather than corrupted). Moreover, if you can be certain that the puncture is on free space, no data loss occurred. Finally, you should consider that restoring from backup comes with its own risks. Again, I am not arguing that having a punctured array is a good thing; it is a bad situation indeed. But I also dislike hardware vendors that totally nuke a 100TB array because a single 512 B sector went bad at the wrong time... – shodanshok Jun 28 '21 at 17:10
  • @shodanshok I dislike sysadmins that don't consider the risks of running a 100TB array in RAID5 and then run crying when their data is all corrupt... – J... Jun 28 '21 at 17:21
  • @J... I am not advocating large RAID5 arrays. I only stress that I prefer a flexible RAID implementation, where I can take the most appropriate action based on the environment, rather than being forced to suddenly cope with a nuked and inaccessible array, maybe during work hours... But hey - feel free to disagree ;) – shodanshok Jun 28 '21 at 17:57
  • @shodanshok Of course - I only mean to say that if you woke up one morning with a puncture in a 100TB RAID5 array, the best thing you could possibly do would be to restore from backups onto RAID6 and be done with it. Maybe you can patch that puncture, but it only kicks the can down the road because it's going to happen again and it could be worse next time. – J... Jun 28 '21 at 18:00
  • @AntonTykhyy Responding to: *Yes, it's called RAID 6.* - Ok, just to clarify: in RAID6 after having one disk failure and then a bad sector on the second drive, will the array be able to happily rebuild because it will source the bad sector from another drive? Referring to this reading: https://www.zdnet.com/article/why-raid-6-stops-working-in-2019/ – adamsfamily Jun 29 '21 at 08:35
  • @shodanshok Responding to: *What you describe is 3-way RAID1. It is perfectly doable both with Linux MDRAID and ZFS* - this is great news! I have one strong argument against "It is better to recover from a backup": I would argue that it is not, because if you have a live system running 24/7 (almost everything nowadays) then recovering from a backup means downtime, and that's terrible. I think it's far better to live with the 33/66% penalty of 3-way RAID1, which will pay off with satisfied customers and zero-downtime service. I just need to figure out if that's possible on a Synology DS. – adamsfamily Jun 29 '21 at 08:37
  • @shodanshok Looks like Synology (using Linux MDADM) is capable of 3-way and also 4-way RAID1 which could mean that by replacing 1 HDD every year the system could run almost forever. (https://www.synology.com/en-global/company/news/article/Synology_Showcases_Full_featured_NAS_Server_the_DiskStation_and_its_System_Software_DiskStation_Manager_2_3_at_DSE_2010/Synology) – adamsfamily Jun 29 '21 at 08:41
  • Yes, it will. Personal experience. – Anton Tykhyy Jun 29 '21 at 08:41
  • @adamsfamily `recovering from a backup means downtime` If you have a puncture **you are down already**, and it's not safe to continue using the array until you can verify its integrity. And yes, RAID6 can tolerate one failed drive and any number of read errors as long as two of the remaining drives do not suffer errors on the exact same sector. RAID5 is absolute garbage in 2021, and it's not the right solution in almost all circumstances. – J... Jun 29 '21 at 12:14
  • @J... - *If you have a puncture you are down already* - If I have a 3-way RAID 1 and one disk fails and the others have bad sectors here and there, as long as they don't overlap, I should be fine with zero downtime. I can then replace all the faulty disks one by one and still remain up and running 100% of the time. Am I missing anything? – adamsfamily Jun 29 '21 at 18:21
  • @adamsfamily The only thing you're missing is that the situation you've described is *not a puncture*. It's a *degraded array*, meaning it has lost some or all of its redundancy but the remaining data is still intact. A RAID5 array with a bad disk is not punctured - it is degraded. If, during the rebuild, you encounter a URE, **then** your array is punctured (ie: not all data could be preserved). – J... Jun 29 '21 at 18:28
  • @adamsfamily I'm suggesting that *in the case of having a punctured RAID5*, recovering from the backup is the better option, but specifically recovering to a *different RAID level* as it gives you the opportunity to ditch RAID5 and restore your backup to a saner architecture that is not so prone to punctures in the future (ie: RAID6, 3-way RAID1, etc). RAID6 is more efficient with space but is slower, less performant. Either would be superior to RAID5, which lacks sufficient redundancy to actually work as intended. – J... Jun 29 '21 at 18:33
  • If your data is truly critical, consider a 4-or-more-disk RAID-1 array. Lazy hardware "raid" controllers often don't support RAID-1 arrays larger than 2 disks, though I can't think of a serious reason why not. But TRUE RAID, done in the Linux kernel with mdadm, easily and effortlessly supports it. Also, if you're using btrfs, you should read up on how btrfs implements RAID-like techniques and duplication at the volume level. Basically, one btrfs filesystem can span multiple block devices. And really neatly, it can do two-copy raid1 over THREE disks, rotating which disks carry which data (see the sketch after these comments). – Billy left SE for Codidact Jun 30 '21 at 03:43
  • @J... fair point, agreed. – adamsfamily Jun 30 '21 at 07:23
  • @BillyC. Interesting, I'd never heard of this capability of `btrfs` to act as a substitute for RAID. – adamsfamily Jun 30 '21 at 07:24
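To make the triple-redundancy options from the last few comments concrete, here is a minimal sketch (device names are placeholders, and these are plain-Linux commands rather than anything Synology-specific):

```sh
# 3-way RAID1 with mdadm: every block is stored on all three disks, so the array
# survives one dead disk plus scattered UREs on the others, as long as the same
# sector is not unreadable on both survivors.
mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

# btrfs multi-device raid1: two copies of every chunk, spread across three disks.
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# For three copies of everything (kernel 5.5 or newer), use the raid1c3 profile.
mkfs.btrfs -d raid1c3 -m raid1c3 /dev/sdb /dev/sdc /dev/sdd
```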