
I understand the argument regarding larger drives' increased likelihood of experiencing a URE during a rebuild; however, I'm not sure what the actual implications of this are. This answer says that the entire rebuild fails, but does this mean that all the data is inaccessible? Why would that be? Surely a single URE from a single sector on the drive would only impact the data related to a few files, at most. Wouldn't the array still be rebuilt, just with some minor corruption to a few files?

(I'm specifically interested in ZFS's implementation of RAID5 here, but the logic seems the same for any RAID5 implementation.)

mboratko
  • In general, when "likelihood of experiencing a URE _during a rebuild_" is discussed in the context of RAID5 risks, the implied assumption is that an earlier corruption has already occurred to cause the rebuild to be necessary. In other words, the "URE during rebuild" is the _second_ URE, and indeed ALL data will be lost. – Colt Oct 28 '18 at 10:18
  • @Colt - I understand that's the implication, but what I don't understand is why a single URE (which, in the analysis of why RAID5 isn't recommended, seems to refer to a bad sector) would mean that *all* the data would be lost. In general, if I have lost 1 drive of a RAID5 array then I still have all the data. If I additionally lose a single sector from any of the remaining drives then it is *possible* that I lost data which was stored in that sector, but if that sector was (for example) free space then I don't care, and if that sector did have data on it then it may only impact a few files. – mboratko Oct 28 '18 at 13:54
  • @Colt - Based on the answers below, it seems like failing to rebuild the array in the presence of a single URE was a choice made by hardware RAID manufacturers. In my opinion, this was the wrong choice, but thankfully it seems ZFS does it differently. – mboratko Oct 28 '18 at 13:55
  • See @shodanshok's answer for the process. As to the why, RAID is for providing _continuity_ of access to _reliable_ data for other processes, applications, etc., and is not about backup. The reason that many (most?) hardware controllers abort once the URE occurs in rebuild is that the RAID can no longer do _what it is supposed to do_. At this point, the backups _need_ to be used to have reliable data. Another way to use RAID is to not do any rebuild at all, but just use RAID to control timing of recovery from backup. Also, it allows time to make the _final_ backup before recovery. – Colt Oct 28 '18 at 15:37
  • Note that “ZFS’ implementation of RAID5” is called “raidz” or “zraid” and is different from hardware RAID5. You’ll typically get better answers about “ZFS RAID5” asking about “raidz” – Josh Oct 28 '18 at 15:52
  • @Josh - thanks, I will do that in the future. In this case, I actually *was* interested in hardware RAID5 as well, since it also seems like hardware RAID should be able to similarly recover. – mboratko Oct 28 '18 at 16:17
  • Cool. Trying to help you get the best answer possible @process91 :) Hardware RAID has fewer capabilities in this regard (see shodanshok's answer for details). – Josh Oct 28 '18 at 16:29
  • @Colt Yes but this seems like the RAID controller is saying "Well, since I can't guarantee that this sector of data is correct I'm going to make it so you can't access *any* of your data! That'll teach you!" Even if this was the motivation, wouldn't a better process be to alert the user, continue the rebuild, and perhaps only allow read operations? – mboratko Oct 28 '18 at 18:39
  • Perhaps, but as is spewed all over the Internet, RAID is NOT a backup system. It is a system with the single purpose of trying to make sure that a single (for RAID5) disk failure won't make your data immediately inaccessible. As a secondary feature, there is capability to attempt to allow in-place (further continuity) replacement of the failed disk. If this fails, however, it is NOT the job of the RAID system to go farther. This is why many also say that the first thing to do, prior to even attempting a rebuild, is to try to get a current backup (in addition to the one you should have). – Colt Oct 28 '18 at 19:05
  • I have heard that mantra repeatedly, and it scared me away from using ZFS and RAIDZ as a backup system, but it kept nagging at me. With my current understanding, I am totally convinced that RAIDZ with snapshotting makes for an *excellent* backup system. The arguments about not using RAID as a backup and never using RAID5 apply to hardware RAID only, which I stopped using the second I had a card die on me. It's bit off-topic for this question, so I'll post something else about this with more detail later. – mboratko Oct 28 '18 at 19:25

4 Answers


It really depends on the specific RAID implementation:

  • most hardware RAID will abort the reconstruction and some will also mark the array as failed, bringing it down. The rationale is that if a URE happens during a RAID5 rebuild it means some data are lost, so it is better to completely stop the array rather than risking silent data corruption. Note: some hardware RAID (mainly LSI based) will instead puncture the array, allowing the rebuild to proceed while marking the affected sector as unreadable (similar to how Linux software RAID behaves).

  • Linux software RAID can be instructed to a) stop the array rebuild (the only behavior of "ancient" MDRAID/kernel builds) or b) continue with the rebuild process, marking some LBAs as bad/inaccessible. The rationale is that it is better to let the user make the choice: after all, a single URE can be in free space, not affecting data at all (or affecting only unimportant files);

  • ZRAID will report some files as corrupted, but it will continue with the rebuild process (see here for an example). Again, the rationale is that it is better to continue and report back to the user, enabling them to make an informed choice (see the illustrative sketch below).

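To make the per-stripe behavior above concrete, here is a minimal, purely illustrative Python sketch of RAID5-style XOR reconstruction (a hypothetical 4-disk layout with a made-up `rebuild_stripe` helper, not any implementation's actual code). It shows why a URE on a surviving disk only blocks the one stripe it belongs to, while every other stripe can still be rebuilt:

```python
# Illustrative sketch only: one RAID5-style stripe on a 4-disk array
# (3 data chunks + 1 XOR parity chunk).
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5 parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, position) for position in zip(*blocks))

def rebuild_stripe(surviving_chunks):
    """Rebuild the failed disk's chunk for ONE stripe.

    `None` models a URE on a surviving disk: only this stripe fails to
    rebuild; other stripes are reconstructed independently.
    """
    if any(chunk is None for chunk in surviving_chunks):
        raise IOError("URE on a surviving disk: this stripe cannot be rebuilt")
    return xor_blocks(surviving_chunks)

# One stripe: data chunks d0, d1, d2 plus parity; the disk holding d2 has died.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

print(rebuild_stripe([d0, d1, parity]))   # b'CCCC' -- recovered from parity

# A URE while reading d1 loses only this stripe; a rebuild loop can log it,
# mark the LBA bad, and keep going with the remaining stripes.
try:
    rebuild_stripe([d0, None, parity])
except IOError as err:
    print("mark stripe as bad and continue:", err)
```

Whether an implementation logs the bad stripe and continues, or aborts the whole rebuild at that point, is exactly the policy difference described in the list above.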
shodanshok
  • @process91 Just to elaborate a bit further. If the RAID implementation doesn't have the additional data structures needed to mark individual sectors as bad, it has to either fail the rebuild or introduce silent corruption. Marking individual sectors as bad is better, but could still put other sectors at risk due to those sharing a parity sector with the bad sector. – kasperd Oct 28 '18 at 18:16
  • @kasperd Sure, I guess I assumed most RAID implementations had the capability to alert the user to bad sectors. I understand if there is a bad sector in one drive that will lead to an incorrect sector in the new drive after a rebuild. That said, even if the RAID implementation did nothing more than alert the user "I have rebuilt the drive as best as I could, but I experienced 1 URE in the process" and then continued to allow attempted writes to that sector I don't see how *other* sectors could be at risk. The only possible incorrect sectors would be the original, the new one, and the parity. – mboratko Oct 28 '18 at 18:41
  • One clarification, based on @Colt 's comments above - in the case of hardware RAID, when it marks the array as *failed* does it still allow access to the data at all? Even, say, read-only access for the purposes of attempted recovery? – mboratko Oct 28 '18 at 18:45
  • @process91 Allowing a sector to get corrupted is not considered a good idea, even if that fact was recorded to a log file. You'd have no idea which file might be corrupted. The RAID would have to ensure upon reading that file you get an error. Also clearly you don't want to just overwrite the bad sector, because that would mean you just lost your last chance of recovering the data. So you have an unreadable sector on one disk and a sector on the new disk where you don't know what to write. That could be two different files corrupted. – kasperd Oct 28 '18 at 18:46
  • @process91 The rest of the sectors contributing to the same parity sector would be at risk, because you cannot recover any of those from parity if they were to get lost at a later date. All of this can be addressed, but it requires more complicated data structures. And sure those can probably still be cheaper than taking the step to RAID6. But probably the step to RAID6 is worth it as it reduces the risk of this situation enormously. – kasperd Oct 28 '18 at 18:50
  • @kasperd Yes, sure, two different files perhaps, but still the only sectors which possibly have bad information are the original one which suffered a URE, the new one, and the parity sector, right? I agree, the other sectors are at risk if you had to rebuild *again*, and I'm not suggesting that is a good idea. I'm just saying it would be nice to have your array rebuild once so you can potentially get the OK data off of it. All your data minus 2 files is a lot better than none of your data. – mboratko Oct 28 '18 at 18:51
  • @process91 I added a note about LSI-based arrays. Give it a look. – shodanshok Oct 28 '18 at 19:53

If a URE happens you'll experience some data corruption in the affected block, which is typically 256KB-1MB in size, but this doesn't mean ALL the data on your volume would be lost. What's not so great about RAID5 is a totally different thing: the rebuild itself is stressful, and there's a high chance you'll get a second disk failure during it. In such a case all the data would be lost.

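As a side note, the "RAID5 is dead" argument referenced in the question comes down to a back-of-the-envelope calculation like the sketch below (plain Python, purely illustrative). The 1-error-per-10^14-bits URE spec is the figure commonly quoted for consumer drives, and the model naively treats bit errors as independent, which is precisely the simplification debated in the comments:

```python
# Naive model: probability of hitting at least one URE while reading every
# surviving drive end-to-end during a RAID5 rebuild. Figures are assumptions
# for illustration (consumer-class spec of 1 unrecoverable error per 1e14 bits).
URE_RATE = 1e-14  # probability of an unrecoverable read error per bit read

def p_ure_during_rebuild(drive_size_tb, surviving_drives):
    bits_read = drive_size_tb * 1e12 * 8 * surviving_drives
    # P(at least one URE) = 1 - P(no URE on any of the bits read)
    return 1 - (1 - URE_RATE) ** bits_read

for size_tb in (2, 4, 8):
    for surviving in (3, 5):
        p = p_ure_during_rebuild(size_tb, surviving)
        print(f"{surviving + 1}-disk array, {size_tb} TB drives: "
              f"~{p:.0%} chance of at least one URE during a rebuild")
```

Under that naive model the chance of at least one URE approaches certainty for large arrays of big drives; how much data that actually costs depends on whether the implementation aborts the rebuild or just marks the affected stripe bad, as covered in the other answers.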
BaronSamedi1958
  • How is a RAID5 rebuild more stressful on a single drive than a RAID1 rebuild? I see that it is more stressful on the CPU, but for any specific drive we are simply reading all the data off it. Normally, the danger people cite with larger drives is that they will likely encounter a URE during the rebuild, but that's fine with me if it just means a single sector will be corrupted. – mboratko Oct 28 '18 at 10:46
  • It's probability theory. With N (where N is the number of drives), your chances of having a failure are N times higher. – BaronSamedi1958 Oct 28 '18 at 15:07
  • That's not quite how the calculation would work; you'd actually want to calculate 1 minus the probability of *not* having a failure, but I understand that part. It seems I've incorrectly interpreted your statement as suggesting that the act of rebuilding a RAID5 is somehow more stressful on the disk itself (which I've read elsewhere) and therefore increases the chance of a URE, but if that's not what you're saying then I agree. – mboratko Oct 28 '18 at 16:25

I would explain it the other way around:

If the RAID controller doesn't stop on a URE, what could happen?

I lived through this on a server: the RAID never noticed the URE, and after the rebuild, corruption started to build up across the entire RAID volume.

The disk developed more bad sectors after the rebuild, and the data became corrupted.

The disk was never kicked out of the RAID volume; the controller failed at its job of protecting data integrity.

That example is written to show you that a controller can't trust a volume with a URE at all; it's about data integrity, as the volume is not meant to be a backup but to provide resilience against a disk failure.

yagmoth555
  • I see the new moderators are all constantly checking the site, looking for things to do... – Ward - Reinstate Monica Oct 28 '18 at 02:28
  • @Ward haha, yeah :) – yagmoth555 Oct 28 '18 at 02:32
  • Why would a single URE build up corruption in the entire RAID volume? – mboratko Oct 28 '18 at 10:35
  • Sorry, I reread your answer. It sounds like you had a single bad URE during the rebuild, but this wasn't the problem. The problem was that sectors continued to go bad after the rebuild, and the drive never reported it. This seems like a separate issue, however, from whether or not the RAID controller notices a URE during a rebuild. The RAID controller could notice the URE during rebuild and alert you to it but still proceed to finish the rebuild. Some data would always be better than no data. – mboratko Oct 28 '18 at 10:54
  • @process91 Sadly, UREs rarely come alone, but in my case the damaged data was inside the Active Directory store. The sync between AD servers made it worse afterwards. We could debate whether damaged data is better than no data, but you should have a backup to restore from. – yagmoth555 Oct 28 '18 at 11:01
  • I'm only interested in analyzing why RAID5 was deemed as "dead" in 2009, which rests on the likelihood of a single URE. My understanding now is that this analysis was both mathematically incorrect and doesn't really apply in the same way to, for example, ZFS. – mboratko Oct 28 '18 at 11:05
  • I'd simply suggest that the main design goal of RAID 5 was reliable data integrity above all else. As you should have a backup anyway, it's better to stop than risk ongoing silent corruption of data. There might be some debate about whether or not this is the right decision, but I think the design choices make sense if you consider the idea that any possibility of corruption is considered much worse than the drive array not being available. – Rob Moir Oct 28 '18 at 18:17
  • @RobMoir I guess your last statement is where I disagree. Getting almost all my data off the array could be useful, even if I had another backup. Maybe that file was not important, or (in the case of hardware RAID) the error occurred in an area of free space. I think the right decision, for hardware RAID (where it doesn't know specifically what files were affected) would be to alert the user, complete the rebuild, and flip the array into read-only mode. I don't see any downsides to this. (Obviously, filesystems such as ZFS can do even better, since they can report the affected files.) – mboratko Oct 28 '18 at 18:59
  • @process91 I understand your idea, but imagine a SQL database cluster: if that bad data is inside the DB, do you realize that the bad data will end up replicated to a healthy server? It's like my example: the AD DS database was hit, and it replicated to a healthy server. – yagmoth555 Oct 28 '18 at 21:06
  • I agree with your point, although in practice I think most such systems would crash if the underlying media became read-only. Still, I think the idea that a RAID system could become unrecoverable due to a bad sector in part of the drive that may not even be used is ridiculous, and there are better solutions available. Some method should be available, once the array is taken offline, to put it in a read-only mode and try to restore whatever can be. – mboratko Oct 28 '18 at 21:30

I'd suggest reading this question and answers for a bit more background. Then go and re-read the question you linked to again.

When someone says about this situation that "the RAID failed," it means you lost the benefit of the RAID - you lost the continuous access to data that was the reason you set up the RAID array in the first place.

You haven't lost all the data, but the most common way to recover from one dead drive plus (some) UREs on (some of) the remaining drives would be to completely rebuild the array from scratch, which will mean restoring all your data from backup.

Ward - Reinstate Monica
  • Generally, you use RAID when your goal is to minimize downtime. Having the array keep going with unknown and unrepaired corruption is usually counter to that goal. – David Schwartz Oct 28 '18 at 03:19
  • Thanks, that first question you linked to was very informative. Why would I have lost continuous access to the data? The array would still be up during the rebuild, and if it encounters a URE during the rebuild then I would expect it to just keep going, albeit with this one sector of data now corrupted. Is this not the case? – mboratko Oct 28 '18 at 10:45