
I have been looking into RAID 5 vs. RAID 6 lately, and I keep seeing that RAID 5 is no longer secure enough because of URE ratings and the increasing size of drives. Basically, most of the content I found says that in RAID 5, if you have a disk failure and the rest of your array is 12TB, then you have an almost 100% chance of hitting a URE and losing your data.

The 12TB figure comes from the fact that disks are rated at one URE per 10^14 bits read.

Well, there is something I do not get here. A read is done by the head passing over the sector; what can make the read fail is either the head dying or the sector dying. It can also be that the read does not work for some other reason (I don't know, say a vibration made the head jump...). So, let me address all 3 situations:

  • the reading does not work: that is not unrecoverable, right? It can be tried again.
  • the head dies: this would for sure be unrecoverable, but that also means the full platter (or at least one side of it) would be unreadable, which would be more alarming, no?
  • the sector dies: also totally unrecoverable, but here I do not understand why the 4TB disk is rated at 10^14 for the URE and the 8TB is rated at 10^14 as well; that would mean the sectors on the 8TB (most likely newer tech) are half as reliable as the ones on the 4TB, and that does not make sense.

As you see, none of the 3 failure points I identify makes sense. So what exactly is a URE, I mean concretely?

Is there somebody who can explain that to me?

Edit 1

After the first wave of answers, it seems the reason is the sector failing. The good thing is that the firmware, RAID controller, and OS + filesystem have procedures in place to detect that early and reallocate sectors.

Well, I now know what a URE is (actually, the name is quite self-explanatory :) ).

I am still puzzled by the underlying causes and mostly the stable rating they give.

Some attributed the failing sector to external sources (cosmic rays). I am then surprised that the URE rate is based on the read count and not on age: cosmic rays should impact an older disk more, simply because it has been exposed longer. I think this is more of a fantasy, though I might be wrong.

Now comes the other reason, relating to the wear of the disk: some pointed out that higher densities give weaker magnetic domains. That totally makes sense and I would follow the explanation. But as it is nicely explained here, the different sizes of newer disks are obtained mostly by putting more or fewer of the same platters (and hence the same density) in the HDD chassis. The sectors are the same and should all have the very same reliability, so bigger disks should then have a higher rating than smaller disks, their sectors being read less. This is not the case. Why? It would, though, explain why newer disks with newer tech get no better rating than the old ones: the gain from better tech is offset by the loss due to higher density.

Memes
  • "URE and to lose your data" afaik (and i may be wrong), a URE means only that some data is lost, not all of it - and you can try the rebuild again after hitting the URE. That said, raid 10 or zfs is kinda where it's at these days. – Sirex Nov 03 '16 at 06:56
  • "*sectors [on newer discs] are half as reliable as [on the old], that does not make sense*" I'm not sure I agree. As the magnetic zones become ever smaller (which higher data densities in the same-size package implies), it's very reasonable that they become ever more susceptible to accidental erasure (local gamma-ray emissions, cosmic ray event, and so on). This increasing susceptibility of modern drives is why none of us would deploy un-RAIDed drives in anything that matters, and one reason why most of us have given up on RAID-5. – MadHatter Nov 03 '16 at 07:58
  • Related: [How reliable are current 2 TByte consumer grade disk drives?](https://serverfault.com/q/590183/58408) – user Nov 03 '16 at 10:09
  • The real problem here is that far too many RAID arrays are turning a single URE into a whole-array error. A single URE should cause the loss of a single RAID block. Let the filesystem figure out if that block was even in use, chances are it really doesn't matter. – MSalters Nov 03 '16 at 14:16
  • "that would mean the sectors on the 8TB (most likely newer tech) are half as reliable as the ones on the 4TB, that does not make sense." where are you getting that? The spec is saying that they're the same. – hobbs Nov 03 '16 at 16:34
  • @Sirex from all the calculations I saw, RAID 10 would not actually be that much safer. Let's say I have on one side 3x4TB RAID5: 1 disk fails, and for the rebuild I have to read 8TB (or two thirds of the 12TB figure), with no failsafe. Now let's look at 4x4TB RAID10 (same 8TB usable size): if one disk fails, I need to rebuild whichever RAID1 sub-array has failed, and the reading would be 4TB, so basically we could say twice as safe, because I also have no failsafe if I meet a URE. Actually RAID6 would be the best here (safety-wise), no? – Memes Nov 04 '16 at 03:40
  • @MadHatter as far as I read, many newer 2TB disks actually use the same platters as the 4TB, just half as many. So, if that is true, then the newer 2TB should be as "bad" as the newer 4TB, and we should actually see the URE rating going down for a given disk size, no? – Memes Nov 04 '16 at 03:43
  • @MadHatter this confirms my comment above http://rml527.blogspot.hk/2010/10/hdd-platter-database-western-digital-35_9792.html – Memes Nov 04 '16 at 03:51
  • @MSalters thanks, yes, it seems to be the key element indeed. – Memes Nov 04 '16 at 03:59
  • @hobbs if you have twice as many sectors, you have half as many reads per sector. So per my assumption that wear is the reason, the 8TB sectors are less reliable; that is where I get that. – Memes Nov 04 '16 at 04:01
  • @Memes no, the numbers cancel out. Twice as many sectors is also twice as many opportunities for failure, so the same read error rate equals the same reliability on a per-byte basis. Which is why it's used in the first place. – hobbs Nov 04 '16 at 04:55

4 Answers


A URE is an Unrecoverable Read Error. Something has happened that has caused the reading of a sector to fail that the drive cannot fix. The drive electronics are sophisticated; they will only pass the data up if they have been able to read it correctly from the disk, and they will try multiple times to read a bad sector before declaring it damaged.

What causes the read error? I'm not an expert here (arm waving ensues), but drive aging can cause manufacturing tolerances to become relevant, magnetic domains can become weakened, cosmic rays can cause damage, etc. Essentially it is a random failure.

How does this affect RAID 5?

A RAID 5 consists of block-level striping with distributed parity. The parity blocks are calculated by XORing the bits of the data blocks together. The XOR function basically says: if both bits are the same, the result is 0; otherwise it is 1. When calculating parity you take the first 2 bits and XOR them, then XOR the result with the next bit, and so on, e.g.

1010   data      or    1010 data
1100   data            1100 data
0110   parity          0011 data
                       0101 parity

The nature of the XOR function is such that if any disk dies and is replaced, the data that should be on it can be reconstructed from the remaining disks.

1010  data       or    1010 data
      damaged               damaged
0110  parity           0011 data
                       0101 parity

As you can see the damaged data can be reconstructed by XORing the remaining data and parity.
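The reconstruction above can be sketched in a few lines of Python (a toy model, treating each 4-bit column as a single integer):

```python
# Toy model of RAID 5 parity: XOR the data blocks to get the parity block.
d1, d2 = 0b1010, 0b1100
parity = d1 ^ d2              # 0b0110, as in the first table

# If the disk holding d2 dies, XOR the survivors to rebuild it:
rebuilt = d1 ^ parity
print(bin(rebuilt))           # → 0b1100, the lost data block
```

XOR is its own inverse, which is why any single missing block can be recovered from all the others.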

How does a URE affect this?

A URE is only significant during a RAID 5 rebuild.

When you reconstruct a RAID 5 there is a large amount of reading to be done: every data block needs to be read in order to rebuild the data on the new disk. If a URE occurs, then the data for the relevant block cannot be recovered, so your data is inconsistent. For sufficiently large disks in a sufficiently large RAID 5, the number of bits read to reconstruct the replaced disk approaches or exceeds the URE rating of, for example, 1 bit in 10^14 read.
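To put a rough number on "sufficiently large", here is a back-of-envelope calculation that treats the quoted rating as an independent per-bit error probability (a simplification that the manufacturers' spec sheets do not strictly justify):

```python
# Chance of at least one URE while reading the surviving disks of a
# degraded array, under a naive independent-per-bit model.
def p_ure_during_rebuild(tb_read, rated_bits=1e14):
    bits = tb_read * 1e12 * 8
    return 1 - (1 - 1 / rated_bits) ** bits

print(f"{p_ure_during_rebuild(12):.0%}")        # 12TB rebuild, 10^14 rating → 62%
print(f"{p_ure_during_rebuild(12, 1e15):.0%}")  # same rebuild, 10^15 rating → 9%
```

Note that even this pessimistic model gives closer to 62% than the "almost 100%" often claimed; in practice drives behave much better than the rating, as some commenters point out.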

user9517
  • A *single* 8TB disc has over 6*10^13 bits on it, so with merely three such discs in a RAID-5, a URE is *more likely than not* during a reconstruct. Oh, and +1 from me. – MadHatter Nov 03 '16 at 08:30
  • URE rates are quoted in *full sectors* per bits read (or its inverse). So if the disk uses 4,096-byte sectors, that single URE botches the whole sector. – user Nov 03 '16 at 09:54
  • Yes, that's why I said `Something has happened that has caused the reading of a sector to fail that the drive cannot fix` – user9517 Nov 03 '16 at 11:22
  • sorry, it still does not make sense to me. If the reason is cosmic rays, then it does not matter how many reads you do but mostly how old the drives are, because older drives have had more chances to be exposed, no? Concerning the weakened magnetic domains, here again I question the cause: if it is external, then having twice as many makes the chance of catching the trigger higher, so bigger drives should have lower ratings. If it is the wear of the domains from being read, then bigger drives should have higher ratings, as they are statistically read less. – Memes Nov 04 '16 at 03:48
  • ok, now I see that if it is a mix, that might very well balance out :) – Memes Nov 04 '16 at 03:49
  • The claim (written in the question and in some answers and comments, also in other questions, in fact all over the internet) that after reading 12TB a read error is almost certain is false. Don't believe it? Don't. Know it. By reading 12 (or more) TB from any of your disks and observing that no error happened. Please do it and stop this myth. Thank you. – David Balažic Oct 02 '17 at 17:45
  • @DavidBalažic Since 2016 most consumer disks have been uprated to a URE of 10^15, which means you get 125TB of reads before a URE occurs - more than sufficient for most users. But if you have disks that are only rated to 10^14, then it's almost certain that you will encounter a URE after reading 12.5TB. You might get lucky and have no URE, but when it comes to data integrity, one doesn't rely on luck. – Ian Kemp Apr 15 '19 at 08:52
  • @IanKemp No it isn't. I tried it. You obviously didn't. (also, the better rating just moves the myth a bit, no real change) – David Balažic Apr 15 '19 at 17:51
  • @DavidBalažic Evidently, your sample size of **one** invalidates the entirety of probability theory! I suggest you submit a paper to the Nobel Committee. – Ian Kemp Apr 16 '19 at 05:37
  • @IanKemp If someone claims that all numbers are divisible by 7 and I find ONE that is not, then yes, a single find can invalidate an entire theory. BTW, still not a single person has confirmed the myth in practice (by experiment), did they? Why should they, when belief is more than knowledge... – David Balažic Apr 16 '19 at 12:22
  • @IanKemp after one week still no URE, huh? – David Balažic Apr 22 '19 at 15:20
  • @DavidBalažic If someone would claim `P("URE in 10^15 reads") = 1`, you could debunk this with one experiment showing you actually did not see one URE while performing 10^15 reads - true. But as a URE is only expected to happen **at most** after 10^x (assuming which probability distribution?), you can't prove anything even with a sample size of one (tbh even bigger sample sizes would not prove a lot more). – Murmel Dec 10 '19 at 15:09
  • @Murmel still waiting for a _one_ confirmed case of this myth ... – David Balažic Dec 14 '19 at 16:50

So what exactly is a URE, I mean concretely?

Hard disks do not simply store the data that you ask them to. Because of the ever-decreasing magnetic domain sizes, and the fact that hard disks store data in an analog rather than binary fashion (the hard disk firmware gets an analog signal from the platter, which is translated into a binary signal, and this translation is part of the manufacturer's secret sauce), there is virtually always some degree of error in a read, which must be compensated for.

To ensure that data can be read back, the hard disk also stores forward error correction data along with the data you asked it to store.

Under normal operations, the FEC data is sufficient to correct the errors in the signal that is read back from the platter. The firmware can then reconstruct the original data, and all is well. This is a recoverable read error which is exposed in SMART as the read error rate attribute (SMART attribute 0x01) and/or Hardware ECC Recovered (SMART attribute 0xc3).

If for some reason the signal degrades below a certain point, the FEC data is no longer sufficient to reconstruct the original data. At that point, the theory goes, the firmware will still be able to detect that the data could not be read back reliably, but it can't do anything about it. If multiple such reads fail, the disk has to somehow inform the rest of the computer that the read couldn't be performed successfully. It does so by signalling an unrecoverable read error. This also increases the Reported Uncorrectable Errors (SMART attribute 0xbb) counter.

An unrecoverable read error, or URE, is simply a report that for whatever reason, the payload data plus the FEC data was insufficient to reconstruct the originally stored data.

Keep in mind that URE rates are statistical. You won't encounter any hard disk where you can read exactly 10^14 (or 10^15) - 1 bits successfully and then the next bit fails. Rather, it's a statement by the manufacturer that on average, if you read (say) 10^14 bits, then at some point during that process you will encounter one unreadable sector.

Also, following on the last few words above, keep in mind that URE rates are given in terms of sectors per bits read. Because of how data is stored on the platters, the disk cannot tell which part of a sector is bad, so if a sector fails the FEC check, then the entire sector is considered to be bad.
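For reference, the widely quoted "12TB" threshold from the question is nothing more than this rating converted from bits to bytes (assuming the common 10^14 figure):

```python
rated_bits = 1e14                       # common consumer-drive URE rating
tb_per_expected_ure = rated_bits / 8 / 1e12
print(tb_per_expected_ure)              # → 12.5 (TB read per expected URE, on average)
```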

user
  • OK, so it seems to point towards the sector failing. I totally get the statistics thing, no worries. I also see here that the reliability of the sector decreases as the density goes higher, but that still does not make sense. Newer disks usually have the same platter density no matter the capacity; the 4TB will just have fewer platters than the 6TB. Basically the sectors are the same, so why is the 8TB not able to achieve a statistically higher value? There are twice as many sectors, so each is read half as much (statistically); they should then fail less, no? – Memes Nov 04 '16 at 03:27

the sector dies: also totally unrecoverable, but here I do not understand why the 4TB disk is rated at 10^14 for the URE and the 8TB is rated at 10^14 as well; that would mean the sectors on the 8TB (most likely newer tech) are half as reliable as the ones on the 4TB, and that does not make sense.

The specification is usually "on average, 1 error is detected per n bits read", so the drive size does not matter. Size matters when you calculate the risk that an error will happen given your drive and workload, but the manufacturer only states that it takes n bits read to find an error (on average, not guaranteed).

Example: If you buy a 1TB drive, you would have to read it in full about 12 times to find an error, while an 8TB drive might experience one on the second full read - but the number of bits read is the same in both cases, so the quality of the magnetic platters is roughly the same.
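That example works out as follows (a quick sketch, assuming the 10^14-bit rating):

```python
# Full read passes of a drive before the rated number of bits has been read.
def passes_per_expected_ure(drive_tb, rated_bits=1e14):
    return rated_bits / (drive_tb * 1e12 * 8)

print(passes_per_expected_ure(1))   # → 12.5 passes of a 1TB drive
print(passes_per_expected_ure(8))   # → 1.5625 passes of an 8TB drive
```

The same 10^14 bits are read either way, which is the point: the rating is per bit read, not per drive.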

What you pay for in the increased price are other factors: the ability to cram 8TB into the physical space of 1TB, greatly reduced energy consumption, fewer head crashes while moving the drive, etc.

user121391

I think @Michael Kjörling answered clearly.

When the disk reads, the head detects the direction of the magnetic domain and sends out an electronic signal, which is analog. Say the firmware should output a 1 when it receives a voltage higher than 0.5V, but the magnetic field is too weak, so the head sends a signal of only 0.499V: an error is encountered. We need the FEC to correct this error.

Here's an example: a sector's data should be 0x0F23. We encode it as 0*1 + F*2 + 2*3 + 3*4 = 0x30; now we have the FEC, and we write it after the sector. When we read back 0x0E23 with FEC 0x30, they don't match. After some calculation, we find the data should have been 0x0F23. But if we read 0x0E13 with FEC 0x30, or 0x0E23 with FEC 0x32, we cannot calculate the correct value.
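The toy encoding above can be sketched like this (real drives use far stronger codes such as Reed-Solomon or LDPC; this positional checksum is only an illustration):

```python
# Toy "FEC": weight each 4-bit digit by its position, as in 0*1 + F*2 + 2*3 + 3*4.
def checksum(nibbles):
    return sum(n * (i + 1) for i, n in enumerate(nibbles))

stored = [0x0, 0xF, 0x2, 0x3]
fec = checksum(stored)              # 0x30

read_back = [0x0, 0xE, 0x2, 0x3]    # one digit decayed: F -> E
print(checksum(read_back) == fec)   # → False: the mismatch reveals the error
```

A single checksum like this can detect that something is wrong, but as the example shows, with enough corruption (or a corrupted checksum) it cannot pin down the correct value.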

This rate is so low that the manufacturer would have to read petabytes or even exabytes of data to measure a stable value, so they publish a probability instead: when you read 10^14 bits of data, you may encounter one error. Since it is a probability, you might hit it after reading just 1 sector, or not until you have read 50TB. And this value has nothing to do with disk capacity; it is purely a function of how much data you read. Reading a 4TB disk in full 6 times carries the same chance as reading a 6TB disk 4 times, or an 8TB disk 3 times.

Harley