6

Possible Duplicate:
Which is better: RAID5 + 1 Hotspare / RAID6?

I need to decide between RAID5 and RAID6.

The servers each have a hardware RAID controller and 6 drives.

The drives are Western Digital RE3 enterprise 1 TB drives. The data sheet says MTTF = 1.2 million hours and a bit error rate of 1 in 10^15.

Another server has 6 Seagate SAS drives (172 GB each) with an even better MTTF of 1.6 million hours and a bit error rate of 1 in 10^16.

Doing the math, I get quite comfortable numbers for this setup (about 110 years to data loss), and even better ones for the SAS drives. However, this relies on the manufacturer's data. Is that realistic? The formulas are on the last slides here (in German, sorry): http://www.heinlein-support.de/sites/default/files/RAID-Mathematik_fuer_Admins.pdf
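
For reference, here is a minimal Python sketch of the kind of MTTDL estimate those slides arrive at: the array dies when a first drive fails and the rebuild is then killed either by a second drive failure or by an unrecoverable read error (URE). The 24-hour rebuild window and the per-bit reading of the URE spec are my assumptions, not numbers taken from the slides:

    # Rough MTTDL estimate for a 6-drive RAID5 of 1 TB drives.
    N = 6                # drives in the array
    CAP_BITS = 1e12 * 8  # 1 TB per drive, in bits
    MTTF = 1.2e6         # manufacturer MTTF per drive, in hours
    MTTR = 24.0          # assumed rebuild time, in hours
    BER = 1e-15          # unrecoverable bit error rate (1 in 10^15 bits)

    # Rate at which the first of the N drives fails.
    first_failure_rate = N / MTTF                  # failures per hour

    # The rebuild must read all (N - 1) surviving drives in full.
    p_ure = 1 - (1 - BER) ** ((N - 1) * CAP_BITS)  # >= 1 unrecoverable error

    # Chance that a second drive dies within the rebuild window.
    p_second = (N - 1) * MTTR / MTTF

    # Either event during the rebuild loses the array.
    p_rebuild_fails = p_ure + p_second - p_ure * p_second

    mttdl_years = 1 / (first_failure_rate * p_rebuild_fails) / (24 * 365)
    print(f"P(URE during rebuild):        {p_ure:.1%}")    # ~3.9%
    print(f"P(2nd drive dies in rebuild): {p_second:.2%}") # ~0.01%
    print(f"MTTDL: roughly {mttdl_years:,.0f} years")

With these assumptions the URE term dominates by orders of magnitude, which is why drive count, capacity, and error rate matter far more here than the MTTF, and why the result swings so much with the inputs.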

I've also found http://blog.kj.stillabower.net/?p=37 - these graphs suggest that 6 drives can work, but that for anything important one should resort to RAID6. That data is older, though, and may also include consumer drives.

So, is there any real-world data on this? I can see that using more than 8-9 disks is problematic, but 6 enterprise disks still look fine.

So what to do? RAID-5 or RAID-6?

kei1aeh5quahQu4U
  • Don't forget to factor in a hot spare. – John Gardeniers Jan 10 '13 at 02:07
  • I did not want to use a hot spare; in that case I could just use RAID6 directly. A hot spare also doesn't help me with the potential rebuild bit errors, or does it? – kei1aeh5quahQu4U Jan 10 '13 at 02:09
  • 1
    MTTF or MTBF numbers are marketing BS meant to give you over-confident trust in the drive. Based on real-world servers under pampered conditions in a nice data centre, 1 in 5 drives will fail before reaching 3 years old. Also, if less than 5% of your HDDs fail annually (AFR) you are doing OK in terms of looking after the disks; 2% is about the best you can hope for. I've looked after a bunch of servers totalling about 150-170 HDDs in some dusty, corrosive remote environments over the last few years, with about a 5-6% AFR, and seeing a 2nd disk fail within hours of the 1st is quite common. – BeowulfNode42 Oct 09 '16 at 00:44

1 Answer

5

You want to go with RAID-6. The problem with RAID-5 and very large drives is that when you have a failure and have to rebuild the failed drive you now MUST be able to read every byte from the remaining drives. If you have a 7+1 (1 TB drive) RAID-5 set, this means that you need to accurately read 7 TB of data to rebuild the failed drive. I have personally experienced data loss during such a rebuild as undetected bad spots on the remaining drives are discovered during the rebuild.
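
To put a rough number on that (a back-of-the-envelope sketch, assuming the data sheet's unrecoverable error rate applies per bit read):

    # Chance of hitting at least one unrecoverable read error (URE)
    # while reading the 7 TB needed to rebuild a 7+1 RAID-5 set.
    bits_read = 7e12 * 8            # 7 TB in bits

    for ber in (1e-15, 1e-14):      # enterprise vs. typical consumer spec
        p = 1 - (1 - ber) ** bits_read
        print(f"BER {ber:.0e}: P(rebuild hits a URE) = {p:.0%}")

With the 1-in-10^15 enterprise spec that is about a 5% chance per rebuild; with a typical consumer 1-in-10^14 drive it is already over 40%.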

HeatfanJohn
  • 2
    RAID-5 is a gambler's RAID, yes. The larger the volume, the more fire you're playing with. You also run increased risk of another drive failing due to all the activity during the rebuild, and at that point you'll have lost everything. – Andrew B Jan 10 '13 at 03:17
  • 2
    With a bit error rate of 1 in 10^15, your chance of experiencing a bit error causing a failure during a rebuild is mathematically 0.7%. But since you probably bought all your drives at the same time and from the same batch, the true likelihood of this failure happening to you is much higher. That's why we don't do RAID 5 with large hard drives. – Michael Hampton Jan 10 '13 at 03:42
  • 2
    Off-topic, but have you considered RAID-Z? – Sameer Jan 10 '13 at 04:17