Looking for an actual experience of RAID 5 2 drive failure?



Does anyone have personal experience of a two-drive failure in a RAID 5 array with large drives?

As I understand it, the theory is that with large 1-2TB drives, if one drive fails in the RAID set, the rebuild has to read everything, which hits all the other drives very hard, so the chance of another failure goes up, especially if the drives came from the same manufacturing batch. And if you lose another drive, you lose all the data.

This is usually explained after the statement "RAID is not backup" which I agree with.

The theory of this makes sense, and I understand it, but does it really happen?


Posted 2009-07-23T17:30:40.483

Reputation: 358

Question was closed 2012-11-12T11:42:23.567

Sadly we just got a new question with live experience of this. :( http://superuser.com/questions/516844/degraded-raid5-and-no-md-superblock-on-one-of-remaining-drive

– Hennes – 2012-12-09T03:20:08.270



Yes, I've had it happen to me. A set of 4 (consumer grade) WD 500 drives went bad over the course of about a week. I was slow to replace the first, and didn't take the array offline, and lost all my data when the second failed. I re-used the remaining two good ones, and one of them failed within the next month. They were all properly cooled and cared for. I can only say that I now believe the "bad batch" rhetoric.

In a separate incident, I had 3 separate drives of different makes and models fail within a month of each other, though I'm pretty certain that the reason they failed was due to improper ventilation. Don't cook your drives!

Paul McMillan

Posted 2009-07-23T17:30:40.483

Reputation: 826

As a corollary, have a spare sitting around for when a drive does go bad. Also, beware of silent corruption... it's easy to lose data on a drive that's only pretending to work. – Paul McMillan – 2009-07-23T19:33:57.050

This is another reason that you should not install drives that are all from the same batch in a RAID array - they have correlated failure times (y'know, like default rates of tranched subprime collateralized mortgage securities). – Andrew Mao – 2014-09-16T17:52:11.677


This has actually happened to me, though it wasn't really the most common way for a drive to fail. I had four 500GB external SATA drives in RAID 5. They were attached to a cheap old IBM rack-mounted server. The whole setup was tucked away under the stairs, and one day something (either a rat or a bunny) chewed through some power cables and two drives were shorted out. All the drives were in cheap external enclosures, so I guess I shouldn't have been so surprised.


Posted 2009-07-23T17:30:40.483

Reputation: 397


Are you asking if you can lose 2 drives back to back? Sure, anything can happen. RAID 5 gives you great availability and a performance increase for data access, but RAID 5 does not back up anything. It simply helps prevent losing access to your data due to a single drive hardware failure. It is not a copy of your data. You can't recover an old copy, an old revision, or simply a copy of your current work. It also does not protect against data corruption. There are more things that can go wrong than just losing a drive: a virus could corrupt all your data, your little sister likes watching the trash can on your desktop become full and empty as she throws files in it, a stupid friend drops a soda on your machine, etc.

Also, remember, you can lose the RAID controller itself. And you can't just move the array to another random controller. You normally have to use the exact same model, and even then something could go wrong. Some RAID controllers store configuration information on board, and others write it to the attached array. It is a gamble when this situation arises.

Same question over at SF: https://serverfault.com/questions/2888/why-is-raid-not-a-backup

Need more reasons?

EDIT: Your idea is correct and it could happen to anyone. I personally have not seen more than one drive fail at once, but I have seen some die really close together. None of them were in that rebuild window, but it is technically a risk. But you have a backup in case something does happen, right? Haha. Some people learn this one the hard way. RAID 6 takes it to the next level with dual parity and can lose up to 2 drives. With any RAID setup, the probability of failure rises with the size (# of drives) and complexity of the array. More drives = more points of possible failure.
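
To put a rough number on "more drives = more points of possible failure", here is a minimal sketch. It assumes every drive fails independently with the same probability over some period; the 3% figure is purely an illustrative assumption, not a measured rate. Drives from one batch or one overheated chassis are likely correlated, which this simple model ignores.

```python
def prob_any_failure(n_drives: int, p_single: float) -> float:
    """Chance that at least one of n_drives fails in the period,
    assuming independent failures with probability p_single each."""
    return 1.0 - (1.0 - p_single) ** n_drives


# Hypothetical per-drive failure probability, chosen only for illustration.
ASSUMED_P = 0.03

for n in (3, 4, 8, 16):
    print(f"{n:2d} drives: {prob_any_failure(n, ASSUMED_P):.1%} chance of at least one failure")
```

The same function gives a feel for the rebuild window: pass in the number of surviving drives and whatever per-drive failure probability you believe applies during the rebuild.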


Posted 2009-07-23T17:30:40.483

Reputation: 10 191

sorry, I understand all that, just asking if it's happened to anyone and what the scenario was? – Brian – 2009-07-23T17:53:17.557


You are right, in a RAID-5 scenario if you lose one disk and then rebuild, the system must successfully read every sector of all the surviving drives in the RAID set. NetApp claims that for some situations (they can do RAID sets of up to 28 drives of some kinds) your odds of hitting a second failure can be up to one in ten. Thus they do a "Dual-Parity" which I believe is related to RAID-6.

Obviously the more drives you have in a RAID set, and the bigger they are, the more likely you are to hit a problem. For a small RAID set (3-5 disks) the odds probably have not shifted too far against using RAID-5.

But I always do Raid-DP on NetApps where I can.
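
The "must read every sector of the surviving drives" point can be sketched as a back-of-the-envelope estimate. This is only a rough model under stated assumptions: it treats each bit read as independently hitting an unrecoverable read error at a fixed rate, and the default of 1 error per 1e14 bits is a figure commonly quoted on consumer drive spec sheets, assumed here rather than taken from this thread. Real drives do not fail this uniformly, so read the result as an order of magnitude, not a prediction.

```python
import math


def prob_rebuild_read_error(surviving_drives: int,
                            drive_bytes: float,
                            ure_per_bit: float = 1e-14) -> float:
    """Estimate the chance of at least one unrecoverable read error
    while reading every bit of every surviving drive during a rebuild."""
    bits_to_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 so tiny p does not round away.
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))


TB = 1e12  # decimal terabyte, as drive vendors count it

# 4-drive RAID 5 after one failure leaves 3 drives to read in full.
for size_tb in (0.5, 1, 2):
    p = prob_rebuild_read_error(3, size_tb * TB)
    print(f"3 surviving {size_tb} TB drives: ~{p:.0%} chance of a read error")
```

Under these assumptions the risk grows with both drive count and drive size, which is the argument for dual parity on large sets.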

David Mackintosh

Posted 2009-07-23T17:30:40.483

Reputation: 3 728

+1 I had never thought about the "must successfully read every sector of all the surviving drives" fact. – AaronLS – 2009-10-20T08:31:37.813


No personal experience, but I have listened to the screams of those who've had it happen to them. Any storage system — be it a single drive, a USB key, tape, a huge RAID installation, or Amazon S3 — will eventually fail in whatever manner is most inconvenient to you. A second failure while rebuilding a RAID 5 set is just one of the ways this can happen.

As an aside, support for triple-parity RAID was integrated into OpenSolaris a couple of days ago -- so at least one vendor thinks that allowing for two additional failures during parity RAID rebuild is worth the engineering effort.

Stephen Veiss

Posted 2009-07-23T17:30:40.483

Reputation: 176


This does indeed happen. This is why NetApp storage solutions have an implementation of RAID 6: just in case you lose a second drive during the rebuild.

You can calculate the likelihood of a failure using the standard formulas listed on the following page: link text. As you scale to larger and larger numbers of data drives, the likelihood of just such a failure goes up. With enough disks, you could push this number into the worry zone if you are using RAID 5 with a huge number of data volumes.
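
The linked page is not reproduced here, so the following is not necessarily the formula it lists. As a hedged sketch, one commonly cited approximation is the mean time to data loss (MTTDL) from the classic RAID literature, which assumes independent drive failures with a constant failure rate. The MTTF and rebuild-time values below are hypothetical inputs chosen for illustration.

```python
def mttdl_raid5(n_drives: int, mttf_hours: float, mttr_hours: float) -> float:
    """Approximate MTTDL for single parity: data is lost when a second
    drive fails while the first is still being rebuilt."""
    return mttf_hours ** 2 / (n_drives * (n_drives - 1) * mttr_hours)


def mttdl_raid6(n_drives: int, mttf_hours: float, mttr_hours: float) -> float:
    """Approximate MTTDL for dual parity: data is lost only when a third
    drive fails during overlapping rebuilds."""
    return mttf_hours ** 3 / (
        n_drives * (n_drives - 1) * (n_drives - 2) * mttr_hours ** 2
    )


# Hypothetical inputs: 500,000 h drive MTTF and a 24 h rebuild.
ASSUMED_MTTF = 500_000.0
ASSUMED_MTTR = 24.0

for n in (4, 8, 16, 28):
    r5 = mttdl_raid5(n, ASSUMED_MTTF, ASSUMED_MTTR)
    r6 = mttdl_raid6(n, ASSUMED_MTTF, ASSUMED_MTTR)
    print(f"{n:2d} drives: RAID 5 ~{r5:.3g} h, RAID 6 ~{r6:.3g} h")
```

The model ignores read errors during rebuild and correlated failures such as bad batches or power surges, so real-world numbers tend to be worse than it suggests.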

I can tell you from personal experience that you certainly can have two drive failures in the same array within the same critical timeframe. Raid 6 saved me from having to restore from backup.

Hope this helps


Posted 2009-07-23T17:30:40.483

Reputation: 7 584


Here's a scenario: A drive fails on your RAID5 array, but your spare was already either sitting around, or the order for the new hard drive finally came through. You (or some remote minion perhaps) go with fresh drive in hand to replace faulty one. Due to bad labelling, tiredness or just plain foolishness, one of the remaining good drives is ejected instead of the faulty one... and there's your second failure.


Posted 2009-07-23T17:30:40.483

Reputation: 1 691


I've seen this several times, as I am in the data recovery business. And yes, they often do fail at the same time. However, I don't believe this necessarily has anything to do with when they were built, as I've also seen it happen with mismatched drives. Most often this type of failure occurs shortly after a thunderstorm, power surge, or power outage.

Typically the surge damages the drives or RAID controller, and within a few days they start failing. I'm actually working right now on recovering an array that had two drives fail simultaneously after a power outage. (looks hopeless right now)

A little tip: Surge protectors don't really protect your equipment. Always connect your RAID 5 to a good UPS. I've never seen this happen when the array was on a UPS.


Posted 2009-07-23T17:30:40.483

Reputation: 11


Accidentally pulling a second good drive out of a single-parity set should not destroy the array with a good RAID implementation. I know that ZFS RAID-Z will just freeze any I/O on the array until you online it again.


Posted 2009-07-23T17:30:40.483

Reputation: 111


Another scenario: A remote minion is ordered to fetch the backup tape out of the tape drive. She goes to the rack and doesn't pull the tape out of the tape drive... but pulls 2 (two) HDDs out of the drive bays at the same time, and voila: 2 drive failure.

You think this is far fetched? Well I'm at a customer now who did just that and is now looking at a server rebuild.

Good thing she didn't burn the tape that was actually in the tape drive or whatnot ;-)


Posted 2009-07-23T17:30:40.483

Reputation: 1