22

We have a Dell PowerEdge T410 server running CentOS, with a RAID-5 array containing 5 Seagate Barracuda 3 TB SATA disks. Yesterday the system crashed (I don't know how exactly and I don't have any logs).

Upon booting into the RAID controller BIOS, I saw that of the 5 disks, disk 1 was labeled "missing" and disk 3 was labeled "degraded." I forced disk 3 back online and replaced disk 1 with a new hard drive of the same size. The BIOS detected this and began rebuilding disk 1 - however, it got stuck at 1%. The spinning progress indicator did not budge all night; totally frozen.

What are my options here? Is there any way to attempt rebuilding, besides using some professional data recovery service? How could two hard drives fail simultaneously like that? Seems overly coincidental. Is it possible that disk 1 failed, and as a result disk 3 "went out of sync?" If so, is there any utility I can use to get it back "in sync?"

peterh
Mike Furlender
  • 21
    Yeah, big SATA disks tend to do that (rebuilding 3 TB takes many hours, during which you are exposed to double failures). So this is expected, and it's why RAID 5 in such a configuration is absolutely not recommended. – MichelZ Jul 22 '14 at 14:54
  • Interesting. Could you perhaps direct me to some information about which configuration IS recommended? We require fault tolerance and fast I/O. – Mike Furlender Jul 22 '14 at 14:56
  • 10
    Indeed. In an ideal world drive failure rates are randomly distributed. Practically, this doesn't happen - they are usually bought from the same batch and subjected to the same stresses, which means they all start to hit end of life at the same time. A sudden shift in loading can quite easily tip several 'over the edge', even before you start looking at unrecoverable error rates on SATA disks. Anyway - I'm afraid the bad news is, unless you can get one of those drives online, it's time to get the backups out. – Sobrique Jul 22 '14 at 14:56
  • 6
    http://serverfault.com/questions/339128/what-are-the-different-widely-used-raid-levels-and-when-should-i-consider-them – MichelZ Jul 22 '14 at 14:57
  • 5
    I know it doesn't help much now, but just FYI - the general consensus is to use RAID 6 for drives larger than 1 TB (at least when we're talking about 7200 rpm). – pauska Jul 22 '14 at 14:58
  • 2
    RAID 5 gives fault tolerance, but it's a compromise option - you have N+1 resilience, but if you have big drives you have a large window during which a second fault can occur. RAID 6 gives N+2 fault tolerance, which is generally considered good (triple-failure odds are a lot lower). However, you'll also find the failure rate of more expensive disks (e.g. not cheap SATA drives) is lower. – Sobrique Jul 22 '14 at 14:59
  • Problem with RAID 6 and cheap drives is the write penalty. You really do end up with horribly bad performance. But there ain't no such thing as a free lunch. – Sobrique Jul 22 '14 at 15:05
  • 1
    I don't see the write penalty - random writes are cached, and sequential ones don't suffer as greatly from it. – Basil Jul 22 '14 at 15:10
  • 1
    If you require fault tolerance you should probably have two independent servers in a mirroring configuration. RAID only mitigates a few specific failure modes. Independent servers mitigate many more risks. – usr Jul 22 '14 at 16:46
  • Everyone's comments on here regarding drive failure are correct. Let me add another reason to consider the precariousness of the situation... power supplies fail all the time, and sometimes when they fail you end up frying things. I actually had a hard drive once that had a strange failure, dead shorting the +5v side of the power supply which caused (somehow) 2 other drives to go up in smoke. – Brad Jul 22 '14 at 22:00
  • 1
    Then again, when power supplies fail spectacularly (e.g. 12V on 5V rail) it doesn't matter whether you have N+1 or N+2 redundancy. Same applies to fire and other big accidents. Offsite backups are essential, and in combination with N+1 redundancy that may be sufficient. – MSalters Jul 23 '14 at 07:30

8 Answers

39

You have a double disk failure. This means your data is gone, and you will have to restore from a backup. This is why we aren't supposed to use RAID 5 on large disks: you want to set up your RAID so that it can always withstand two disk failures, especially with large, slow disks.

Basil
  • 3
    There are two problems with RAID 5. One: the rebuild time of 3 TB on a slow SATA drive can be long, making the odds of a compound failure high. The other is the unrecoverable bit error rate - the spec sheet on most SATA drives gives 1 in 10^14, which is approximately 12 TB of data read. With a 5-way 3 TB RAID 5, hitting one becomes almost inevitable when a rebuild is needed. – Sobrique Jul 22 '14 at 15:00
  • 1
    I use RAID 5 on my 3 TB 5-drive array. I was toying with getting a second array to use as a replicated copy of the first. That way, for me to lose the data would require more than 1 disk to fail on both arrays at the same time (so I would need 4 failed disks), while still keeping that large amount of the capacity available. Having read this, I may now step up the time frame for getting the second array. – War Jul 23 '14 at 11:04
  • 1
    He probably just has a bad block on his disk 3. I am really wondering why a professional sysadmin has never heard of block-level copy tools. – peterh Jul 23 '14 at 13:31
  • 1
    @Wardy, wouldn't raid 6 give you that? – Basil Jul 23 '14 at 13:57
  • @Wardy That is a lot of additional expense just to have a redundant (i.e. mirrored) array. Just switch over to RAID 10; you will get much better write performance, plus you can lose half your disks and stay online (depending on which disks of the mirrored RAID 0s fail). Less cost, better performance. – SnakeDoc Jul 23 '14 at 20:33
  • Yeah, but if, say, the fish tank next to my NAS springs a leak, I lose everything, because all the disks would end up with salt water in them. And as was suggested above, RAID is not a substitute for backups! – War Jul 24 '14 at 13:41
  • @Wardy I think I misread your comment- absolutely, having your data in another set of disks in another chassis is better than local raid. – Basil Jul 24 '14 at 15:30
  • @Basil My solution here might be expensive, but from it I effectively get 3 things: 1. reliable data access, 2. high performance, 3. a backup of the first 2 with the same features, which could be swapped in if need be with a simple DNS update. There may be cheaper ways to do this, but this is effectively how all the big players do it, just with whole data centers. – War Jul 24 '14 at 15:37
  • @SnakeDoc As stated above ... you are wrong. Less cost, same performance, but no backup, so if anything happens to that box you're screwed without replication. – War Jul 24 '14 at 15:38
  • @Basil Reading back: RAID 6 gives you the ability for 2 disks to fail; in my scenario, with a pair of RAID 5 arrays, I can have 3 disks fail across the 2 boxes, or even all 5 in a single box, and still be up and running :) And I don't have the expensive performance hit of RAID 6 with RAID 5 ... RAID 10 is a good option too, I suppose. – War Jul 24 '14 at 15:41
  • @Wardy RAID 5 or RAID 6 does not give the same performance as RAID 10, not even close. RAID 5 and 6 get a read boost, but writes take considerably longer. RAID 10 has improved writes and reads. It depends on your expected I/O load: if you are running something write-intensive, such as a VM cluster, you will benefit immensely from RAID 10 in a 0 + 1 configuration. However, since the OP is considering a separate chassis, it's really a backup server with a complete data set, so this point is moot. – SnakeDoc Jul 24 '14 at 15:53
  • RAID levels all have their uses. RAID 10 throws half your capacity out the window, but doesn't make anything calculate parity. A good controller will calculate parity well enough that it isn't that slow, and will have a cache that hides the write penalty even more. – Basil Jul 24 '14 at 21:45
  • @Wardy Just out of curiosity... whose idea was it to store your NAS next to a fish tank? Sounds like a more basic design flaw than which RAID you choose :p – thanby Jul 25 '14 at 11:52
  • @thanby Lol, it's not right next to the tank, just close enough that a bit of pressure (caused by, say, a leak low down) might be able to reach it. I'm planning to move the tank; I just don't have the space at the moment to put it anywhere else, as the replacement tank is in that space and it's pretty large. – War Jul 25 '14 at 19:11
  • 3
    Not a very helpful answer. Sure, with a double disk failure on a RAID 5, chance of recovery is not good. But most double disk failures on RAID 5 are probably just a matter of one faulty disk and a few uncorrected read errors on other disks. If that's the case, recovering most of the data is still possible given the right tools. Pointers to such tools would be helpful. – kasperd Aug 11 '14 at 12:19
37

Your options are:

  1. Restoring from backups.
    • You do have backups, don't you? RAID is not a backup.

  2. Professional data recovery
    • It's possible, though very expensive and not guaranteed, that a professional recovery service will be able to recover your data.

  3. Accepting your data loss and learning from the experience.
    • As noted in the comments, large SATA disks are not recommended for a RAID 5 configuration because of the chance of a double failure during rebuild causing the array to fail.
      • If it must be parity RAID, RAID 6 is better, and next time use a hot spare as well (a minimal sketch follows this list).
      • SAS disks are better for a variety of reasons, including greater reliability and resilience, plus a lower unrecoverable bit error rate, meaning fewer UREs (unrecoverable read errors) during a rebuild.
    • As noted above, RAID is not a backup. If the data matters, make sure it's backed up, and that your backups are restore-tested.
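As a rough illustration of the RAID 6 + hot spare layout mentioned above: this is a minimal sketch using Linux software RAID (mdadm) with hypothetical device names; the OP's hardware PERC controller would do the equivalent from its own BIOS or management CLI instead.

```
# Sketch only: 5 data-bearing members plus 1 hot spare, hypothetical devices.
# RAID 6 survives any two simultaneous disk failures; the spare starts
# rebuilding automatically as soon as a member drops out.
mdadm --create /dev/md0 --level=6 --raid-devices=5 \
      --spare-devices=1 /dev/sd[b-g]
```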
HopelessN00b
  • 1
    If you have 5 disks (as per the OP), and are committed to a hot spare, surely you would take RAID10 over RAID6...? – jimbobmcgee Jul 23 '14 at 10:41
  • 1
    Well, for starters - you'd be using 4 spindles in a RAID 1+0 to get 2 disks' worth of space, leaving one disk 'spare'. You can tolerate two failures (the right two, at least). RAID 6 would give you 3 disks' worth of space, and can tolerate two failures as well (any two). RAID 1+0 does have a better performance capability, with a lower write penalty and potentially better random read performance (reads can be serviced from either of two spindles). – Sobrique Jul 24 '14 at 09:30
  • For point 2, data recovery: recovering data from a RAID 5 professionally can run you $20k, easily. Moreover, the OP let the rebuild run overnight, stressing the disk, which can make recovery more difficult or even impossible. Just letting you know ahead of time. Be sure to send all the disks. – OmnipotentEntity Jul 24 '14 at 12:52
27

I am really sorry for my heretic opinion, coming after you have already accepted an answer, but this approach has saved such arrays multiple times already.

Your second failed disk probably has only a minor problem, maybe a single bad block. That is why the poor sync tool of your poor RAID 5 firmware choked on it.

You could easily make a sector-level copy with a low-level disk cloning tool (gddrescue, for example, is probably very useful here) and use the clone as your new disk 3. In that case, your array survives with only minor data corruption.
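A minimal sketch of that clone, assuming the failing member shows up as /dev/sdc and the fresh disk as /dev/sdd (both names hypothetical; double-check yours before running anything destructive):

```
# Pass 1: copy everything that reads cleanly, skip the bad areas for now;
# the mapfile lets the run be interrupted and resumed.
ddrescue -f -n /dev/sdc /dev/sdd /root/sdc.map
# Pass 2: go back and retry just the bad areas a few times.
ddrescue -f -r3 /dev/sdc /dev/sdd /root/sdc.map
```

Whatever ddrescue cannot recover is left as zeroed sectors on the copy - the "minor data corruption" mentioned above.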

I am sorry; it is probably too late, because the essence of the orthodox answer in this case is: "multiple failures in a RAID 5 - here is the apocalypse!"

If you want very good, redundant RAID, use software RAID on Linux. Its RAID superblock data layout, for example, is public and documented... I am really sorry for this, another heretic opinion.
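For example (a sketch assuming Linux mdadm and a hypothetical member /dev/sdb1), you can inspect that documented superblock directly, and even force a marginal array back together:

```
mdadm --examine /dev/sdb1                        # dump the on-disk RAID superblock
mdadm --assemble --force /dev/md0 /dev/sd[b-f]1  # reassemble despite a stale event count
```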

peterh
  • `..You could easily make a sector-level copy of a block copy tool..`, tell me more about this *sorcery*? – MDMoore313 Jul 23 '14 at 14:02
  • @BigHomie There are legions of tools for that. Google for "sector-level cloning of a disk with bad blocks", or similar. I personally developed a simple C tool for that, but it could be done in bash as well. Sorcery? No, it is not sorcery; I simply think a professional sysadmin shouldn't be stopped by a single bad block... – peterh Jul 23 '14 at 14:05
  • Whatever caused it to freeze his rebuild would still be in there, potentially. Sometimes bad sectors can't be recovered with a block copy. That said, to get this server back online, the professional solution is to recover from a recent backup and not spend time trying to do low-level data recovery which may not even work. – Basil Jul 23 '14 at 14:08
  • @Basil Maybe you didn't understand well enough what a block- or sector-level copy is. You suggested total data loss to him, although his system could have survived with a single block error. – peterh Jul 23 '14 at 14:11
  • Exactly; copying a *sector* block by block is fine, but copying *blocks* at the sector level will likely serve no useful purpose. – MDMoore313 Jul 23 '14 at 14:12
  • @PeterHorvath I think we're saying the same thing, I just misunderstood your translation. – MDMoore313 Jul 23 '14 at 14:25
  • @BigHomie Ok, maybe. No prob, I hope. – peterh Jul 23 '14 at 14:26
  • 8
    Shame this got down votes, it actually tries to help the OP fix the mess unlike some of the others. +1 – Vality Jul 23 '14 at 14:54
  • Wouldn't a hardware RAID controller perform better than a software one? – Mike Furlender Jul 23 '14 at 16:02
  • 3
    @Vality It doesn't try to solve the mess; it extends his problems. A RAID 5 with corrupted blocks burnt in gives no end of pain, as it will pass integrity checks but regularly degrade. Also, he would have no idea which data is corrupt. If it were as easy as fixing a block, that would be the standard solution. – JamesRyan Jul 23 '14 at 16:25
  • 4
    @JamesRyan I agree that it will cause some later problems and I even agree that there are underlying issues here. However it does offer a valid solution on how to get some functionality back and as the OP was talking about data recovery experts I can only assume they do not have backups to get their data back otherwise. In the end, this solution would only be part one of a fix, once this method had got the system booted again, you would probably want to transfer the filesystem to 5 new disks and then importantly back it up. – Vality Jul 23 '14 at 16:57
  • 1
    "You could easily make a sector-level copy of a block copy tool" Is this *really* what you meant to write? – Arnaud Meuret Jul 23 '14 at 17:43
  • 1
    @MikeFurlender I think hardware is faster, but it's proprietary and therefore brittle, as you need to get the exact same controller in case it fails. Software RAID is independent of the hardware. See btrfs and ZFS. – Martin Ueding Jul 24 '14 at 14:02
  • Potentially; assuming all the disks still spin OK this is possible, and in the best case the loss could be 0 ... It's tough, but it can be done. It takes hours, though ... more likely days on big drives. – War Jul 24 '14 at 15:45
  • @queueoverflow So true - in 1990. Not today. A compatible line from the same manufacturer is more like it. – TomTom Jul 24 '14 at 18:06
  • 1
    *Shame this got down votes* - I agree that it doesn't deserve downvotes because it's a sincere attempt to help, but it's a hail-mary last gasp desperate attempt to salvage something from the trainwreck, not a (with all respect to Peter) 'proper' solution to RAID 5 failures. – Rob Moir Jul 24 '14 at 19:39
4

Simultaneous failure is possible, even probable, for the reasons others have given. The other possibility is that one of the disks had failed some time earlier, and you weren't actively checking it.

Make sure your monitoring picks up a RAID volume running in degraded mode promptly. Maybe you didn't have the option here, but it's never good to have to learn these things from the BIOS.
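What "monitoring" means depends on the stack: on the OP's PERC it would be the vendor's tools (e.g. Dell OpenManage alerts), while with Linux software RAID a minimal sketch looks like this (email address hypothetical):

```
# Quick manual check of array health
cat /proc/mdstat
# Background daemon that emails on Fail/Degraded events
mdadm --monitor --scan --daemonise --mail=admin@example.com
```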

richardb
  • 3
    +1 for mentioning neglected monitoring. It is important to notice the "normal" -> "critical" transition already, not just "critical" -> "failed". This applies likewise to all other types of redundancy (backup internet line, beer in the basement, spare tyre, ...). – Hagen von Eitzen Jul 23 '14 at 11:57
2

To answer "How could two hard drives fail simultaneously like that?" precisely, I'd like to quote from this article:

The crux of the argument is this. As disk drives have become larger and larger (approximately doubling in two years), the URE (unrecoverable read error) has not improved at the same rate. URE measures the frequency of occurrence of an Unrecoverable Read Error and is typically measured in errors per bits read. For example an URE rate of 1E-14 (10 ^ -14) implies that statistically, an unrecoverable read error would occur once in every 1E14 bits read (1E14 bits = 1.25E13 bytes or approximately 12TB).

...

The argument is that as disk capacities grow, and URE rate does not improve at the same rate, the possibility of a RAID5 rebuild failure increases over time. Statistically he shows that in 2009, disk capacities would have grown enough to make it meaningless to use RAID5 for any meaningful array.

So, RAID 5 was already unsafe in 2009. RAID 6 soon will be, too. As for RAID 1, I have started making those out of 3 disks. RAID 10 with 4 disks is also precarious.
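As a back-of-the-envelope check of that argument against the OP's array (a sketch only; it assumes the quoted 1E-14 URE rate and treats bit errors as independent, so take it as a rough estimate): rebuilding after one failure means reading the 4 surviving 3 TB disks, about 9.6E13 bits.

```
# P(URE during rebuild) = 1 - (1 - 1e-14)^(9.6e13) ≈ 1 - exp(-0.96)
echo '1 - e(-0.96)' | bc -l    # prints ~0.62, i.e. roughly a 60% chance
```

That is why a 5 x 3 TB RAID 5 rebuild failing partway through is unremarkable rather than freak bad luck.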

Halfgaar
  • 3
    Again, RAID is not a backup alternative; it's purely about adding "a buffer zone" during which a disk can be replaced in order to keep available data ... available. The other option is to use replication, which would require 2 arrays to fail at the same time ... much less likely, I would think. – War Jul 24 '14 at 13:46
  • Personally, I don't like the mantra that RAID is not a backup. The dictionary says: "a person, plan, device, etc., kept in reserve to serve as a substitute, if needed." If the amount of redundancy is not enough, it will fail to serve as a substitute. If you don't care about the redundancy RAID provides, you might as well not use it. As for it not being a replacement for off-disk and off-site backups, that's a whole other matter, with which I agree (of course). – Halfgaar Jul 24 '14 at 14:12
  • So what is your thought on those using RAID stripes with no redundancy? In that case the RAID array is being used purely for a performance benefit, which is a perfectly valid use IMO. To my mind RAID serves 2 purposes: 1. to provide speed by grouping the drives, or 2. to provide a safety net in the event that n drives fail, ensuring the data is still available. – War Jul 24 '14 at 14:36
  • Anyone implementing RAID would choose the RAID type they want to use based on their needs, speed, reliability or a combination of the 2 but that still doesn't make RAID any form of backup solution. – War Jul 24 '14 at 14:37
  • @Halfgaar How does RAID help you roll back from corrupted files or accidental deletion? RAID is not a back up. – gparent Jul 24 '14 at 19:40
  • @gparent: Of course I agree with that. But you should rephrase it as: RAID (1/5/10) is an *availability* backup. If you choose a redundant RAID level, you might as well configure it so that it will actually survive a disk crash; otherwise it's pointless. – Halfgaar Jul 25 '14 at 06:59
  • 1
    When people say RAID is not a back up, they're not talking about availability. I think you're just playing with words. :) – gparent Jul 25 '14 at 14:13
  • RAID minimizes downtime from certain specific hardware failures, hopefully the ones that are the most likely. It is, however, not a backup. – David Schwartz May 31 '16 at 08:50
2

This thread is old, but if you are reading it: when a drive fails in a RAID array, check the age of the drives. If you have several disks in a RAID array and they are over 4-5 years old, the chances are good that another drive will fail. *** MAKE AN IMAGE OR BACKUP *** before you proceed. If you think you have a backup, test it to make sure you can read it and restore from it.

The reason is that a rebuild places years' worth of normal wear and tear on the remaining drives as they spin at full speed for hours and hours. The larger the number of 6-year-old drives, the greater the chance another drive will fail from the stress. If it's RAID 5 and you blow the array, great, you have a backup, but a 2 TB disk will take 8-36 hours to restore, depending on the type of RAID controller and other hardware.

We routinely replace the entire RAID drive set on production servers if all the drives are old. Why waste time replacing one drive, then wait until the next one fails in a day, week, month or two? As cheap as drives are, it's just not worth the downtime.
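If you want to check drive age and wear before deciding, smartmontools can read it off each member (a sketch; SMART attribute names vary by vendor, and behind a hardware RAID controller you may need a pass-through flag such as -d megaraid,N):

```
# Power-on hours plus the usual early-warning reallocation counters
smartctl -a /dev/sda | egrep -i 'power_on_hours|reallocated|pending'
```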

1

Typically, when purchasing drives in a lot from a reputable reseller, you can request that the drives come from different batches, which is important for the reasons stated above. Next, this is precisely why RAID 1+0 exists. If you had used 6 drives in RAID 1+0, you would have had 9 TB of usable space with immediate redundancy, where no parity rebuild of the whole volume is necessary.
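For reference, a 6-drive RAID 1+0 of this kind would be created under Linux mdadm roughly as below (hypothetical device names; a hardware controller does the same from its own setup utility):

```
# 6 x 3 TB in RAID 10 -> ~9 TB usable; any single disk can fail, and a
# second failure is survivable if it lands in a different mirror pair.
mdadm --create /dev/md0 --level=10 --raid-devices=6 /dev/sd[b-g]
```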

Payton Byrd
  • Where is the evidence showing that the part about using drives from different batches is anything but an urban myth? Also, RAID 1 does not magically protect against running into unreadable sectors during rebuilding. If you want protection against that you either go with RAID 6 or with RAID 1 with 3 mirrors (a tad expensive). – kasperd Mar 26 '15 at 10:39
  • 1
    @kasperd I think the question that forms the first part of your comment is similar to, though obviously not exactly the same as, [Should I 'run in' one disk of a new RAID 1 pair to decrease the chance of a similar failure time?](http://serverfault.com/q/676121/58408). – user Mar 26 '15 at 10:58
1

If your controller is recognized by dmraid (for instance, here) on Linux, you may be able to use ddrescue to recover the failed disk to a new one, and then use dmraid to assemble the array instead of your hardware controller.
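A sketch of that sequence with hypothetical device names (dmraid reads the controller's own on-disk metadata, so the clone carries the RAID signature along with the data):

```
ddrescue -f /dev/sdc /dev/sdd /root/sdc.map   # clone the failed member to a fresh disk
dmraid -s                                     # list the RAID sets dmraid recognizes
dmraid -ay                                    # activate them via device-mapper
```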

Brian Minton