0

I don't know much about servers, and I spent all day searching the internet for a solution to my RAID 5 problem. All of a sudden two disks failed and the server won't boot (HP ProLiant, Windows 2003 R2, very old, maybe 10 years). I know that if one disk is faulty I can add a new disk and rebuild, and things will be fine. The problem is that two went faulty :( Is this normal, two at the same time? Is there anything else I can do that I am not aware of, other than taking the disks out and reinserting them? The Array menu shows that disks 0 and 4 are "Missing". Any other tricks or things to try? It is important because, for some unknown reason, the backup job has not worked for a month and I only just found out, so I need to get this RAID 5 array back online.

  • 3
    With a 10-year-old server, anything is possible, and if you didn't notice that your backup hadn't been running for a month, I would say it's quite likely the first disk died some time ago and you just didn't notice. Anyway, in all likelihood the data is gone, but if it's important I suggest hiring a consultant with more experience (or even a data recovery firm: very expensive, but the highest chance of success), as you clearly lack the expertise to recover from this (if it is possible at all). – Sven Jun 30 '13 at 23:10

3 Answers

1

You might be able to force a disk back into an online state if it was only the RAID software that took it offline; that may allow you to rebuild the array.

However, if two disks really are faulty, then you're basically hosed (short of using an expensive data recovery firm).
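
To see why a second failed disk is fatal for RAID 5, here is a toy sketch of one stripe. It assumes three data blocks of one byte each plus a single XOR parity block; the values are made up for illustration, and a real controller does the same thing over whole blocks.

```shell
#!/bin/sh
# Toy RAID 5 stripe: three data blocks (one byte each) plus XOR parity.
# The values are arbitrary and only serve as an illustration.
d0=165; d1=60; d2=15
parity=$(( d0 ^ d1 ^ d2 ))

# One disk missing (d1): XOR of the survivors and the parity gives it back.
recovered=$(( d0 ^ d2 ^ parity ))
echo "original d1=$d1 recovered d1=$recovered"

# Two disks missing (d1 and d2): the survivors only yield d1 XOR d2.
# That is one equation with two unknowns, so neither block can be rebuilt.
combined=$(( d0 ^ parity ))
echo "all that is left: d1 XOR d2 = $combined"
```

With one disk gone the missing block falls out of the parity; with two gone the parity only tells you how the two missing blocks relate to each other, not what either one contained.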

Sirex
  • How can I know if it's the software that took them offline? And if so, how do I force the disks online again? My controller is an HP Array 642 –  Jun 30 '13 at 23:25
  • Well, are the error messages in your logs of the type "possible error on disk, now taking offline" or "omg everything is screwed, disk is broken, ARGH !"? – Sirex Jun 30 '13 at 23:28
  • Yesterday everything was OK, with no messages about any faulty disks. I am pretty sure –  Jun 30 '13 at 23:29
1

You need to do a procedure called "re-tagging", and it might work. Basically, the idea is as follows:

  • Find which disk failed first
  • Recreate the array with all the disks
  • Manually force the disk that failed first offline

This should leave you with a degraded but usable RAID array, to which you can add a new disk, or which you can rebuild with the old disk if it only had a SCSI timeout soft-fail.

Finding which disk failed first is easy: you need to get into the controller logs.

dyasny
0

As dyasny wrote: find the drive that failed first, remove it (disconnect the SATA cable), and try to rebuild the array with the other drives. With Linux software RAID that would be, for example, mdadm --assemble /dev/md0 /dev/sd[b-d]1 --force.

I would strongly advise making a bit-for-bit copy of all drives before you start rebuilding (e.g. with dd). If you picked the wrong drive, you can then try again with a different drive removed. Also note which port of your controller each drive was connected to. It's not a good idea to change this order.
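
A minimal sketch of the imaging step, assuming the drives are attached to a Linux machine and you have enough space for the images. image_drive is a made-up helper name, and the device and image paths in the usage note are placeholders, not something taken from your setup.

```shell
#!/bin/sh
# image_drive SRC DST: hypothetical helper that copies SRC to the image
# file DST and then checks that the two are byte-for-byte identical.
image_drive() {
    src=$1
    dst=$2
    # Refuse to overwrite an existing image by accident.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    # Plain sequential copy; dd stops at the first read error.
    dd if="$src" of="$dst" bs=65536 2>/dev/null || return 1
    # cmp exits non-zero if the copy differs from the source.
    cmp "$src" "$dst"
}
```

You would call it once per drive, for example image_drive /dev/sdb /mnt/backup/sdb.img, and only experiment once every drive has a verified image. If a drive has unreadable sectors, plain dd will abort, and a recovery-oriented copier such as GNU ddrescue is probably the better choice.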

There is quite a nice description here: https://raid.wiki.kernel.org/index.php/RAID_Recovery Read it before you start.

tatus2