0

For some reason, the array controller on my ProLiant DL360 G6 stopped recognizing 2 of the 6 750GB drives I am running in a RAID 5 configuration with VMware ESXi 5.1.

When I rebooted the server, I chose the BIOS option (F2) to re-enable the 2 drives the controller said it had stopped recognizing.

Here is the BIOS option I chose: "Select "F2" to accept data loss and re-enable logical drive(s)."

All 6 drives now show up in the volume again.

Unfortunately, there seems to be data corruption. Many of the virtual machines no longer work and don't even register properly in ESXi anymore. ESXi boots ok, but none of the virtual machines hosted in it work.

I booted with the Array Configuration Utility and it says the Parity Initialization has finished. ACU doesn't show any other errors or information notices.

Is there a way for me to rebuild or recover my data so my virtual machines start working again? It is still a mystery to me why the array controller stopped seeing the two drives in the first place, but all I want to do now is recover my data.

ewwhite
user127875

2 Answers

7

F2 is typically the right option to choose... Otherwise the ESXi server would not have booted. My concern is what happened leading up to this incident...

Did you receive any errors? Any indicators on the hard drive LEDs? Typically, an HP server's disks won't just crap-out on you. Considering you're using 750GB disks in RAID5, the chances are that the drives are SATA and you may have more than one failed or failing disk.

Let's go to the HP ProLiant DL360 G6 QuickSpecs...

Okay, so the only disk options for that server from HP are:

  • SAS 2.5" in 72GB, 146GB, 300GB, 450GB, 600GB...
  • SATA 2.5" in 120GB, 160GB, 250GB, 500GB, 1TB...

So, where did these disks come from?
They're definitely not HP disks. I don't recall any server-class 2.5" 750GB disks ever hitting the market.

Are these laptop hard drives?

If so, there are a number of reasons this could have happened. I think a big SATA RAID5 could have resulted in the dreaded unrecoverable read error (URE), where you may have had a failed disk and another one on its way out.
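To put a rough number on the URE risk, here is a minimal sketch of the arithmetic. It assumes a rate of one unrecoverable read error per 10^14 bits, a figure commonly published for consumer SATA drives; the actual spec of the drives in question is unknown, and the model treats bit errors as independent, which is a simplification. The function name `ure_probability` is hypothetical.

```python
import math

def ure_probability(bytes_read: float, bits_per_ure: float = 1e14) -> float:
    """Chance of at least one URE while reading `bytes_read` bytes.

    bits_per_ure is an ASSUMED spec (1 error per 1e14 bits read),
    not a measured value for any particular drive.
    """
    bits = bytes_read * 8
    # log1p/expm1 keep precision with a tiny per-bit error probability.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_ure))

# Rebuilding a 6-drive RAID 5 after one failure means reading
# the 5 surviving 750GB drives in full.
surviving_drives = 5
drive_bytes = 750e9
p = ure_probability(surviving_drives * drive_bytes)
print(f"Chance of a URE during rebuild: {p:.1%}")
```

Under those assumptions the rebuild has roughly a one-in-four chance of hitting a URE, before counting any extra wear or firmware quirks specific to laptop drives.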

Since this is ESXi, let's hope you have the HP health agents and utility bundle installed.

If you do, post the screenshot of the Hardware Status -> Storage menu in VMware and possibly the output of the /opt/hp/hpacucli/bin/hpacucli ctrl all show config detail command.

Worst case, your data is hosed.

Best case, you can build new virtual machines and import the VMDK files. Maybe it's just .vmx file corruption.

Either way, you should not move forward until you determine what happened with your disks in the first place. Otherwise, you're building on a pile of s**t and could encounter the same thing in the future.

(also, update your server's firmware, if you haven't already)

ewwhite
  • Thanks for the great response! Yes, I am using laptop hard drives. I hadn't installed the HP health agents/utility bundle, but I will do so this time. I am convinced I just have to re-install ESXi and restore my backups. My investigations into RAID 5 show that if you lose more than 1 drive at a time (no matter how many drives in the RAID) you won't be able to automatically recover. I do have backups, so all is not lost. It will just take me a day of working in the noisy datacenter to recover. :( The firmware is completely up to date. – user127875 Jun 27 '13 at 14:59
  • 2
    NO LAPTOP HARD DRIVES IN THE SERVER!! Don't reinstall onto the same disks!! – ewwhite Jun 27 '13 at 15:42
  • Your comment "RAID 5 show that if you lose more than 1 drive at a time (no matter how many drives in the RAID) you won't be able to automatically recover." isn't necessarily true, depending on what failed. Sometimes the system will mark multiple drives as bad but you might be able to mark them as good/online and see if the system boots. At that point it's definitely time to take a backup and then investigate (via software/TAC) why the system thought the drive(s) were bad. I've had this happen on multiple servers simply due to a bug in the RAID firmware. – TheCleaner Jun 27 '13 at 15:50
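The single-failure limit discussed in these comments follows from how RAID 5 parity works. The sketch below is a toy model, not the controller's actual layout: it uses one stripe with a fixed parity block, whereas real RAID 5 rotates parity across the drives. It shows that XOR parity can rebuild one missing block, but with two blocks truly gone it only yields the XOR of the two. As noted above, drives that were merely marked offline may still hold good data, which is a different situation. The names `xor_blocks`, `data` and `parity` are illustrative.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: five data blocks plus one parity block (six drives).
data = [b"disk0", b"disk1", b"disk2", b"disk3", b"disk4"]
parity = xor_blocks(data)

# One block lost: XOR of the survivors and parity rebuilds it exactly.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data[2]

# Two blocks lost: the same XOR only gives data[1] ^ data[2],
# which cannot be split back into the two original blocks.
survivors = [data[0], data[3], data[4]]
mixed = xor_blocks(survivors + [parity])
assert mixed == xor_blocks([data[1], data[2]])
```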
3

If it has rebuilt the array and you still can't access the data ("I booted with the Array Configuration Utility and it says the Parity Initialization has finished."), it is time to seek professional help. If it has rebuilt using only 5 drives, then it may take some serious detangling to get anything back.

Depending on where you are, look for a reputable data recovery company.

Mike

    And a lot of money. Possibly / preferably a job outside IT, one that does not require reading (i.e. ignoring "accept data loss" and then wondering why the data is gone). – TomTom Jun 27 '13 at 11:05