
I've been searching the internet for a solution to this. I'm the new IT person for my organization, and our previous IT staff did not keep records on certain things. I understand that's bad practice, and I'm now documenting all of this for future reference.

Having said that, I recently came across an issue on our server. We're using an HP ProLiant DL180 Gen6 server with ESXi 5.0 ... The issue is that I'm unable to power on certain VMs because they fail with an I/O error. The error is shown below:

Reason: 0 (Input/output error). Cannot open the disk '/vmfs/volumes/4e7a4edb-08851e40-0c1e-1cc1de700f23/EON-GATEWAY (192.168.0.1 )/EON-GATEWAY ( 192.168.0.1 )-000001.vmdk' or one of the snapshot disks it depends on.

So I powered down all the VMs and restarted the host to get into the BIOS and take a look at the RAID. I do not know what type of RAID the server is on; it shows something like:

Error on SLOT1 : bay 11 -- (as I remember)

Is there a way for me to check what exactly the issue is? I can see that the affected hard disk still flashes a green LED. Out of 12 bays, bay 1 shows an orange LED and bay 4 shows nothing at all.

I'm pretty confused about how to get this sorted. Can anyone guide me on exactly what I need to do, or give me a hint on how to check the RAID / array info?

Update

The images below are from the Smart Array controller...

enter image description here

enter image description here

enter image description here

Here's a video link to the server HDDs. I'm still curious, as bay 1 now flashes blue and amber while the other bays are blue (on the Smart Array screen as seen above).

AzkerM
  • When you select a disk in the array configuration utility, the blue UID indicator comes on so you can identify the disk. That's why you're seeing that. – ewwhite Mar 27 '14 at 15:34
  • Okay! So it showed the blue UID indicator on all bays except bay 1, which keeps indicating amber/blue continuously. Also, I've updated my question once again with the error that I got on screen... Could you advise whether replacing the disk will lose any data on a RAID 6, or will it usually recover, as the status says recovering? – AzkerM Mar 27 '14 at 15:43
  • Hi @ewwhite, I have a small clarification. If you see [**this video**](http://youtu.be/gxtxrlfhJiQ), I understand that slot 1 is failed/missing. Slot 4 (top one in the second column) shows no UID indicator, but the RAID array says it's OK. Slot 11 (the very last column, middle one) shows a green UID indicator that flashes in a slow sequence, unlike the other functioning HDDs, although the RAID array says the HDD is okay. Can you explain why this happens with slot 4 and slot 11? Is there anything to look into? :) – AzkerM Apr 01 '14 at 07:30
  • No. Replace the failed disk. – ewwhite Apr 01 '14 at 08:28
  • @ewwhite Thank you. Also, the HP professionals say to update my firmware as it is too old. Will it be okay to update, and how can we update it since I'm running ESXi? – AzkerM Apr 01 '14 at 10:40
  • Download and run the [HP Service Pack for ProLiant DVD .ISO](http://www8.hp.com/us/en/products/server-software/product-detail.html?oid=5104018#!tab=features). – ewwhite Apr 01 '14 at 11:01
  • @ewwhite Hi there! I've got a new replacement disk for the server. Can I just plug it in while the server is running, or should I visit the Smart Array P410 configuration utility to check whether the controller is accepting the new drive? – AzkerM Apr 07 '14 at 07:06
  • Just plug the disk in. You should really become familiar with how this equipment works. – ewwhite Apr 07 '14 at 07:43

4 Answers


This could be a VMware issue or a locking problem on the virtual disk. Can you capture the full error message? Do other virtual machines power on without problems?

Despite that, it appears you have a physical storage issue, too.

Here's what the HP Smart Array P410 configuration output on a DL180 G6 looks like:

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 2 TB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 2 TB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 2 TB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 2 TB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 2 TB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 2 TB, OK)

Are you sure that you're not mistaking the drive designation of 1I:1:1, which means (port 1I:box 1:bay 1) for "SLOT1 : Bay 11"? That would explain the amber/orange light in the first drive bay.
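
If you can capture output like the listing above as text (for example from the controller's CLI, if one happens to be installed), a short script can decode the `port:box:bay` designations and flag anything that is not `OK`. This is only a rough Python sketch: the function names are made up for illustration, the regular expression assumes lines shaped exactly like the ones shown above, and the `Failed` status in the sample is a hypothetical value, not output from this server.

```python
import re

# Assumes lines shaped like the listing above, e.g.
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 2 TB, OK)
DRIVE_RE = re.compile(
    r"physicaldrive\s+(?P<id>\S+)\s+"
    r"\(port (?P<port>[^:]+):box (?P<box>\d+):bay (?P<bay>\d+),\s*"
    r"(?P<interface>[^,]+),\s*(?P<size>[^,]+),\s*(?P<status>[^)]+)\)"
)


def parse_drives(text):
    """Return one dict per physicaldrive line found in the text."""
    drives = []
    for line in text.splitlines():
        match = DRIVE_RE.search(line)
        if not match:
            continue  # skip headers, blank lines, anything unrecognised
        drive = {key: value.strip() for key, value in match.groupdict().items()}
        drive["box"] = int(drive["box"])
        drive["bay"] = int(drive["bay"])
        drives.append(drive)
    return drives


def unhealthy(drives):
    """Return the drives whose reported status is anything other than OK."""
    return [d for d in drives if d["status"] != "OK"]


# Hypothetical sample: the first line is invented to show a non-OK drive.
SAMPLE = """\
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 2 TB, Failed)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 2 TB, OK)
"""

if __name__ == "__main__":
    for d in unhealthy(parse_drives(SAMPLE)):
        print(f"{d['id']}: port {d['port']}, box {d['box']}, bay {d['bay']} -> {d['status']}")
```

Note that the designation `1I:1:1` decodes to port 1I, box 1, bay 1, which is the reading suggested above for the "SLOT1 : bay 11" message.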

Given that this server was not documented well, there's a high probability that it was also configured with RAID5 (mean? probably).

  • Does the server boot?
  • What error messages do you see at POST?
  • Do you have to press any keys on the keyboard to allow the system to boot? (e.g. F1)
  • What capacity and type of disks are installed in the server?

If the server is on, you can view the RAID configuration from within ESXi. Do this by navigating to: Hardware Status > Sensors > Storage.

If your ESXi was installed using an HP-specific VMware image, you will see the RAID configuration there.

enter image description here

If you don't see anything inside of VMware, you will need to reboot and view the RAID configuration at the BIOS level.

When the system is powered on, you want to hit the F8 key when prompted to enter the Smart Array P410 configuration utility.

Once inside, select "View Logical Drives".

enter image description here

This will show you the RAID health status and you can hit Enter for details. This will tell you conclusively which disks are good/bad/missing in the array.

ewwhite
  • Wow! Your explanation makes a lot of sense to me. The server does boot and the VMs also work perfectly except for a few. Currently it's just the `gateway` VM, which I assume is stored on bay 11... Bay 1 shows an amber light, bay 4 shows nothing, and bay 11 shows a slowly flashing green light that is out of step with the rest of the bays. The other bays flash green in the same kind of sequence, and I do not see storage under the VM hardware status. :( – AzkerM Mar 27 '14 at 13:59
  • @AzkerM Then you need to look at the BIOS configuration to see the RAID setup. – ewwhite Mar 27 '14 at 14:04
  • That is what I'm planning to do now. My worry is that the screen flashes by quickly and I cannot properly read the signs or errors it shows. Will check and post back here. – AzkerM Mar 27 '14 at 14:06
  • Do I have to configure **RAID** monitoring under ESXi similar to [**this**](http://kb.stonegroup.co.uk/index.php?View=entry&EntryID=195) or is it an inbuilt feature? – AzkerM Mar 27 '14 at 14:15
  • @AzkerM Check your RAID BIOS configuration. – ewwhite Mar 27 '14 at 14:20
  • I've updated my question with screens of the array... Your answer really helped me get to this point. Please advise what steps I should take, and whether replacing a drive will lose any data, etc. – AzkerM Mar 27 '14 at 15:26
  • @AzkerM The disk in slot #1 is dead. You may want to remove it and put it back in the slot to see if it can rebuild. Otherwise, it's time to obtain a replacement disk. If you have questions about RAID6, please see [**this question**](http://serverfault.com/a/339214/13325). – ewwhite Mar 27 '14 at 16:33
  • I shut down the server, removed the disk and put it back, but the amber UID on bay 1 is still flashing. Do I have to remove it while the server is on, or was what I did fine? Further, if the disk is the problem, will replacing it solve the problem without losing any data? – AzkerM Mar 28 '14 at 14:19
  • @AzkerM Yes. Replace the disk. – ewwhite Mar 29 '14 at 21:21

I may be wrong but I think you have two problems here.

Yes, you appear to have a physical disk issue. If you can afford the downtime, boot from the HP SPP/ACU image, go into ACU, run the diagnostics and replace parts as needed.

The first error however suggests the datastore is IP-based, i.e. NFS or iSCSI, rather than a local SAS/SATA disk such as the ones you're having real problems with. Have you got other IP based datastores? If so I'd look at where they're based and see if something's been switched off or deleted.

Chopper3
  • It's worse than that. The VMs are on local storage, but the name in vCenter is literally `EON-GATEWAY (192.168.0.1 )`. – ewwhite Mar 27 '14 at 13:19
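
To make the point in the comment above concrete, here is a small Python sketch that splits the path from the error message into its components. The function name and the returned keys are hypothetical, and the "looks like a snapshot" check only assumes the common `-00000N.vmdk` naming pattern for snapshot disks, in line with the error's own mention of snapshot disks. It shows that the IP address sits in the VM folder and file name, not in the datastore part of the path.

```python
import re
from pathlib import PurePosixPath


def describe_vmdk_path(path):
    """Split a /vmfs/volumes/... path into datastore, VM folder and file."""
    parts = PurePosixPath(path).parts
    # Expected layout: ('/', 'vmfs', 'volumes', <datastore>, <folder...>, <file>)
    if len(parts) < 5 or parts[1:3] != ("vmfs", "volumes"):
        raise ValueError("not a /vmfs/volumes/... path")
    filename = parts[-1]
    return {
        "datastore": parts[3],
        "vm_folder": "/".join(parts[4:-1]),
        "file": filename,
        # Assumption: snapshot disks usually end in a six-digit suffix.
        "looks_like_snapshot": bool(re.search(r"-\d{6}\.vmdk$", filename)),
    }


if __name__ == "__main__":
    # Path copied from the error message in the question.
    error_path = (
        "/vmfs/volumes/4e7a4edb-08851e40-0c1e-1cc1de700f23/"
        "EON-GATEWAY (192.168.0.1 )/EON-GATEWAY ( 192.168.0.1 )-000001.vmdk"
    )
    for key, value in describe_vmdk_path(error_path).items():
        print(f"{key}: {value}")
```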

If you're lucky, your predecessor might have installed a proper HP image of ESXi on the box, in which case you should be able to access the HP System Management Homepage remotely:

https://ipofyourserver:2381

This should be able to tell you a little bit more about the general health of the server (which also includes the arrays).
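
Before opening a browser, you can quickly check whether anything is listening on that port at all. This is a minimal Python sketch using only the standard library. The helper names are made up, the port number is taken from the URL above, and an open port only tells you that something answers there, not that it is actually the System Management Homepage.

```python
import socket

SMH_PORT = 2381  # port taken from the URL above


def smh_url(host):
    """Build the management URL for a given host name or IP address."""
    return f"https://{host}:{SMH_PORT}"


def port_open(host, port=SMH_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    host = "ipofyourserver"  # placeholder, replace with your server's address
    if port_open(host):
        print(f"Something is listening, try {smh_url(host)}")
    else:
        print(f"Nothing reachable on {host}:{SMH_PORT}")
```

If the port is closed, the host most likely was not installed from the HP image (or the service is not running), and the reboot route described next is the way to go.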

If not, you should reboot the server and hit F8 after the P410i controller is done initializing. That will get you into ORCA (Option ROM Configuration for Arrays). Select "Show logical drives". This should give you a list of the local logical drives, and will also say whether the array(s) are healthy or not. Note that you might have to "press any key" to actually see the P410i initialization messages after the HP logo has appeared.

One last thing: I've seen on several occasions that something goes haywire in the internal workings of the storage box in the server, which either leaves the drive LEDs off entirely or scrambles them, so that a healthy drive blinks amber instead of green. Just a fair warning not to take the drive activity LEDs too seriously :)

N-3

You might want to consider running a firmware upgrade DVD on the server as well. That controller firmware is ancient! I would urge you to download the firmware DVD once you get the RAID-5 array back in shape. There are considerable fixes and improvements in the later versions of the controller firmware.

Download it from here:

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?sp4ts.oid=3884083&spf_p.tpst=swdMain&spf_p.prp_swdMain=wsrp-navigationalState%3Didx%253D%257CswItem%253DMTX_9ed665a89aba447d925937f38b%257CswEnvOID%253D4115%257CitemLocale%253D%257CswLang%253D%257Cmode%253D%257Caction%253DdriverDocument&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken

You can either put it on a USB drive and boot from it, burn it to a disc, or simply mount the ISO file via iLO. Note that running the DVD will also upgrade the BIOS, the NIC firmware and the iLO firmware.

N-3