5

For a couple of months now, I've been experiencing an issue where my ESX hosts lose connectivity to my iSCSI SAN VMFS volumes.

As a result, the ESX hosts become unresponsive, the associated VMs disconnect, and the only remedy is to reboot the host.

This issue happens randomly. I have escalated it to VMware, but I haven't received a solution yet.

I see no errors on my switches, and there are no hardware issues either. My SAN infrastructure is solid, and there are 2 paths for every VMFS volume.

Has anybody else experienced a similar issue?

edit: Here are some more details:

The iSCSI SAN software is DataCore SANmelody 2.0.4.2 running on 2 HP ProLiant G5 servers. The storage attached to each of the servers is an HP MSA70, and all the iSCSI SAN volumes that are presented to my 4 ESX hosts are mirrored.

I have two iSCSI switches (HP ProCurve 1800G-24) that are trunked together. My SANmelody servers use NC360T NICs. I team two NICs and have one cable connecting to each iSCSI switch. Each ESX server also uses two NICs for the iSCSI network.

Basil
  • 1
    What's the SAN hardware, NIC models, switch models? Does it happen on all ESX hosts or only some? Only some LUNs or all? Are you using raw mappings on any of the LUNs/VMs? – Chris Thorpe Mar 03 '10 at 11:43
  • 1
    What version of ESX and update level are you running? – Zypher Apr 11 '10 at 18:24

4 Answers

2

Let's try a slightly more involved approach. Try another iSCSI solution to check whether the problem lies with ESX or with iSCSI itself.

I'd recommend StarWind. You can download a trial there.

ToreTrygg
1

We need to know the ESX version to properly diagnose this scenario.

We hit this problem a while back on ESX 3.5 Update 3, and the resolution was to update/patch the hosts, per this KB article. After upgrading to Update 4 (and later), the issue has not resurfaced.

If you are already past this patch, can you provide further details as to the versions, and possibly some diagnostic data from one of the ESX hosts? Typically the vmkernel.log is a good place to start.
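As a starting point for digging through that log, here is a minimal Python sketch that keeps only the lines mentioning storage-related terms. The keyword list is an assumption for illustration, not a list of actual vmkernel message formats, so adjust it to whatever your log really contains. Copy the log off the host first and pass its path as the only argument.

```python
import re
import sys
from typing import Iterable, Iterator, Sequence

# Hypothetical search terms; these are guesses, not known vmkernel message formats.
KEYWORDS = ("iscsi", "scsi", "path", "lun", "timeout")


def storage_lines(lines: Iterable[str], keywords: Sequence[str] = KEYWORDS) -> Iterator[str]:
    """Yield lines containing any keyword, matched case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for hit in storage_lines(fh):
            print(hit)
```

Comparing the timestamps of the matching lines with the time a host became unresponsive should narrow down which entries are worth posting here.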

Mike Fiedler
0

Maybe you should disable iSCSI pings, as explained here.

RainDoctor
0

I had a very similar issue with ESXi 4, HP ProCurve switches and an HP LeftHand SAN. Our issue turned out to be that, while hardware iSCSI initiators worked, they only worked 99% of the time, thus causing random lockups, disconnects, etc. As it turned out, Broadcom NICs (with hardware iSCSI) are not compatible with LeftHand SANs. Using software iSCSI initiators cured our problem.

jftuga