I have a VMware solution running on an HP BladeSystem with a LeftHand iSCSI SAN. There are currently two VMware hosts in that environment.
I have two Debian VMs sharing an iSCSI disk (formatted with OCFS2), mounted directly from the SAN using open-iscsi. It all worked perfectly, but yesterday one client crashed as soon as it attempted to write something to the shared OCFS2 partition.
I tried setting some iSCSI parameters to more conservative values, to no avail. Only migrating the client (with vMotion) to the other VM host resolved the problem. Today, moving the other client to the problematic host provokes the same errors:
connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294971299, last ping 4294966612, now 4294973799
connection1:0: detected conn error (1011)
iscsid: Kernel reported iSCSI connection 1:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
kernel: [ 328.558970] connection1:0: detected conn error (1020)
iscsid: connection1:0 is operational after recovery (1 attempts)
[repeat until hard reset]
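The "last rx", "last ping" and "now" values in the ping-timeout message are raw kernel jiffies, so they only mean something once divided by the guest kernel's HZ. The two 5-second values are, as far as I know, open-iscsi's node.conn[0].timeo.noop_out_timeout and node.conn[0].timeo.noop_out_interval from iscsid.conf. Below is a minimal Python sketch that converts the counters into seconds. The HZ value is an assumption you have to supply (it depends on CONFIG_HZ), and the function name `analyse` is just illustrative.

```python
import re

# Pattern for the libiscsi "ping timeout" kernel message quoted above.
PING_TIMEOUT_RE = re.compile(
    r"ping timeout of (?P<ping_tmo>\d+) secs expired, "
    r"recv timeout (?P<recv_tmo>\d+), "
    r"last rx (?P<last_rx>\d+), last ping (?P<last_ping>\d+), now (?P<now>\d+)"
)


def analyse(line: str, hz: int) -> dict:
    """Convert the jiffies counters in one log line into seconds.

    hz is the guest kernel's CONFIG_HZ. It is an assumption: the log
    line itself does not say which value the kernel was built with.
    """
    m = PING_TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("not an iSCSI ping timeout message")
    f = {k: int(v) for k, v in m.groupdict().items()}
    return {
        "ping_timeout_s": f["ping_tmo"],
        "recv_timeout_s": f["recv_tmo"],
        # Seconds between the last data received from the target and "now".
        "since_last_rx_s": (f["now"] - f["last_rx"]) / hz,
        # Seconds between sending the NOP-Out ping and "now".
        "since_last_ping_s": (f["now"] - f["last_ping"]) / hz,
    }


if __name__ == "__main__":
    log = ("connection1:0: ping timeout of 5 secs expired, recv timeout 5, "
           "last rx 4294971299, last ping 4294966612, now 4294973799")
    # Try a few common HZ values, since the real one is not known here.
    for hz in (100, 250, 1000):
        print(hz, analyse(log, hz))
```

On a Debian guest the actual value can usually be checked with `grep CONFIG_HZ /boot/config-$(uname -r)`.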
It seems to be related to that VM host, which has exactly the same configuration as the other one. Being blades, they use the same networking hardware, a Flex-10 interconnect.
Does anyone have an idea what this could be related to? I'd like to find the cause, as both VM hosts could end up having the same problem (in which case I'll have to switch to networked disks, which seem more stable and less prone to hard resets).