
We just had a power outage caused by a lightning storm. The outage lasted long enough that our servers and switches restarted. We have 2 XenServer hypervisors running in the same pool. At first, both seemed fine and were running normally. Both are connected to an EqualLogic SAN through the same switches.

Then I noticed that one of them is constantly spamming iSCSI errors in the logs under /var/log:

messages

Jul 20 23:27:29 hkixen01 kernel: [ 1266.132897]  connection1:0: detected conn error (1020)
Jul 20 23:27:30 hkixen01 iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)
Jul 20 23:27:32 hkixen01 kernel: [ 1269.232077]  connection1:0: detected conn error (1020)
Jul 20 23:27:33 hkixen01 iscsid: Login authentication failed with target iqn.2001-05.com.equallogic:0-1cb196-559bd552f-84749b57a93535a3-xen
Jul 20 23:27:34 hkixen01 iscsid: connection1:0 is operational after recovery (1 attempts)

SMlog

Jul 20 20:55:47 hkixen01 SM: [7935] ***** generic exception: vdi_deactivate: EXCEPTION SR.SROSError, The VDI is not available [opterr=LV scan error]
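To see how often the connection is actually flapping, it can help to tally the error/recovery cycle visible in the messages excerpt (connection error, failed login, recovery). A minimal sketch, assuming the syslog line format shown above; the helper name `summarize_iscsi_log` is hypothetical, and it does not try to interpret what error code 1020 means:

```python
import re
from collections import Counter

# Kernel-side iSCSI errors, e.g. "connection1:0: detected conn error (1020)"
CONN_ERROR = re.compile(r"detected conn error \((\d+)\)")


def summarize_iscsi_log(lines):
    """Count conn errors by code, failed logins and recoveries (hypothetical helper)."""
    summary = Counter()
    for line in lines:
        match = CONN_ERROR.search(line)
        if match:
            summary[f"conn error {match.group(1)}"] += 1
        elif "Login authentication failed" in line:
            summary["login auth failed"] += 1
        elif "operational after recovery" in line:
            summary["recovered"] += 1
    return summary


if __name__ == "__main__":
    # Lines copied from the messages excerpt above
    sample = [
        "Jul 20 23:27:29 hkixen01 kernel: [ 1266.132897]  connection1:0: detected conn error (1020)",
        "Jul 20 23:27:30 hkixen01 iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)",
        "Jul 20 23:27:32 hkixen01 kernel: [ 1269.232077]  connection1:0: detected conn error (1020)",
        "Jul 20 23:27:33 hkixen01 iscsid: Login authentication failed with target iqn.2001-05.com.equallogic:0-1cb196-559bd552f-84749b57a93535a3-xen",
        "Jul 20 23:27:34 hkixen01 iscsid: connection1:0 is operational after recovery (1 attempts)",
    ]
    print(dict(summarize_iscsi_log(sample)))
    # {'conn error 1020': 2, 'login auth failed': 1, 'recovered': 1}
```

On the host itself you would pass an open file instead, e.g. `summarize_iscsi_log(open("/var/log/messages"))`; a steadily growing error count paired with repeated recoveries points at an unstable path to the SAN rather than a one-off failure.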

When I start virtual machines on the hypervisor with the errors, CPU usage hits 100% on the virtual machine's performance tab. The virtual machines get stuck at boot and stay unresponsive even after waiting a few hours. At the moment we cannot use the problematic hypervisor to host any virtual machines at all. The other hypervisor seems to have no problems at all. What could be the problem?

Itai Ganot
Wilbis

2 Answers

After a hard crash (power outage), it's not unusual for this to happen.

You should first determine whether the XenServer host is connecting to the SR, and then determine the health of the VDI.

Connection/configuration: http://support.citrix.com/article/ctx118841

VDI not available: http://support.citrix.com/article/CTX131201 and /article/CTX138234

JoeSatDell


The problem has been resolved. A faulty port on one of our switches was preventing jumbo frames from being delivered. Smaller packets were delivered normally, which made the problem difficult to detect.
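A fault like this can be reproduced with a don't-fragment ping sized to fill a whole jumbo frame. A minimal sketch, assuming a 9000-byte MTU on the iSCSI path and IPv4 without IP options: the largest ICMP echo payload is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. It only builds a Linux iputils `ping` command (`-M do` forbids fragmentation, `-s` sets the payload size, `-c` the count); the helper names and the target address 192.0.2.10 are placeholders, not taken from the thread:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_icmp_payload(mtu):
    """Largest ping payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def df_ping_command(target, mtu=9000, count=3):
    """Build an iputils ping command with the don't-fragment bit set (hypothetical helper)."""
    return ["ping", "-M", "do", "-s", str(max_icmp_payload(mtu)), "-c", str(count), target]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address standing in for the SAN group IP
    print(" ".join(df_ping_command("192.0.2.10")))
    # ping -M do -s 8972 -c 3 192.0.2.10
```

If the 8972-byte ping fails from one host while a standard-size ping (1472 bytes for a 1500-byte MTU) succeeds, something on that path is dropping jumbo frames, which matches the symptom described above.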

Wilbis