I'm running an HP server with XenServer 5.6.

I've bonded 2 of my 4 NICs together (NIC0 and NIC1).

Now I'm seeing big chunks of packet loss at random (usually 10-15 dropped packets, but sometimes there are no ping replies at all until I pull out both cables).

Neither of the NICs seems to be broken, because if I connect only one of the two cables it works fine, with no loss at all.

64 bytes from 192.168.110.20: icmp_seq=9191 ttl=64 time=7.685 ms
64 bytes from 192.168.110.20: icmp_seq=9192 ttl=64 time=6.681 ms
64 bytes from 192.168.110.20: icmp_seq=9193 ttl=64 time=1.053 ms
Request timeout for icmp_seq 9194
Request timeout for icmp_seq 9195
Request timeout for icmp_seq 9196
Request timeout for icmp_seq 9197
Request timeout for icmp_seq 9198
Request timeout for icmp_seq 9199
Request timeout for icmp_seq 9200
Request timeout for icmp_seq 9201
Request timeout for icmp_seq 9202
Request timeout for icmp_seq 9203
64 bytes from 192.168.110.20: icmp_seq=9204 ttl=64 time=14.665 ms
64 bytes from 192.168.110.20: icmp_seq=9205 ttl=64 time=1.275 ms
64 bytes from 192.168.110.20: icmp_seq=9206 ttl=64 time=3.090 ms

And not long after:

Request timeout for icmp_seq 9252
Request timeout for icmp_seq 9253
Request timeout for icmp_seq 9254
Request timeout for icmp_seq 9255
Request timeout for icmp_seq 9256
Request timeout for icmp_seq 9257
Request timeout for icmp_seq 9258
Request timeout for icmp_seq 9259
Request timeout for icmp_seq 9260
Request timeout for icmp_seq 9261
Request timeout for icmp_seq 9262

... and so on, 50 times over.

And now it's not even coming up again. I only get loss.

I did not pull out any cable. I did not touch the machine...

NIC lights keep flashing and Xen reports both NICs (and BOND0+1) as connected.

Unplugging either of the two cables (or both) doesn't seem to solve the problem either. It keeps dropping a lot of packets until, all of a sudden, it replies to pings again.

Any clue what's happening?

The odd thing is that it can run fine for 15-30 minutes, and then all of a sudden I get these huge packet loss 'phases'.
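To put numbers on these loss phases, a saved ping log can be summarized with a short script. What follows is only a minimal sketch, assuming the log uses the same two line formats as the ping output quoted above; the function name `loss_bursts` and the `sample` variable are hypothetical, and the sample lines are copied from that output.

```python
import re

# Line formats taken from the ping output quoted in this post.
TIMEOUT_RE = re.compile(r"Request timeout for icmp_seq (\d+)")
REPLY_RE = re.compile(r"icmp_seq=(\d+)")


def loss_bursts(lines):
    """Return (first_seq, last_seq, lost_count) for each run of timeouts."""
    bursts = []
    first = last = None
    count = 0
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            seq = int(timeout.group(1))
            if first is None:
                first = seq
            last = seq
            count += 1
        elif REPLY_RE.search(line) and first is not None:
            # A reply ends the current run of timeouts.
            bursts.append((first, last, count))
            first = last = None
            count = 0
    if first is not None:
        # The log ended while packets were still being lost.
        bursts.append((first, last, count))
    return bursts


sample = """\
64 bytes from 192.168.110.20: icmp_seq=9193 ttl=64 time=1.053 ms
Request timeout for icmp_seq 9194
Request timeout for icmp_seq 9195
Request timeout for icmp_seq 9196
Request timeout for icmp_seq 9197
Request timeout for icmp_seq 9198
Request timeout for icmp_seq 9199
Request timeout for icmp_seq 9200
Request timeout for icmp_seq 9201
Request timeout for icmp_seq 9202
Request timeout for icmp_seq 9203
64 bytes from 192.168.110.20: icmp_seq=9204 ttl=64 time=14.665 ms
"""

for first, last, count in loss_bursts(sample.splitlines()):
    print(f"seq {first}-{last}: {count} lost")  # prints "seq 9194-9203: 10 lost"
```

Comparing the burst lengths and the gaps between bursts across several runs (both cables, one cable, a different switch) may help show whether the loss follows the bond or the switch.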

During this testing phase, both NICs are connected to the same switch, by the way.

And yes, other services go down too, not only ICMP.

Kind regards, Yeri

Tuinslak
  • Does the server model support XenServer 5.6 and, if so, have you applied their PSP/updates? – Chopper3 Mar 29 '11 at 22:56
  • It's an HP ProLiant DL360 G7. Why would it not support XenServer? – Tuinslak Mar 29 '11 at 23:16
  • Updating the BIOS now, but I doubt it will change much. – Tuinslak Mar 29 '11 at 23:25
  • That didn't solve the problem. It's timing out again. – Tuinslak Mar 30 '11 at 08:33
  • Because not all HP servers support XenServer, and you didn't initially mention the model. That said, the 360 G7 does support 5.x, but there are very few drivers available, which is disappointing. It's a new machine; I'd log a call with HP myself, sorry. If it's any consolation, there are quite a few gotchas with HP Teaming on Windows. – Chopper3 Mar 30 '11 at 09:42

1 Answer

It seems the Cisco switch was causing the issues (perhaps some MAC address security feature that was turned on).

I'm using two HP ProCurve switches now (the server moved from the office to the datacenter) and it seems to be working fine.

Tuinslak
  • Or, perhaps, the Cisco switch was not configured correctly :) Please mark your response as the answer. – pauska Mar 30 '11 at 11:38
  • That's possible too. :) -- I will accept it; I have to wait 24 hours before being able to mark it as accepted answer. – Tuinslak Mar 30 '11 at 13:23
  • Is "seems like" really considered an answer? While it may be "solved", there is nothing to say that a non-Cisco switch would solve the issue. – Ross Dec 28 '12 at 05:13
  • I am not sure what caused this, but when the server was being configured at the office, it was behaving oddly. Once it was in the datacenter, the problem was gone. – Tuinslak Dec 28 '12 at 09:24