
I'm currently testing a VMware NSX network environment and have run into some trouble.

My environment is:

  • Management cluster with 3 hosts, and NSX components on 2 dedicated hosts
  • Compute cluster with 2 hosts
  • Single 1 Gbps switch
  • vSphere version 6.0 and NSX version 6.2
  • One dedicated UTP link on every host for management and iSCSI (VLAN tagged)
  • One dedicated UTP link on every host for the transit network (VM traffic)
  • One dedicated UTP link on each management host for the external network

When VM V on host H sends data to VM W on host I over the NSX network, heavy retransmission occurs. I tested the following cases:

Cases with Problem:

  1. V sends about 20 MB to W in a single session: retransmission at around 19 MB
  2. V sends about 50 MB to W in a single session: retransmission at 19 MB only
  3. V sends about 2 MB to W in 30 concurrent sessions: retransmission at random positions

In this condition, I found some out-of-order packets (possibly the cause of the retransmissions) in the packet dump from H's vmnic (uplink). The delayed packets are unique there (they do not appear earlier in the dump), but in the dump from the vDS downlink to VM V, or from V's sfw, they appear twice (the original packets and the retransmitted packets). So I think the problem is packets being lost in the sender-side stack, specifically between VM V and host H's physical NIC.
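The comparison of the two capture points can be sketched roughly as below. This is only an illustration, not the tool used for the dumps above: it assumes the TCP sequence numbers and payload lengths of one flow have already been extracted from each capture as `(seq, length)` pairs in capture order, and the function name `classify_segments` is hypothetical. It ignores sequence-number wraparound and partial overlaps.

```python
def classify_segments(segments):
    """Label each (seq, length) pair of one TCP flow, in capture order.

    Assumption: sequence numbers do not wrap around during the capture.
    - "retransmission": the identical segment was already seen at this capture point
    - "out_of_order":   a new segment that starts below the highest byte seen so far
    - "in_order":       everything else
    """
    seen = set()          # segments already observed at this capture point
    highest_end = None    # highest (seq + length) observed so far
    labels = []
    for seq, length in segments:
        end = seq + length
        if (seq, length) in seen:
            labels.append("retransmission")
        elif highest_end is not None and seq < highest_end:
            labels.append("out_of_order")
        else:
            labels.append("in_order")
        seen.add((seq, length))
        if highest_end is None or end > highest_end:
            highest_end = end
    return labels


# Made-up sample data that mirrors the pattern described above.
uplink = [(0, 100), (200, 100), (100, 100), (300, 100)]    # late but unique
downlink = [(0, 100), (100, 100), (200, 100), (100, 100)]  # seen twice

print(classify_segments(uplink))
print(classify_segments(downlink))
```

With this labelling, a segment that is late but unique on the uplink shows up as `out_of_order`, while the same segment appearing twice on the downlink side shows up as `retransmission`, which is the pattern that points at a loss between the VM and the physical NIC.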

To split the data path/stack into two sections and check them independently, I ran the same cases with another destination VM X on the same host H. The dump was clean and there were no retransmissions between VMs on the same host, so I think there is no fault in the vDS itself or above it.

Next, I ran the cases below to check whether the problem is related to heavy data traffic, heavy filtering, and/or encapsulation:

  1. Same test with Network I/O Control enabled: same problem
  2. Same test without Network I/O Control: same problem, with some differences
  3. Same test with throughput slowed down by a Network I/O Control limit: same problem
  4. Same test with TSO disabled on V's vNIC (e1000 driver): same problem
  5. Same test with vDS MTU 9000: same problem, plus one more question

The differences are:

When Network I/O Control is enabled, the RTT increases just before the retransmission, and after the retransmission is completed the RTT values stay in a stable range.

But when Network I/O Control is disabled, the RTT increases again after the retransmission, just as it did at the start.

One more strange thing: although I set the MTU to 9000, the UDP packets that encapsulate the VXLAN frames are under 1600 bytes, so the MTU 9000 setting has no visible effect.
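As a rough sanity check on that size, the arithmetic below adds the standard VXLAN-over-IPv4 encapsulation overhead to a full-size inner frame. It assumes the guest vNIC MTU is still the default 1500 bytes, which is not confirmed above, and the function name `outer_frame_size` is hypothetical. If that assumption holds, outer frames just under 1600 bytes are what one would expect regardless of the vDS MTU.

```python
# Header sizes in bytes for VXLAN over IPv4 (no IP options, FCS not counted).
OUTER_ETHERNET = 14
OUTER_VLAN_TAG = 4   # only present if the transport VLAN is tagged
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14


def outer_frame_size(inner_mtu, outer_vlan=False):
    """Size of the outer Ethernet frame carrying one full-size inner frame."""
    inner_frame = inner_mtu + INNER_ETHERNET
    overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER
    if outer_vlan:
        overhead += OUTER_VLAN_TAG
    return inner_frame + overhead


# Assumption: guest MTU left at 1500 while the vDS MTU is 9000.
print(outer_frame_size(1500))                   # 1564
print(outer_frame_size(1500, outer_vlan=True))  # 1568
```

Under this assumption, raising only the vDS MTU leaves the encapsulated frame size unchanged; larger outer packets would appear only if the guest itself sent larger frames.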

I'm stuck. Can I get some help? Thanks.


EDIT ---

If the VMs are on a normal vDS with NSX disabled, everything is fine.


EDIT* Are there any similar issues with Open vSwitch?

sio4
  • Since this is a heavy set of features, have you checked with VMware support yet? – ewwhite Oct 29 '15 at 05:42
  • In fact, this test is a PoC of the product and I am a customer (engineer role). I worked with VMware's pre-sales engineer and partner engineers (both ESXi specialists and NSX specialists), but they could not find the cause or the point of failure. :-( – sio4 Oct 29 '15 at 05:51
  • Given 6.2 is very new you must inherently have a valid support contract with them, so they HAVE to formally investigate this for you - just log a case with them and they'll sort it - they'll want a million log files off you but that's the only way forward - NSX is VERY complex (for a start you've not even mentioned your VTAM setup here) and you've provided only part of the information needed to get to the bottom of this issue. – Chopper3 Oct 29 '15 at 08:45
