I'm currently testing a VMware NSX network environment and have run into some trouble.
My environment is:
- Management Cluster with 3 hosts, with the NSX components on 2 dedicated hosts
- Compute Cluster with 2 hosts
- Single 1 Gbps switch
- vSphere 6.0 and NSX 6.2
- One dedicated UTP line per host for Management and iSCSI (VLAN tagged)
- One dedicated UTP line per host for the Transit Network (VM traffic)
- One dedicated UTP line per Management host for the External Network
When VM V on host H sends data to VM W on host I over the NSX network, heavy retransmission occurs. I tested the following cases:
Cases with the problem:
- V sends about 20 MB to W in a single session: retransmission at around 19 MB
- V sends about 50 MB to W in a single session: retransmission at 19 MB only
- V sends about 2 MB to W in 30 concurrent sessions: retransmission at random positions
Under these conditions, I found some packet-order mismatches (possibly the cause of the retransmissions) in a packet dump from H's vmnic (uplink). The delayed packets there are unique (they did not appear earlier in the dump). But in the dumps from the vDS downlink to VM V, or from the sfw of V, they appear twice (the original packets and the retransmitted packets).
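To make the difference between the two dumps concrete, here is a minimal sketch of the classification described above, assuming each capture has already been reduced to (TCP sequence number, payload length) pairs for one direction of one flow, in capture order. The function name classify_segments and its labels are hypothetical, and this is a simplified heuristic (exact (seq, length) matching, no sequence-number wraparound or SACK handling), not Wireshark's analysis logic.

```python
from typing import List, Tuple


def classify_segments(segments: List[Tuple[int, int]]) -> List[str]:
    """Label each (seq, length) data segment in capture order.

    Simplified heuristic: exact (seq, length) matching only,
    ignores 32-bit sequence wraparound and SACK.
    """
    next_expected = None  # highest seq + length seen so far
    seen = set()          # (seq, length) pairs already captured
    labels = []
    for seq, length in segments:
        if next_expected is None or seq == next_expected:
            label = "in-order"
        elif (seq, length) in seen:
            label = "retransmission"  # identical segment captured twice
        elif seq > next_expected:
            label = "gap"             # bytes before seq are missing at this point
        else:
            label = "out-of-order"    # first sighting of late, lower-sequence data
        seen.add((seq, length))
        end = seq + length
        next_expected = end if next_expected is None else max(next_expected, end)
        labels.append(label)
    return labels


# Hypothetical uplink capture: the segment at seq 200 shows up late.
uplink = [(0, 100), (100, 100), (300, 100), (200, 100), (400, 100)]
print(classify_segments(uplink))
# ['in-order', 'in-order', 'gap', 'out-of-order', 'in-order']
```

In the vDS downlink or sfw dump, a segment that is lost further down and then resent would be labeled once as "in-order" and later as "retransmission". In the vmnic dump, if the original copy never reached the uplink, the following segment shows up as a "gap" and the resent copy as a single "out-of-order" entry, which matches the pattern described above.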
So I think the problem is packets being lost in the sender-side stack, specifically between VM V and host H's physical NIC.
To split the data path into two segments and check each one independently, I ran the same cases with another destination VM X on the same host H. This gave a clean dump, and there was no retransmission problem between VMs on the same host. (So I think there is no fault in the vDS itself or anything above it.)
Next, I tested the following cases to check whether the problem is related to heavy data traffic, or to heavy filtering and/or encapsulation:
- Same test with Network I/O Control enabled: same problem
- Same test without Network I/O Control: same problem, with some differences
- Same test with the throughput slowed down by a Network I/O Control limit: same problem
- Same test with TSO disabled on V's vNIC (e1000 driver): same problem
- Same test with vDS MTU 9000: same problem, plus one more question
Some differences:
When Network I/O Control is enabled, RTT first increases just before the retransmission, and after the retransmission completes, the RTT values stay in a stable range.
But when Network I/O Control is disabled, RTT increases again after the retransmission, just as it did at the start.
One more strange thing: although I set the MTU to 9000, the packets on the UTP line that carry the VXLAN traffic are all under 1600 bytes, so the MTU 9000 setting seems to have no effect.
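One possible explanation, assuming the guest vNIC MTU on V is still the default 1500 (not stated above), is that the outer packet size follows the inner packet size rather than the vDS MTU. VXLAN over IPv4 adds the inner Ethernet header (14 bytes), the VXLAN header (8), the outer UDP header (8) and the outer IPv4 header (20). This sketch only does that arithmetic; the function names are hypothetical.

```python
def vxlan_outer_ip_size(inner_ip_size: int, inner_vlan_tag: bool = False) -> int:
    """Outer IPv4 packet size for one VXLAN-encapsulated inner IP packet."""
    inner_eth = 14 + (4 if inner_vlan_tag else 0)  # inner Ethernet header
    vxlan_hdr = 8
    outer_udp_hdr = 8
    outer_ipv4_hdr = 20  # assumes no IP options
    return inner_ip_size + inner_eth + vxlan_hdr + outer_udp_hdr + outer_ipv4_hdr


def vxlan_outer_frame_size(inner_ip_size: int, outer_vlan_tag: bool = False) -> int:
    """Outer Ethernet frame size on the wire, excluding preamble and FCS."""
    outer_eth = 14 + (4 if outer_vlan_tag else 0)
    return vxlan_outer_ip_size(inner_ip_size) + outer_eth


for guest_mtu in (1500, 8950):
    print(guest_mtu, vxlan_outer_ip_size(guest_mtu), vxlan_outer_frame_size(guest_mtu))
# 1500 1550 1564
# 8950 9000 9014
```

Under that assumption, a 1500-byte guest MTU would keep the outer packets around 1550 bytes whatever the vDS MTU is; larger frames would only be expected if the guest MTU itself were raised (up to 8950 here for a 9000-byte transport MTU).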
I'm stuck on this. Can anyone help? Thanks.
EDIT ---
If the VMs are on a normal vDS with NSX disabled, everything is fine.
EDIT: Are there any similar issues with Open vSwitch?