
I have a small cluster consisting of 3 servers. Each has two 10GbE SFP+ optical network cards. There are two separate 10GbE switches. On every server, one NIC is connected to switch 1 and the second NIC is connected to switch 2 to provide fault tolerance.

Physical interfaces are bonded at the server level using LACP.
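The bond stanza looks roughly like this minimal Debian ifupdown/ifenslave sketch (interface names and addresses below are placeholders, not the real ones):

    auto bond0
    iface bond0 inet static
        address 10.0.0.11
        netmask 255.255.255.0
        bond-slaves enp4s0 enp4s0d1
        bond-mode 802.3ad              # LACP
        bond-miimon 100                # link monitoring interval in ms
        bond-lacp-rate fast
        bond-xmit-hash-policy layer3+4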

All servers can ping each other, but to one of them there is a small (~4%) packet loss (over the bonded interface, which looks suspicious to me).

When I check transfer rates between the two good servers with iperf3, they show about 9.8 Gbit/s in both directions.

Those two good servers can also download from the problematic one at about 9.8 Gbit/s.

iperf3 shows a strange thing when run as a client on the problematic server: it starts with a few hundred megabits per second in the first interval, then the speed drops to 0 bit/s (while a concurrent ICMP ping still has a ~96% success rate). This happens in only one direction; when the other servers download from this server, they get full speed.
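The tests were along these lines (the server IP below is a placeholder):

    # on one of the good servers
    iperf3 -s

    # on the problematic server; by default the client sends to the server
    iperf3 -c 10.0.0.12

    # -R reverses the direction, so the problematic server receives instead
    iperf3 -c 10.0.0.12 -R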

It's all running on the same hardware, and even the firmware versions are the same (Dell R620 servers, Mellanox ConnectX-3 EN NICs, Opton SFP+ modules, Mikrotik CRS309-1G-8S switches). The OS is also the same: the latest stable Debian with all updates and the exact same installed packages.

There is no firewall; all iptables rules are cleared on all servers.

On the problematic server I checked the interfaces; both NICs show UP and running at 10 Gbit/s full duplex.

cat /proc/net/bonding/bond0 also shows both interfaces up and active, with no physical link errors.
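The checks were roughly the following (interface names are again placeholders):

    # link state, speed and duplex of each NIC
    ip -br link show
    ethtool enp4s0
    ethtool enp4s0d1

    # bond mode, active slaves and link failure counters
    cat /proc/net/bonding/bond0

    # error/drop counters on the bond and the slaves
    ip -s link show bond0
    ethtool -S enp4s0 | grep -iE 'err|drop|disc'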

I checked/replaced the SFP+ modules, used different fiber patch cords and tried different switch ports, and nothing changes: this one problematic server still gets poor download speed from the others and a small packet loss (over the bonded interface!).

I also tried different patch-cord combinations (both links up, first up / second down, first down / second up). Again, no change.

Any ideas how I can diagnose this better?

  • How are your switches connected and configured together? The right way to do LACP across two switches is to have a shared MAC/IP table between them; if you don't have this, you're likely to get MAC flapping between them, which can lead to slower performance than expected and lost packets. – Chopper3 Feb 28 '19 at 12:43
  • Ah, I just looked at the specs for your switches: they can't be configured in a way that allows dual-switch LACP, sorry. Go back to an active/standby config on your NICs and you'll be fine. – Chopper3 Feb 28 '19 at 12:49
  • @Chopper3 Thanks for the advice. I will check whether another bonding mode helps and post updates here. – Mateusz Bartczak Feb 28 '19 at 14:22

2 Answers


Unless the switches support stacking and LACP across chassis, LACP cannot work that way. In fact, static LAG trunking won't work either.

Generally, link aggregation only works with a single opposite switch (or a stack acting like one).

With simple L2 redundancy, you can only run the NICs in active/passive pairs with failover. Using multiple L3 links with appropriate load balancing and IP migration on failover, or monitoring by an external load balancer, will also work in your scenario.
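As a rough sketch, assuming a Debian ifupdown bond like yours (placeholder interface name), switching to active/passive only means changing the mode in the bond stanza:

    bond-mode active-backup    # one slave carries traffic, the other is standby
    bond-primary enp4s0        # optional: preferred slave while its link is up
    bond-miimon 100            # link monitoring interval in ms

No LACP configuration is needed on the switch side in this mode.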

Zac67

Please see my answer here (don't forget to hit thumbs up if it is useful in your situation):

Why am I only achieving 2.5Gbps over a 10Gbe direct connection between 2 machines?

It is most probably related to LRO/GRO (Large Receive Offload / Generic Receive Offload), which can be easily disabled. There is also a nice explanation of why this happens here: https://lwn.net/Articles/358910/
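For example, a rough sketch with ethtool (the interface name is a placeholder; not every driver exposes LRO):

    # check whether the receive offloads are enabled
    ethtool -k enp4s0 | grep -E 'large-receive-offload|generic-receive-offload'

    # disable them for a test (repeat for every slave interface of the bond)
    ethtool -K enp4s0 lro off gro off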

Tuning 10G network interfaces is a huge topic.

Dmitriy Kupch