Proxmox: Packet loss on bonded NICs in LACP mode

0

Got a bit of a strange problem. I have a machine running Proxmox 5.3 with a 4-port Intel Gigabit NIC (PCIe) among its hardware, in addition to a fifth Gigabit Ethernet port on the motherboard.

I have the machine configured so that the onboard NIC is the management interface, and the 4 Gigabit ports are bonded together with LACP and connected to an HP ProCurve 1810G managed switch - all the VMs and containers on the box get their network connectivity through the bonded NIC. Obviously, the switch is managed and supports LACP, and I have configured a trunk on the switch for those 4 ports.

Everything seems to work fine, or so I thought.

Over the weekend I installed netdata on the Proxmox host, and now I'm getting continual alarms about packet loss on bond0 (the 4 bonded NICs). I'm a little perplexed as to why.

Looking at the statistics for bond0, RX packets are being dropped fairly frequently (currently showing ~160 RX packets dropped in the last 10 minutes; no TX packets appear to be dropped).
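
For reference, a minimal sketch of how the same counter can be sampled over an interval (assuming the usual /sys/class/net statistics files), in case anyone wants to reproduce the numbers:

#!/usr/bin/env python3
# Sample the RX drop counter on bond0 over a fixed interval.
# Assumes the standard sysfs statistics path exposed by the Linux kernel.
import time

IFACE = "bond0"
INTERVAL = 600  # seconds (10 minutes)

def rx_dropped(iface):
    with open(f"/sys/class/net/{iface}/statistics/rx_dropped") as f:
        return int(f.read())

before = rx_dropped(IFACE)
time.sleep(INTERVAL)
after = rx_dropped(IFACE)
print(f"{IFACE}: {after - before} RX packets dropped in {INTERVAL}s")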

Interface output below; you'll note that the bridge interface to the VMs has no dropped packets - it's happening only on bond0 and its slaves. The MTU is set to 9000 (jumbo frames are enabled on the switch), but I was still seeing this issue with an MTU of 1500. enp12s0 is the management NIC; the other 4 NICs are the bond slaves.

bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 9000
    ether 00:1b:21:c7:40:d8  txqueuelen 1000  (Ethernet)
    RX packets 347300  bytes 146689725 (139.8 MiB)
    RX errors 0  dropped 11218  overruns 0  frame 0
    TX packets 338459  bytes 132985798 (126.8 MiB)
    TX errors 0  dropped 2 overruns 0  carrier 0  collisions 0

enp12s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
    inet 192.168.1.3  netmask 255.255.255.0  broadcast 192.168.1.255
    inet6 fe80::7285:c2ff:fe67:19b9  prefixlen 64  scopeid 0x20<link>
    ether 70:85:c2:67:19:b9  txqueuelen 1000  (Ethernet)
    RX packets 25416597  bytes 36117733348 (33.6 GiB)
    RX errors 0  dropped 0  overruns 0  frame 0
    TX packets 16850795  bytes 21472508786 (19.9 GiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp3s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
    ether 00:1b:21:c7:40:d8  txqueuelen 1000  (Ethernet)
    RX packets 225363  bytes 113059352 (107.8 MiB)
    RX errors 0  dropped 2805  overruns 0  frame 0
    TX packets 15162  bytes 2367657 (2.2 MiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp3s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
    ether 00:1b:21:c7:40:d8  txqueuelen 1000  (Ethernet)
    RX packets 25499  bytes 6988254 (6.6 MiB)
    RX errors 0  dropped 2805  overruns 0  frame 0
    TX packets 263442  bytes 123302293 (117.5 MiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
    ether 00:1b:21:c7:40:d8  txqueuelen 1000  (Ethernet)
    RX packets 33208  bytes 11681537 (11.1 MiB)
    RX errors 0  dropped 2804  overruns 0  frame 0
    TX packets 42729  bytes 2258949 (2.1 MiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST>  mtu 9000
    ether 00:1b:21:c7:40:d8  txqueuelen 1000  (Ethernet)
    RX packets 63230  bytes 14960582 (14.2 MiB)
    RX errors 0  dropped 2804  overruns 0  frame 0
    TX packets 17126  bytes 5056899 (4.8 MiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
    inet 192.168.1.4  netmask 255.255.255.0  broadcast 192.168.1.255
    inet6 fe80::21b:21ff:fec7:40d8  prefixlen 64  scopeid 0x20<link>
    ether 00:1b:21:c7:40:d8  txqueuelen 1000  (Ethernet)
    RX packets 54616  bytes 5852177 (5.5 MiB)
    RX errors 0  dropped 0  overruns 0  frame 0
    TX packets 757  bytes 61270 (59.8 KiB)
    TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Initially suspecting some kind of buffer issue, I did some tweaking in sysctl to make sure the buffer sizes were adequate. The sysctl tweaks can be found here (they did not appear to make any difference):

https://paste.linux.community/view/3b5f2b63
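
For anyone wanting to check the same family of settings on their own box, here is a small sketch that dumps the sort of buffer/backlog sysctls usually involved (the key list is illustrative, not a copy of the paste):

#!/usr/bin/env python3
# Print the current values of buffer-related sysctls that are commonly
# tuned for RX drops. Illustrative list only, not the exact tweaks above.
KEYS = [
    "net.core.rmem_max",
    "net.core.rmem_default",
    "net.core.netdev_max_backlog",
    "net.core.netdev_budget",
]

for key in KEYS:
    path = "/proc/sys/" + key.replace(".", "/")
    try:
        with open(path) as f:
            print(f"{key} = {f.read().strip()}")
    except FileNotFoundError:
        print(f"{key} is not available on this kernel")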

Network config is:

auto lo
iface lo inet loopback

auto enp12s0
iface enp12s0 inet static
    address  192.168.1.3
    netmask  255.255.255.0

iface enp3s0f0 inet manual

iface enp3s0f1 inet manual

iface enp4s0f0 inet manual

iface enp4s0f1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1
    bond-miimon 100
    bond-mode 802.3ad
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address  192.168.1.4
    netmask  255.255.255.0
    gateway  192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
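
In case it's relevant, the LACP negotiation itself can be inspected through /proc/net/bonding/bond0; a small sketch (assuming the standard Linux bonding driver proc file) that checks each slave's link state and aggregator ID:

#!/usr/bin/env python3
# Check that every slave of bond0 is up and has joined the same 802.3ad
# aggregator, by parsing the bonding driver's proc file.
import re

with open("/proc/net/bonding/bond0") as f:
    text = f.read()

# Everything after the first "Slave Interface:" describes individual slaves.
for section in text.split("Slave Interface: ")[1:]:
    name = section.splitlines()[0].strip()
    mii = re.search(r"MII Status:\s*(\S+)", section)
    agg = re.search(r"Aggregator ID:\s*(\d+)", section)
    print(f"{name}: MII {mii.group(1) if mii else '?'}, "
          f"aggregator {agg.group(1) if agg else 'n/a'}")

If all four slaves report the same aggregator ID, they have all negotiated into the same LACP group.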

Troubleshooting steps I took:

a) sysctl tweaks (as linked above)
b) Increased the MTU and enabled jumbo frames on the switch (no change)
c) Reset the switch and recreated the LACP trunk (no change)

Any ideas on what I should try next? I'm starting to think there is something I don't understand about the NIC teaming. As I said, everything seems to work fine, but the high packet loss concerns me a little.

Other machines on the network that are connected to the switch do not have this issue (the 5th NIC on the machine is fine, too).

NOP

Posted 2019-02-11T10:18:21.487

Reputation: 343

Answers

1

I have seen this before: HP switches sometimes seem to send broadcast packets to all members of an LACP trunk. The kernel then sees these packets as duplicates and drops them (apart from the first one to arrive, of course).

While this is of course not elegant, it does not seem to cause problems in real life. You can verify whether this is the effect by deliberately sending many broadcast packets and checking whether the drop statistics rise accordingly.
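
A rough way to do that, sketched here with placeholder addresses and counts: fire UDP broadcasts from another machine on the LAN and watch the RX drop counter on the Proxmox host at the same time.

#!/usr/bin/env python3
# Run on another machine on the LAN: flood the local broadcast address
# with small UDP packets. Broadcast address, port and count are
# placeholders - adjust for your network.
import socket

BCAST = ("192.168.1.255", 9999)
COUNT = 10000

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
for _ in range(COUNT):
    s.sendto(b"x" * 64, BCAST)

# Meanwhile, on the Proxmox host:
#   watch cat /sys/class/net/bond0/statistics/rx_dropped

If the drop counter climbs roughly in step with the broadcasts you send, that points to the duplicate-delivery effect described above.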

Eugen Rieck

Posted 2019-02-11T10:18:21.487

Reputation: 15 128

Hmm, I did suspect something like this. That would make sense - the math works out, given that there's basically the same number of dropped packets on each NIC. Is there a way to tell exactly what's getting dropped? I ran dropwatch on the machine, but I couldn't interpret the output well enough to conclude one way or the other. – NOP – 2019-02-11T10:41:20.517

My diagnostics were rather basic: increase the rate of broadcast packets and see if the drop rate increases accordingly. I didn't investigate much further, as everything was working fine with no loss of payload packets. – Eugen Rieck – 2019-02-11T10:59:53.710

I did some more digging around. I wrote a script to fire out a whole bunch of broadcast packets. At the same time, I ran tcpdump on all 4 individual NICs plus the bond interface. While the broadcast traffic was mostly appearing on only 1 NIC, the odd packet did end up hitting another NIC. It didn't seem to increase the rate of dropped packets, though. The box has been running for a couple of hours, and with several VMs and containers running plus me pelting it with broadcast packets, bond0 has dropped ~700 of 1437352 RX packets. This is representative of what I've been seeing. – NOP – 2019-02-13T02:13:53.987

OK, to do some further digging I shut down a particularly chatty VM on the box, which cleared up tcpdump quite a bit. Looking at the tcpdump of the bond interface while watching the interface statistics, there's a very strong correlation between ARP requests being on the wire and packets getting dropped. Still sitting at around ~1.5% dropped RX packets on the bond. I guess this makes sense, right? – NOP – 2019-02-13T02:44:25.980

ARP packets are the canonical example of broadcast traffic, so this does tend to confirm our suspicions. TBH I don't know how ARP over an LACP trunk is actually supposed to be broadcast - maybe an "all ports" policy is the standard? – Eugen Rieck – 2019-02-13T08:05:34.263