Got a bit of a strange problem. I have a machine running Proxmox 5.3 whose hardware includes a 4-port Intel Gigabit PCIe NIC in addition to a fifth Gigabit Ethernet port on the motherboard.
The machine is configured so that the onboard NIC is the management interface, and the 4 Gigabit ports are bonded together with LACP (and connected to an HP ProCurve 1810G managed switch); all the VMs and containers on the box get network connectivity through the bonded NIC. The switch is managed and supports LACP, and I have configured a trunk on the switch for the 4 ports.
Everything seems to work fine, or so I thought.
Over the weekend I installed netdata on the Proxmox host, and now I'm getting continual alarms about packet loss on bond0 (the 4 bonded NICs). I'm a little perplexed as to why.
Looking at the statistics for bond0, it seems that RX packets are getting dropped with reasonable frequency (currently showing ~160 RX packets dropped in the last 10 minutes - no TX packets seem to get dropped).
Interface output is below; you'll note that the bridge interface to the VMs has no dropped packets - it's happening only on bond0 and its slaves. The MTU is set to 9000 (jumbo frames are enabled on the switch), and I was still seeing this issue with an MTU of 1500. enp12s0 is the management NIC; the other 4 NICs are the bond slaves.
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 347300 bytes 146689725 (139.8 MiB)
RX errors 0 dropped 11218 overruns 0 frame 0
TX packets 338459 bytes 132985798 (126.8 MiB)
TX errors 0 dropped 2 overruns 0 carrier 0 collisions 0
enp12s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.3 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::7285:c2ff:fe67:19b9 prefixlen 64 scopeid 0x20<link>
ether 70:85:c2:67:19:b9 txqueuelen 1000 (Ethernet)
RX packets 25416597 bytes 36117733348 (33.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 16850795 bytes 21472508786 (19.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 225363 bytes 113059352 (107.8 MiB)
RX errors 0 dropped 2805 overruns 0 frame 0
TX packets 15162 bytes 2367657 (2.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 25499 bytes 6988254 (6.6 MiB)
RX errors 0 dropped 2805 overruns 0 frame 0
TX packets 263442 bytes 123302293 (117.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0f0: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 33208 bytes 11681537 (11.1 MiB)
RX errors 0 dropped 2804 overruns 0 frame 0
TX packets 42729 bytes 2258949 (2.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0f1: flags=6211<UP,BROADCAST,RUNNING,SLAVE,MULTICAST> mtu 9000
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 63230 bytes 14960582 (14.2 MiB)
RX errors 0 dropped 2804 overruns 0 frame 0
TX packets 17126 bytes 5056899 (4.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.1.4 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::21b:21ff:fec7:40d8 prefixlen 64 scopeid 0x20<link>
ether 00:1b:21:c7:40:d8 txqueuelen 1000 (Ethernet)
RX packets 54616 bytes 5852177 (5.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 757 bytes 61270 (59.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
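(For reference, the per-slave drop counters can be watched with something along these lines - interface names as above:)
# Poll the kernel's per-interface RX drop counters every 10 seconds
watch -n 10 'for i in bond0 enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1; do echo -n "$i: "; cat /sys/class/net/$i/statistics/rx_dropped; done'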
Initially suspecting some kind of buffer issue, I did some tweaking in sysctl to make sure the buffer sizes were adequate. The sysctl tweaks can be found here (they did not appear to make any difference):
https://paste.linux.community/view/3b5f2b63
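For context, tweaks of this sort are usually the standard receive/send-buffer and backlog increases, e.g. (illustrative values only - the paste has the exact settings used):
# enlarge socket buffers and the per-CPU input backlog
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.netdev_max_backlog = 5000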
Network config is:
auto lo
iface lo inet loopback
auto enp12s0
iface enp12s0 inet static
address 192.168.1.3
netmask 255.255.255.0
iface enp3s0f0 inet manual
iface enp3s0f1 inet manual
iface enp4s0f0 inet manual
iface enp4s0f1 inet manual
auto bond0
iface bond0 inet manual
bond-slaves enp3s0f0 enp3s0f1 enp4s0f0 enp4s0f1
bond-miimon 100
bond-mode 802.3ad
mtu 9000
auto vmbr0
iface vmbr0 inet static
address 192.168.1.4
netmask 255.255.255.0
gateway 192.168.1.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
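One quick sanity check (command output omitted here) is to confirm that the bond actually negotiated 802.3ad with the switch and to look at the per-slave counters:
# Bonding mode, LACP partner details and per-slave link state
cat /proc/net/bonding/bond0
# Per-interface counters, including drops, via iproute2
ip -s link show bond0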
Troubleshooting steps I took:
a) sysctl tweaks (as linked above)
b) MTU increase and enabling jumbo frames on the switch (no change; see the MTU check below)
c) Reset the switch and recreated the LACP trunk (no change)
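For reference, one way to sanity-check that jumbo frames actually pass end-to-end is a don't-fragment ping sized just under the MTU (the gateway is used as the target here):
# 8972 data bytes + 28 bytes of IP/ICMP headers = a 9000-byte packet, fragmentation disallowed
ping -M do -s 8972 -c 4 192.168.1.1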
Any ideas on what I should try next? I'm starting to think there is something I don't understand about NIC teaming. As I said, everything seems to work fine, but the high packet loss concerns me a little.
Other machines on the network that are connected to the switch do not have this issue (the 5th NIC on the machine is fine, too).
Hmm, I did suspect something like this. That would make sense (the math works out, given that there's basically the same number of dropped packets on each NIC). Is there a way to tell exactly what's getting dropped? I ran dropwatch on the machine, but I really couldn't interpret the output well enough to conclude one way or the other. – NOP – 2019-02-11T10:41:20.517
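(For reference, dropwatch is interactive; invoked as below, it reports drop locations as kernel symbols, which is what has to be interpreted:)
# -l kas loads kallsyms so drop sites are shown as kernel symbols
dropwatch -l kas
# then type "start" at the dropwatch prompt, and "stop" when done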
My diagnostics were rather basic: increase the rate of broadcast packets and see if the drop rate increases accordingly. I didn't investigate much further, as everything was working fine with no loss of payload packets. – Eugen Rieck – 2019-02-11T10:59:53.710
I did some more digging around. I wrote a script to fire out a whole bunch of broadcast packets. At the same time, I ran tcpdump on all 4 individual NICs plus the bond interface. While the broadcast traffic was mostly appearing on only 1 NIC, the odd packet did end up hitting another NIC. It didn't seem to increase the rate of dropped packets, though. The box has been running for a couple of hours, and with my several VMs and containers running plus me pelting it with broadcast packets, bond0 has dropped ~700 of 1437352 RX packets. This is representative of what I've been seeing. – NOP – 2019-02-13T02:13:53.987
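(A broadcast generator of that sort can be as simple as a fast ping to the subnet broadcast address - a minimal stand-in, not the exact script used:)
# ICMP echo to the subnet broadcast address, ~100 packets/sec (root needed for the short interval)
ping -b -i 0.01 192.168.1.255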
OK, to dig a little further I shut down a particularly chatty VM on the box, which cleared up tcpdump quite a bit. Looking at the tcpdump of the bond interface while watching the interface statistics, there's a very strong correlation between ARP requests being on the wire and packets getting dropped. Still sitting at around ~1.5% dropped RX packets on the bond. I guess this makes sense, right? – NOP – 2019-02-13T02:44:25.980
ARP packets are the canonical example of broadcast traffic, so this does tend to confirm our suspicions. TBH I don't know how ARP over an LACP trunk is actually supposed to be broadcast - maybe an "all ports" policy is the standard? – Eugen Rieck – 2019-02-13T08:05:34.263