Linux bonding (balance-tlb), KVM guests and L2 switches = unicast flooding?

Question

I have a unicast flooding problem on my network that started when I moved some software to virtualized guests. It seems very similar to what is reported here: Switch flooding when bonding interfaces in Linux. That question dates back to 2012, so maybe there is a better solution by now, perhaps on the Linux/KVM side.

Below I explain the architecture and the troubleshooting steps I carried out. I hope somebody can give me some hints and maybe a solution. Thanks in advance!

ARCHITECTURE

Server

Linux host with PROXMOX 4.1 and several Windows virtual machines.

The host has four Gigabit Ethernet interfaces (MAC addresses A, B, C and D), bonded in balance-tlb mode.

The bond is then bridged to the virtual machines, so each VM has its own MAC address (X, Y, Z, ...).

The software hosted on the virtual machines interacts with many devices in the field.

Network

The server is connected to a Juniper switch, which then connects to a large Cisco network. Everything is layer 2.

PROBLEM

On the Cisco network I see unicast storms from time to time. They seem to start every 5 minutes, or at multiples of that interval. I analyzed the traffic and I see that the traffic FROM some devices to a certain virtual machine (and not vice versa) is suddenly replicated on all the physical ports of the switches (on the same VLAN). The problem resolves on its own after a few seconds.

IDEA

Reading the Cisco documentation (regarding unicast flooding and MAC "aging time") and the question linked above, I found that the problem may be due to the fact that the MAC addresses of the virtual machines rarely appear as a source address on the network, so that after the "aging time" expires the switches forward traffic for those addresses to all ports until they rediscover where the host is.
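To make the aging idea concrete, here is a toy model of a learning switch. It is only a sketch: the 300-second aging time is an assumption (a commonly seen default that would match the 5-minute period above, but check your own switches), and the class and function names are made up for illustration.

```python
# Toy model of a learning switch. AGING_SECONDS and the traffic pattern are
# illustrative assumptions, not values read from any real device.
AGING_SECONDS = 300


class LearningSwitch:
    def __init__(self, aging=AGING_SECONDS):
        self.aging = aging
        self.table = {}  # MAC -> (port, time last seen as a source)

    def receive(self, now, in_port, src_mac, dst_mac):
        """Learn the source MAC, then decide how to forward to the destination."""
        self.table[src_mac] = (in_port, now)
        entry = self.table.get(dst_mac)
        if entry is None or now - entry[1] > self.aging:
            # Unknown or aged-out destination: send out of every port.
            self.table.pop(dst_mac, None)
            return "flood"
        return f"port {entry[0]}"


def simulate():
    sw = LearningSwitch()
    results = []
    # t=0: the VM sends an ARP request with its own MAC X (server on port 1).
    sw.receive(0, 1, "X", "ff:ff:ff:ff:ff:ff")
    for t in (10, 200, 400):
        # VM data frames leave the bond with a NIC MAC (A) as source...
        sw.receive(t, 1, "A", "L")
        # ...while the device on port 2 keeps sending to X.
        results.append((t + 1, sw.receive(t + 1, 2, "L", "X")))
    return results


if __name__ == "__main__":
    for t, decision in simulate():
        print(f"t={t:>3}s  frame for X -> {decision}")
```

In this model the switch keeps forwarding frames for X correctly until the entry learned from the single ARP request ages out, after which every frame for X is flooded until X is seen as a source again.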

TROUBLESHOOTING

I connected a laptop on the network and started to ping it from one virtual machine. I sniffed the packets on the laptop.

From this I could see:

ARP request from the virtual machine, using as MAC source its own MAC address (let's say X)
ARP reply from the laptop, using as MAC source its own MAC address (L) and destination the VM MAC address (X)
ping requests from the virtual machine, using as MAC source one of the MAC addresses of the bonded physical ethernet ports (A, B, C, D, and switching from time to time between three of them) and as MAC destination L
ping replies from the laptop, using as MAC source L and as MAC destination the virtual machine MAC address (X)

Basically it seems that, except for the first ARP request, the virtual machine never appears to the laptop with its own MAC address (X) but always with A, B, C or D (varying in time). However, the laptop always responds to X.
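For anyone who wants to repeat this check on a longer capture, the sketch below groups source MAC addresses by source IP from the text output of `tcpdump -e -n`. The sample lines are hypothetical (documentation addresses, invented MACs) and the regular expressions only cover the IPv4 and ARP-request line shapes shown, so treat it as a starting point rather than a complete parser.

```python
import re
from collections import defaultdict

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
# IPv4 line: "<src mac> > <dst mac>, ethertype IPv4 (...), length N: <src ip>[.port] > ..."
IPV4_LINE = re.compile(
    rf"(?P<src_mac>{MAC}) > {MAC}, ethertype IPv4 .*?: "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)(?:\.\d+)? > "
)
# ARP request line: "... ethertype ARP (...): Request who-has <ip> tell <src ip>, ..."
ARP_LINE = re.compile(
    rf"(?P<src_mac>{MAC}) > {MAC}, ethertype ARP .*? tell "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)"
)


def macs_per_ip(lines):
    """Return {source IP: sorted list of source MACs it was seen with}."""
    seen = defaultdict(set)
    for line in lines:
        m = IPV4_LINE.search(line) or ARP_LINE.search(line)
        if m:
            seen[m["src_ip"]].add(m["src_mac"])
    return {ip: sorted(macs) for ip, macs in seen.items()}


# Hypothetical capture: VM 192.0.2.10 (own MAC 52:54:...), laptop 192.0.2.50.
SAMPLE = [
    "10:15:00.000001 52:54:00:00:00:0a > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 42: Request who-has 192.0.2.50 tell 192.0.2.10, length 28",
    "10:15:00.000400 0c:c4:7a:00:00:01 > 00:00:5e:00:53:01, ethertype IPv4 (0x0800), "
    "length 98: 192.0.2.10 > 192.0.2.50: ICMP echo request, id 1, seq 1, length 64",
    "10:15:00.000600 00:00:5e:00:53:01 > 52:54:00:00:00:0a, ethertype IPv4 (0x0800), "
    "length 98: 192.0.2.50 > 192.0.2.10: ICMP echo reply, id 1, seq 1, length 64",
    "10:15:01.000400 0c:c4:7a:00:00:02 > 00:00:5e:00:53:01, ethertype IPv4 (0x0800), "
    "length 98: 192.0.2.10 > 192.0.2.50: ICMP echo request, id 1, seq 2, length 64",
]

if __name__ == "__main__":
    for ip, macs in macs_per_ip(SAMPLE).items():
        flag = "  <-- more than one source MAC" if len(macs) > 1 else ""
        print(ip, macs, flag)
```

An IP address that shows up with several source MACs, as the VM does here, is the pattern described above.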

SOLUTION?

I read that in balance-tlb mode it is normal for traffic to leave through different interfaces depending on load. However, I think that this behaviour, combined with the fact that virtual machines appear on the network with the source MAC address of whichever physical interface is in use, may cause the problem I described.

If this is correct, does anybody know whether there is a way to force the use of the VM's own MAC address for every communication (as already happens for ARP requests)? Or maybe the solution lies somewhere else?

I thought that I could configure the Windows VMs to reset their ARP table every 3 minutes... but this seems a bit too brute-force to me... :)

Thanks again for any help!

EDIT: I confirm that if during a flooding event I quickly log into the corresponding VM and issue an ARP table reset, I see new ARP requests from the VM (telling its own MAC address to the net) and the storm stops immediately.

Perhaps load-balancing the traffic outbound without a properly aggregated link is not such a good idea. You also don't want a single MAC address to be flapping between different ports, because this will probably hit CPU on the switch or worse. I'm not familiar with the virtualisation technology that you mention, but it seems that there is some redesign work required here. — marctxk, Jul 07 '16 at 14:24
Yes, MAC flapping would be a problem in my case. I could try to change the aggregated link from balance-tlb to LACP (so, AFAIU, only one NIC physical MAC is used on the bonding channel). However, this is not simple at the moment... I'm also not sure it solves the problem, as the virtual machine would probably still appear on the net with the NIC physical address instead of its own, and this seems to me to be the cause of the flooding. — z2k, Jul 07 '16 at 15:22

score 0 · Accepted Answer · answered Jul 09 '16 at 19:43

Balance-tlb (mode 5) and balance-alb (mode 6) do not work with virtual bridges. They can cause broadcast loops, they rewrite source MAC in packets under some conditions, and mode 6 intercepts ARP by design.

You need to use active-backup (mode 1) with no switch config, or balance-xor (mode 2) or 802.3ad (mode 4) with switch config.

You could also use round-robin (mode 0) or broadcast (mode 3) with switch config, but these are not good for TCP stream performance.
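To check which mode a host is actually running, the bonding driver exposes it in sysfs as a name followed by the mode number (for example `balance-tlb 5`). The sketch below reads that file and flags the two modes named above; `bond0` is an assumed interface name and the helper names are made up for illustration.

```python
from pathlib import Path

# Modes the answer above says do not work with a bridge on top of the bond.
UNSAFE_WITH_BRIDGE = {"balance-tlb", "balance-alb"}


def parse_bond_mode(text):
    """Split the sysfs value, e.g. 'balance-tlb 5', into (name, number)."""
    name, number = text.split()
    return name, int(number)


def bridge_safe(text):
    """True if the mode is not one of the MAC-rewriting modes listed above."""
    name, _ = parse_bond_mode(text)
    return name not in UNSAFE_WITH_BRIDGE


def check_bond(bond="bond0"):  # "bond0" is an assumed interface name
    path = Path(f"/sys/class/net/{bond}/bonding/mode")
    if not path.exists():
        return f"{bond}: no bonding sysfs entry found"
    text = path.read_text().strip()
    verdict = "ok under a bridge" if bridge_safe(text) else "avoid under a bridge"
    return f"{bond}: {text} -> {verdict}"


if __name__ == "__main__":
    print(check_bond())
```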

answered Jul 09 '16 at 19:43

suprjami


Thanks for your answer. Yes, I think you highlighted the problem. Unfortunately I was not aware of this behavior of balance-tlb about MAC addresses of virtual machines, my fault. I did a test with 802.3ad and it seems that virtual machines always appear to the network with their own MAC address, and that probably will solve my problem. However, it seems to me that 802.3ad does a "worse" load balancing with respect to balance-tlb. I'll read more about this. If you have any hint about a guide, howto, etc. it would be much appreciated! Thanks again. – z2k Jul 11 '16 at 08:25
Look into the `xmit_hash_policy` module option. You can balance on MAC address, MAC and IP, or IP and Port. Note that IP and port is not 802.3ad compliant but most switches will handle it no problem. – suprjami Jul 12 '16 at 09:37
Please feel free to mark my response as the answer if you feel it is correct. – suprjami Jul 12 '16 at 09:38
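To illustrate why `xmit_hash_policy` matters for how evenly 802.3ad spreads traffic, here is a simplified sketch of two hashing styles. The functions are loosely modelled on the `layer2` and `layer3+4` policies, but they are an assumption-laden approximation: the real kernel hashes differ in detail and between kernel versions, and the addresses and ports are invented.

```python
import ipaddress


def layer2_slave(src_mac, dst_mac, slaves):
    """Simplified 'layer2'-style choice: XOR of the last MAC bytes."""
    last = lambda mac: int(mac.split(":")[-1], 16)
    return (last(src_mac) ^ last(dst_mac)) % slaves


def layer34_slave(src_ip, dst_ip, src_port, dst_port, slaves):
    """Simplified 'layer3+4'-style choice: mixes IPs and ports."""
    h = src_port ^ dst_port
    h ^= int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % slaves


def demo(slaves=4):
    # One VM talking to one device over eight TCP flows (hypothetical values).
    flows = [("192.0.2.10", "192.0.2.50", 50000 + i, 443) for i in range(8)]
    l2 = {layer2_slave("52:54:00:00:00:0a", "00:00:5e:00:53:01", slaves) for _ in flows}
    l34 = {layer34_slave(*flow, slaves) for flow in flows}
    return l2, l34


if __name__ == "__main__":
    l2, l34 = demo()
    print("links used with MAC-only hashing:", sorted(l2))
    print("links used with IP+port hashing: ", sorted(l34))
```

With MAC-only hashing, all traffic between one pair of hosts stays on a single link, whereas hashing on IP and port lets separate flows between the same pair land on different links. Each individual flow still stays on one link, which keeps packets in order.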

score -1 · Answer 2 · answered Jul 07 '16 at 14:25

https://en.wikipedia.org/wiki/Unicast_flood It is possible that you have "hosts with ARP timers longer than the address cache timeout on switches", as per the article. Try setting the ARP timers of your KVM hypervisor host and of the VMs to be shorter than the address cache timeout of the switch to which they connect through the physical Ethernet port. Please let us know what you find. Thanks.

answered Jul 07 '16 at 14:25

mkzia


Thanks for your answer. It surely makes sense. I also mentioned this in the question. However, the same software/operating system running on a dedicated physical machine (instead of a virtual one) does not create any flooding problem, because switches learn on their own where to forward packets by inspecting the source MAC address of incoming packets. Now it seems this is not possible because of the behaviour I reported, and this seems to trigger flooding. I would like to know whether there's a more "elegant" solution than multiplying ARP requests. Thanks again! – z2k Jul 07 '16 at 14:54