I have a Hyper-V cluster made up of three hosts. Each host is connected to both of my Nexus 5548 switches in an EtherChannel, using LACP on the switch side and Broadcom 802.3ad NIC teaming on the server side. This gives me 2 Gbps of bandwidth and also provides fault tolerance.
The problem occurs when I perform a live migration. Before the migration, both Nexus switches show the VM's MAC address in their ARP tables. After the migration, one switch shows the VM's MAC, and the other shows the MAC of the Hyper-V host the VM moved to.
I ran a packet capture and saw the Hyper-V host send a gratuitous ARP carrying the VM's IP address but the host's MAC address instead of the VM's. When this happens I lose layer 3 connectivity, and I have to either clear the ARP entry on the switch manually or wait about 7 minutes for it to correct itself.
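To check captures like this more systematically, here is a minimal Python sketch that decodes the 28-byte ARP payload of an Ethernet/IPv4 frame and flags a gratuitous ARP (sender IP equal to target IP) whose sender MAC is not the VM's MAC. It assumes the Ethernet header has already been stripped; the helper names and the example IP and MAC addresses are hypothetical, not values from my environment.

```python
import struct
from dataclasses import dataclass


@dataclass
class ArpPacket:
    opcode: int  # 1 = request, 2 = reply
    sender_mac: str
    sender_ip: str
    target_mac: str
    target_ip: str


def _mac(b: bytes) -> str:
    """Format 6 raw bytes as a lowercase colon-separated MAC."""
    return ":".join(f"{x:02x}" for x in b)


def _ip(b: bytes) -> str:
    """Format 4 raw bytes as a dotted-quad IPv4 address."""
    return ".".join(str(x) for x in b)


def parse_arp(payload: bytes) -> ArpPacket:
    """Parse an Ethernet/IPv4 ARP payload (no Ethernet header)."""
    if len(payload) < 28:
        raise ValueError("ARP payload too short")
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", payload[:8])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP packet")
    return ArpPacket(
        opcode=oper,
        sender_mac=_mac(payload[8:14]),
        sender_ip=_ip(payload[14:18]),
        target_mac=_mac(payload[18:24]),
        target_ip=_ip(payload[24:28]),
    )


def build_arp(oper: int, smac: str, sip: str, tmac: str, tip: str) -> bytes:
    """Hypothetical helper: build an ARP payload for testing the parser."""
    def mac_bytes(m: str) -> bytes:
        return bytes.fromhex(m.replace(":", ""))

    def ip_bytes(i: str) -> bytes:
        return bytes(int(x) for x in i.split("."))

    return (struct.pack("!HHBBH", 1, 0x0800, 6, 4, oper)
            + mac_bytes(smac) + ip_bytes(sip)
            + mac_bytes(tmac) + ip_bytes(tip))


def is_gratuitous(pkt: ArpPacket) -> bool:
    """A gratuitous ARP announces an address the sender claims for itself."""
    return pkt.sender_ip == pkt.target_ip


def garp_mac_mismatch(pkt: ArpPacket, vm_ip: str, vm_mac: str) -> bool:
    """True if a gratuitous ARP for the VM's IP carries some other MAC."""
    return (is_gratuitous(pkt)
            and pkt.sender_ip == vm_ip
            and pkt.sender_mac != vm_mac.lower())


if __name__ == "__main__":
    # Hypothetical addresses: a GARP for the VM's IP sourced from a host MAC.
    raw = build_arp(2, "00:10:18:aa:bb:cc", "10.0.0.50",
                    "ff:ff:ff:ff:ff:ff", "10.0.0.50")
    pkt = parse_arp(raw)
    print(garp_mac_mismatch(pkt, "10.0.0.50", "00:15:5d:01:02:03"))
```

Feeding each captured ARP payload through `garp_mac_mismatch` would highlight exactly the kind of announcement described above, where the VM's IP is advertised with the host's MAC.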
From what I've found, other people have had similar issues with Broadcom NIC teaming. Has anyone seen this? Any advice?
-------- Edit added below
I only have this problem when teaming with Link Aggregation (802.3ad). The Broadcom teaming options are:
- Link Aggregation (802.3ad)
- Smart Load Balancing (TM) and Failover
- SLB (Auto-Fallback Disable)
- Generic Trunking (FEC/GEC) / 802.3ad-Draft Static
I switched to Smart Load Balancing, and the VM now live migrates without losing any network connectivity. However, while the ARP tables on the two Nexus switches are now in sync, they both show the MAC address of the host, not the VM. This is the opposite of what I expected. Shouldn't the switches' ARP tables show the VM's MAC? If they are supposed to show the host's MAC instead, why?