Set up bridged vxlan network in linux

Question

I'm trying out vxlan with linux, and this problem has kept me stuck for days.

Simple Vxlan works fine

A simple vxlan with multicast works for cross-host communication. This simply creates a vxlan VTEP and assigns it an IP address:

ip link add vxlan100 type vxlan id 100 group 239.1.1.1 dev enp0s8
ip addr add 10.20.1.2/24 dev vxlan100
ip link set vxlan100 up

After running the above commands on both hosts, the topology is like:

And this works fine!

Bridged vxlan does not work

Then I tried to set up a bridged vxlan to connect containers, and it does not work. Here is what I did to set up the bridge and the vxlan device:

ip link add br0 type bridge
ip link add vxlan100 type vxlan id 100 group 239.1.1.1 dev enp0s8
ip link set dev vxlan100 master br0
ip link set vxlan100 up
ip link set br0 up

As for VMs/containers, I simply use a network namespace and a veth pair for testing purposes:

ip link add veth0 type veth peer name veth1
ip link set dev veth0 master br0
ip link set veth0 up

ip netns add container1
ip link set dev veth1 netns container1
ip netns exec container1 ip link set lo up
ip netns exec container1 ip link set veth1 name eth0
ip netns exec container1 ip addr add 10.20.1.2/24 dev eth0
ip netns exec container1 ip link set eth0 up
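The same steps have to be repeated on the other host with a different address. A minimal sketch of that as a reusable helper, assuming the bridge is already called br0 as above: the function name netns_attach_cmds and the example values are hypothetical, and the function only prints the commands so they can be reviewed before being run as root.

```shell
# Print (do not run) the commands that attach one test namespace to br0.
# Usage: netns_attach_cmds NAMESPACE HOST_VETH PEER_VETH ADDRESS/PREFIX
# The function name and argument order are illustrative, not part of iproute2.
netns_attach_cmds() {
    ns=$1
    host_if=$2
    peer_if=$3
    addr=$4
    cat <<EOF
ip link add $host_if type veth peer name $peer_if
ip link set dev $host_if master br0
ip link set $host_if up
ip netns add $ns
ip link set dev $peer_if netns $ns
ip netns exec $ns ip link set lo up
ip netns exec $ns ip link set $peer_if name eth0
ip netns exec $ns ip addr add $addr dev eth0
ip netns exec $ns ip link set eth0 up
EOF
}
```

For example, netns_attach_cmds container2 veth0 veth1 10.20.1.3/24 prints the nine commands for a second namespace; only the namespace name and the address differ from the listing above.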

And the topology is like the following diagram:

When I try to ping VM2 from VM1, it prints a Destination Host Unreachable error:

[root@localhost ~]# ip netns exec container2 ping -c 3 10.20.1.3
PING 10.20.1.3 (10.20.1.3) 56(84) bytes of data.
From 10.20.1.2 icmp_seq=1 Destination Host Unreachable
From 10.20.1.2 icmp_seq=2 Destination Host Unreachable
From 10.20.1.2 icmp_seq=3 Destination Host Unreachable

--- 10.20.1.3 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 1999ms

Using tcpdump to capture the packet on br0, the result:

[root@localhost vagrant]# tcpdump -e -nn -i br0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:35:02.533609 0e:f3:f2:c1:9a:b5 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.20.1.3 tell 10.20.1.2, length 28
15:35:02.533609 0e:f3:f2:c1:9a:b5 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.20.1.3 tell 10.20.1.2, length 28
15:35:02.534184 76:c2:07:e6:c2:7b > 0e:f3:f2:c1:9a:b5, ethertype ARP (0x0806), length 42: Reply 10.20.1.3 is-at 76:c2:07:e6:c2:7b, length 28
15:35:03.534274 0e:f3:f2:c1:9a:b5 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.20.1.3 tell 10.20.1.2, length 28
15:35:03.534274 0e:f3:f2:c1:9a:b5 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.20.1.3 tell 10.20.1.2, length 28
15:35:03.535261 76:c2:07:e6:c2:7b > 0e:f3:f2:c1:9a:b5, ethertype ARP (0x0806), length 42: Reply 10.20.1.3 is-at 76:c2:07:e6:c2:7b, length 28
15:35:04.536105 0e:f3:f2:c1:9a:b5 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.20.1.3 tell 10.20.1.2, length 28
15:35:04.536105 0e:f3:f2:c1:9a:b5 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.20.1.3 tell 10.20.1.2, length 28
15:35:04.536696 76:c2:07:e6:c2:7b > 0e:f3:f2:c1:9a:b5, ethertype ARP (0x0806), length 42: Reply 10.20.1.3 is-at 76:c2:07:e6:c2:7b, length 28
^C
9 packets captured
9 packets received by filter
0 packets dropped by kernel

As the output shows, the ARP request was sent over the vxlan and a reply came back to br0, but the bridge does not forward it to VM1. There are two things I do not understand:

Why are there two ARP requests for each ICMP ping?
Why doesn't br0 forward the ARP reply to VM1, even though the destination MAC address is exactly VM1's?

For your reference, I'm reading this 2017 vxlan linux post by Vincent Bernat.

I'm not sure whether I did something wrong or missed some configuration. I would really appreciate a solution or debugging tips.

@AndiJay I upgraded my kernel from 3.10 to 4.4, and the problem went away. So it seems like a kernel issue, but I didn't find the root cause. — cizixs, Oct 31 '17 at 02:35
This is just a guess but could it be a matter of frame size? I believe vxlan adds some overhead so you must shorten MTU by 50 - 72 bytes on the virtual net, maybe the new kernel has some magic for that — cmc, Apr 09 '19 at 23:41
Yes, you're probably missing the "bridge fdb" commands pointed out above, but I also shot myself in the foot by having duplicate MAC addresses in some place. Double check your interfaces to see if you have a duplicate MAC address..... tcpdump reveals all. =) — Michael Galaxy, Nov 13 '20 at 19:39
Could you take a look at this? I have a somewhat similar situation, but I have no idea how to solve it. Maybe from your experience you know what is wrong? https://serverfault.com/questions/1093914/how-to-bridge-tap-device-to-overlay-network?noredirect=1 — Mohammed Noureldin, Feb 19 '22 at 17:25
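The duplicate MAC check suggested in the comments can be sketched as a small filter over the one-line-per-interface output of ip -o link show. This is only an illustration: the function name find_duplicate_macs is hypothetical, and it only sees the interfaces of the namespace where ip was run, so it would need to be repeated per namespace and per host. Note that a Linux bridge normally reports the same MAC as one of its ports, so a match between br0 and a member port is expected rather than a fault.

```shell
# Read `ip -o link show` output on stdin and print every MAC address that
# appears on more than one interface, followed by the interface names.
# The function name is illustrative, not an existing tool.
find_duplicate_macs() {
    awk '
        {
            name = $2
            sub(/:$/, "", name)              # "veth0@if3:" -> "veth0@if3"
            for (i = 1; i < NF; i++) {
                if ($i == "link/ether") {
                    mac = $(i + 1)
                    count[mac]++
                    if (count[mac] == 1)
                        names[mac] = name
                    else
                        names[mac] = names[mac] " " name
                }
            }
        }
        END {
            for (mac in count)
                if (count[mac] > 1)
                    print mac ": " names[mac]
        }
    '
}
```

Typical use would be ip -o link show | find_duplicate_macs on the host, and ip netns exec container1 ip -o link show | find_duplicate_macs inside a namespace.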

score 2 · Answer 1 · answered Nov 06 '20 at 14:45

This could be related to head-end replication of broadcast frames.

In my case (point-to-multipoint vxlan bridged with a vlan), a similar issue was fixed by adding static VTEP entries:

ip link add vxlan0 type vxlan id 100 local 198.51.100.10 remote 203.0.113.110 dev ens0 dstport 4789
bridge fdb append 00:00:00:00:00:00 dev vxlan0 dst 203.0.113.110

source: https://vincent.bernat.ch/en/blog/2017-vxlan-linux#unicast-with-static-flooding
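With more than one remote VTEP, the static flooding approach from the linked post uses one all-zero fdb entry per peer, so that broadcast and unknown-unicast frames (such as ARP requests) are replicated to each of them. A minimal sketch, assuming the device is created without a single remote and the peers are listed by hand: the function name vxlan_static_flood_cmds and the second peer address 203.0.113.111 are hypothetical, and the function only prints the commands for review.

```shell
# Print (do not run) the commands for a unicast vxlan device that floods
# broadcast/unknown-unicast frames to a fixed list of remote VTEPs.
# Usage: vxlan_static_flood_cmds LOCAL_IP UNDERLAY_DEV REMOTE_IP...
# The function name is illustrative; device name and VNI follow the answer.
vxlan_static_flood_cmds() {
    local_ip=$1
    underlay=$2
    shift 2
    echo "ip link add vxlan0 type vxlan id 100 local $local_ip dev $underlay dstport 4789"
    for remote in "$@"; do
        # One all-zero entry per peer: head-end replication target.
        echo "bridge fdb append 00:00:00:00:00:00 dev vxlan0 dst $remote"
    done
}
```

For example, vxlan_static_flood_cmds 198.51.100.10 ens0 203.0.113.110 203.0.113.111 prints one ip link add line followed by two bridge fdb append lines. After running them as root, bridge fdb show dev vxlan0 should list one all-zero entry per peer.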
