Transparent LAN service on Linux

Question

I need to implement a VLAN-based transparent LAN service on Linux. That is, I need to take a configured VLAN and forward it directly to the specified port (all broadcast, multicast and unicast packets).

The trivial solution would be to define a 1-to-1 bridge between the VLAN interface and the specified port. The downside of this solution is that I become aware of all MAC addresses on this tunnel. I'm running on an embedded device with a limited MAC table and want to avoid polluting it with devices from the two networks I'm connecting.

I was trying to find some way to use ebtables for this task, but it seems the -o option of ebtables is only useful in the FORWARD chain, which comes after MAC learning. The BROUTING chain is the one I need, but it seems I can't force the packet to egress on a specific interface from that point.

So, ebtables seems to be a dead end. Any other options? In an ideal world, I would prefer to have a TLS service based on any key and not only VLAN, but VLAN will do for now.

Thanks, Ilya.

If doing it at bridge level is still too much, do it earlier at a lower level: tc. Are you still interested in this question? I could give a complete example (for VLAN; I don't know if "any key" has proper support in tc encap/decap). Is that between a VLAN-tagged trunk port and untagged ports? — A.B, May 07 '19 at 15:23
I am interested. A VLAN example will be good. I do have another requirement, for example one based on source IP, but I will be glad to have a VLAN-only one as well. Regarding egress, the VLAN can be transparent, removed or replaced, but again let's assume transparent for the sake of the example. — Ilya, May 07 '19 at 19:21
Since I already worked on it, I put a "VLAN removed" (+VLAN added for the other way around of course) example. — A.B, May 07 '19 at 23:17
Added an alternative solution that keeps a bridge using VLANs, but with MAC learning simply disabled. I wouldn't know how to use a bridge with something other than VLAN here. `tc` would still be useful for selectors other than VLAN. — A.B, May 09 '19 at 14:51

A.B · Accepted Answer · 2019-05-09T14:57:19.743

UPDATE: Added a solution that still uses a bridge. It's possible, for the VLAN case anyway, to use a Linux bridge for its VLAN filtering capabilities and completely disable MAC learning. The tc approach below might still be useful for its generic way of matching selectors: for a selector other than VLAN, it would probably be easier to use tc with an adequate match than a bridge, which has no code to handle it.

bridge with MAC learning disabled on every port

It's possible to disable learning of MAC addresses. It's done with the bridge link command. Then the bridge can be set up to do VLAN filtering (using also bridge vlan): it doesn't need any MAC address, all its forwarding will be done based on configured VLAN settings.

learning on or learning off

Controls whether a given port will learn MAC addresses from received traffic or not. If learning is off, the bridge will end up flooding any traffic for which it has no FDB entry. By default this flag is on.

learning_sync on or learning_sync off

Controls whether a given port will sync MAC addresses learned on device port to bridge FDB.

So for example, let's look at a system with interface eth0 as a trunk carrying tagged frames, and eth1, eth2 and eth3 carrying VLAN IDs 10, 20 and 30 respectively, untagged. This would be done with:

ip link add name br0 type bridge vlan_filtering 1

#remove implicit bridge's self port br0 from any interaction.
# Might have to not be done if using an IP on the bridge
# but more configuration might then be needed anyway.
bridge vlan del vid 1 dev br0 self
bridge link set dev br0 learning off learning_sync off self

for nic in eth0 eth1 eth2 eth3; do
    ip link set dev $nic master br0
    bridge link set dev $nic learning off learning_sync off
    bridge vlan del vid 1 dev $nic
done
ip link set br0 up

bridge vlan add vid 10 dev eth0
bridge vlan add vid 20 dev eth0
bridge vlan add vid 30 dev eth0

bridge vlan add vid 10 pvid 10 untagged dev eth1
bridge vlan add vid 20 pvid 20 untagged dev eth2
bridge vlan add vid 30 pvid 30 untagged dev eth3
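
One way to check that learning really is off is to look for dynamically learned entries in the bridge's FDB. The helper below is a minimal sketch, not part of iproute2: count_learned is a hypothetical name, and it assumes that learned entries are the lines of bridge fdb show output carrying neither the permanent nor the static keyword.

```shell
#!/bin/sh
# count_learned: read `bridge fdb show` output on stdin and print how many
# entries are neither "permanent" nor "static", i.e. dynamically learned.
# Hypothetical helper name; the keyword test is an assumption about the
# output format of `bridge fdb show`.
count_learned() {
    awk 'NF && !/ permanent/ && !/ static/ { n++ } END { print n + 0 }'
}

# Intended use on the bridge from the example above (the bridge must exist):
#   bridge fdb show br br0 | count_learned
```

With learning disabled on every port, the count should stay at 0 even while traffic is flowing; the permanent entries for the ports' own MAC addresses are ignored by the filter.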

To test how this alternative setup behaves, replace the following lines in the setup script at the end of this answer (they use the tc method described in the next part):

ip netns exec fakebridge tc qdisc add dev trunk0 ingress
for vlan in 10 20 30; do
    ip netns exec fakebridge tc qdisc add dev vlan$vlan ingress
    ip netns exec fakebridge tc filter add dev vlan$vlan parent ffff: matchall action vlan push id $vlan action mirred egress redirect dev trunk0
    ip netns exec fakebridge tc filter add dev trunk0 parent ffff: basic match "meta(vlan mask 0xfff eq $vlan)" action vlan pop action mirred egress redirect dev vlan$vlan
done

with these instead (it's no longer a fake bridge, but anyway...):

ip -n fakebridge link add name br0 type bridge vlan_filtering 1
ip netns exec fakebridge bridge vlan del vid 1 dev br0 self #remove implicit bridge's self port br0 from any interaction
ip -n fakebridge link set dev trunk0 master br0
ip netns exec fakebridge bridge vlan del vid 1 dev trunk0
ip netns exec fakebridge bridge link set dev trunk0 learning off learning_sync off
for vlan in 10 20 30; do
    ip -n fakebridge link set dev vlan$vlan master br0
    ip netns exec fakebridge bridge link set dev vlan$vlan learning off learning_sync off
    ip netns exec fakebridge bridge vlan add vid $vlan dev trunk0
    ip netns exec fakebridge bridge vlan del vid 1 dev vlan$vlan
    ip netns exec fakebridge bridge vlan add vid $vlan pvid $vlan untagged dev vlan$vlan
done
ip -n fakebridge link set br0 up

It's also possible to not use a bridge at all, and work with the VLAN ID for operations, using...

tc (traffic control)

tc is able to manipulate VLANs directly using tc vlan:

DESCRIPTION

The vlan action allows to perform 802.1Q en- or decapsulation on a packet, reflected by the operation modes POP, PUSH and MODIFY. The POP mode is simple, as no further information is required to just drop the outer-most VLAN encapsulation. The PUSH and MODIFY modes require at least a VLANID and allow to optionally choose the VLANPROTO to use.

Along with other tc features:

matchall (can be replaced with u32 match u32 0 0 on older kernels) to match packets unconditionally,
basic+ematch to match on meta information such as the VLAN ID (this SF Q/A helped: tc u32 — how to match L2 protocols in recent kernels?),
mirred to actually move packets between interfaces without bridging or routing.

and the usual plumbing (have a qdisc, attach a filter with an action), it's possible to move packets from one interface to another while encapsulating or decapsulating the 802.1Q VLAN ID. The system will neither bridge nor route those packets. It won't have to memorize a MAC address or handle an IP address: its awareness of packets and protocols is limited to what is done with tc.

Note that this is a proof of concept. Of course a real system would still have to communicate using an IP, taking care not to interfere with those settings. Implementing this correctly for production probably involves unforeseen additional difficulties, considering that tc is a complex tool. There could also be better ways available with tc to handle things more generically (for instance tc flow, using the VLAN ID as a key mapped into a classid, or possibly something other than VLANs as the key, as long as there's a way to encap/decap).

So for example, let's look at a system with interface eth0 as a trunk carrying tagged frames, and eth1, eth2 and eth3 carrying VLAN IDs 10, 20 and 30 respectively, untagged. Allowing the tagged side to communicate with the correct untagged side, and the reverse, would be done with:

tc qdisc add dev eth0 handle ffff: ingress
tc qdisc add dev eth1 handle ffff: ingress
tc qdisc add dev eth2 handle ffff: ingress
tc qdisc add dev eth3 handle ffff: ingress

tc filter add dev eth0 parent ffff: basic match "meta(vlan mask 0xfff eq 10)" action vlan pop action mirred egress redirect dev eth1
tc filter add dev eth0 parent ffff: basic match "meta(vlan mask 0xfff eq 20)" action vlan pop action mirred egress redirect dev eth2
tc filter add dev eth0 parent ffff: basic match "meta(vlan mask 0xfff eq 30)" action vlan pop action mirred egress redirect dev eth3

tc filter add dev eth1 parent ffff: matchall action vlan push id 10 action mirred egress redirect dev eth0
tc filter add dev eth2 parent ffff: matchall action vlan push id 20 action mirred egress redirect dev eth0
tc filter add dev eth3 parent ffff: matchall action vlan push id 30 action mirred egress redirect dev eth0
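
The commands above follow one pattern per (port, VLAN ID) pair, so they can be generated from a short mapping. The following is a minimal sketch: gen_tls_rules and its PORT:VID argument format are hypothetical names introduced here, and the function only prints the commands so they can be reviewed before being piped to sh as root.

```shell
#!/bin/sh
# gen_tls_rules TRUNK PORT:VID...
# Print (not run) the tc commands mapping each untagged PORT to VLAN VID
# on the tagged TRUNK. Hypothetical helper: review the output, then pipe
# it to `sh` as root to apply it.
gen_tls_rules() {
    trunk=$1
    shift
    printf '%s\n' "tc qdisc add dev $trunk handle ffff: ingress"
    for map in "$@"; do
        port=${map%%:*}   # text before the colon
        vid=${map##*:}    # text after the colon
        printf '%s\n' "tc qdisc add dev $port handle ffff: ingress"
        # trunk -> port: match the VLAN ID, strip the tag, redirect
        printf '%s\n' "tc filter add dev $trunk parent ffff: basic match \"meta(vlan mask 0xfff eq $vid)\" action vlan pop action mirred egress redirect dev $port"
        # port -> trunk: tag every frame, redirect
        printf '%s\n' "tc filter add dev $port parent ffff: matchall action vlan push id $vid action mirred egress redirect dev $trunk"
    done
}

gen_tls_rules eth0 eth1:10 eth2:20 eth3:30
```

The output contains the same ten commands as above, grouped per port rather than per direction.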

It might seem logical that the actual interfaces would have to be put in promiscuous mode to actually redirect traffic, but this wasn't needed with kernel 5.0.x and veth interfaces while testing anyway.
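
For a selector other than VLAN, such as the source IP mentioned in the comments, the same qdisc + filter + mirred plumbing should apply with a different match. The sketch below is untested and rests on assumptions: srcip_redirect_cmd is a hypothetical helper, it uses the classic u32 "match ip src" selector, and it does no encapsulation at all. Only IPv4 packets match this filter, so ARP from the same host would need a rule of its own.

```shell
#!/bin/sh
# srcip_redirect_cmd IN_DEV SRC_PREFIX OUT_DEV
# Print (not run) a tc command that redirects IPv4 packets arriving on IN_DEV
# with a source address inside SRC_PREFIX to OUT_DEV, unmodified.
# Hypothetical helper; IN_DEV needs an ingress qdisc first, as above.
srcip_redirect_cmd() {
    printf 'tc filter add dev %s parent ffff: protocol ip u32 match ip src %s action mirred egress redirect dev %s\n' "$1" "$2" "$3"
}

srcip_redirect_cmd eth1 192.0.2.10/32 eth0
```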

mockup using ip netns for network namespaces

I did a few experiments to implement a fake bridge with one tagged trunk interface and a few untagged VLAN interfaces using network namespaces. Each "host" has its own namespace and is linked to other hosts through network elements, themselves implemented as other network namespaces that each contain a bridge. The actual system mimicking what your embedded device could do is called fakebridge, since it looks similar to a VLAN-aware bridge.

              tagged                  untagged    _______
                                                .|host10b|
                          +-------+           .   =======
+------+                  |       |....vlan10....|host10|
|      |.......trunk......|fake   |               ====== 
|router|..................|       |....vlan20....|host20|
|      | (vlans 10+20+30) |bridge |               ======
+------+                  |       |....vlan30....|host30|
                          +-------+               ------

So 1+1+4 = 6 hosts, 1 + 3 = 4 networks for a total of 10 namespaces.

Once the script below is run (as root), one can test and observe with commands like:

term1:

ip netns exec fakebridge tcpdump -l -n -s0 -e -p -i trunk0

term2:

ip netns exec host10 ping -c1 198.51.100.20

Giving for example:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on trunk0, link-type EN10MB (Ethernet), capture size 262144 bytes
00:27:56.036743 c2:e8:f4:79:28:96 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 192.0.2.110 tell 192.0.2.10, length 28
00:27:56.036777 16:51:fa:18:21:b0 > c2:e8:f4:79:28:96, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Reply 192.0.2.110 is-at 16:51:fa:18:21:b0, length 28
00:27:56.036794 c2:e8:f4:79:28:96 > 16:51:fa:18:21:b0, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 192.0.2.10 > 198.51.100.20: ICMP echo request, id 13483, seq 1, length 64
00:27:56.036807 16:51:fa:18:21:b0 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 198.51.100.20 tell 198.51.100.120, length 28
00:27:56.036832 b6:1d:bc:33:87:98 > 16:51:fa:18:21:b0, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Reply 198.51.100.20 is-at b6:1d:bc:33:87:98, length 28
00:27:56.036841 16:51:fa:18:21:b0 > b6:1d:bc:33:87:98, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.0.2.10 > 198.51.100.20: ICMP echo request, id 13483, seq 1, length 64
00:27:56.036860 b6:1d:bc:33:87:98 > 16:51:fa:18:21:b0, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 198.51.100.20 > 192.0.2.10: ICMP echo reply, id 13483, seq 1, length 64
00:27:56.036867 16:51:fa:18:21:b0 > c2:e8:f4:79:28:96, ethertype 802.1Q (0x8100), length 102: vlan 10, p 0, ethertype IPv4, 198.51.100.20 > 192.0.2.10: ICMP echo reply, id 13483, seq 1, length 64
00:28:01.043203 16:51:fa:18:21:b0 > c2:e8:f4:79:28:96, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Request who-has 192.0.2.10 tell 192.0.2.110, length 28
00:28:01.043246 b6:1d:bc:33:87:98 > 16:51:fa:18:21:b0, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Request who-has 198.51.100.120 tell 198.51.100.20, length 28
00:28:01.043287 c2:e8:f4:79:28:96 > 16:51:fa:18:21:b0, ethertype 802.1Q (0x8100), length 46: vlan 10, p 0, ethertype ARP, Reply 192.0.2.10 is-at c2:e8:f4:79:28:96, length 28
00:28:01.043284 16:51:fa:18:21:b0 > b6:1d:bc:33:87:98, ethertype 802.1Q (0x8100), length 46: vlan 20, p 0, ethertype ARP, Reply 198.51.100.120 is-at 16:51:fa:18:21:b0, length 28
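
To get a quick overview of which VLAN IDs cross the trunk, the capture can be summarized per VLAN. This is a minimal sketch: vlan_tally is a hypothetical helper, and it relies only on the "vlan N," token that tcpdump -e prints for 802.1Q frames, as seen in the output above.

```shell
#!/bin/sh
# vlan_tally: read `tcpdump -e` text on stdin and print "VID COUNT" for each
# VLAN ID seen, sorted by VLAN ID. Hypothetical helper; lines without a
# "vlan" token (untagged frames, tcpdump banners) are ignored.
vlan_tally() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "vlan") {
                v = $(i + 1)
                sub(/,$/, "", v)   # drop the trailing comma after the ID
                count[v]++
                break
            }
    }
    END { for (v in count) print v, count[v] }' | sort -n
}

# Intended use (as root, stops after 100 frames):
#   ip netns exec fakebridge tcpdump -l -n -e -p -i trunk0 -c 100 | vlan_tally
```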

Setup script, to run as root. It creates various network namespaces using ip netns, populates the required network links (bridges and veth), sets up the tc filters on fakebridge, and finally configures the various hosts' IPs so one can experiment. fakebridge stays with no IP and no bridge. There's no MAC table that could be filled: ip neigh or bridge fdb won't show anything related to traffic, since there's no ARP without an IP and no MAC learning without a bridge.

#!/bin/sh

if ip netns id | grep -qv '^ *$' ; then
    printf 'ERROR: leave netns "%s" first\n' $(ip netns id) >&2
    exit 1
fi

hosts='router fakebridge host10 host10b host20 host30'
nets='trunk vlan10 vlan20 vlan30'

for ns in $hosts $nets; do
    ip netns del $ns 2>/dev/null || :
    ip netns add $ns
    ip netns exec $ns sysctl -q -w net.ipv6.conf.default.disable_ipv6=1
    ip netns exec $ns sysctl -q -w net.ipv4.icmp_echo_ignore_broadcasts=0
done

for ns in $hosts; do
    ip -n $ns link set lo up
done

bmac=1
for ns in $nets; do
    ip -n $ns link add bridge0 address 02:00:00:00:00:$(printf '%02d' $bmac) type bridge
    ip -n $ns link set bridge0 up
    bmac=$(($bmac+1))
done

link_ns () {
    ip -n $1 link add name "$3" type veth peer netns $2 name "$4"
    ip -n $1 link set dev "$3" up
    ip -n $2 link set dev "$4" up

    if printf '%s\n' "$nets" | grep -q -w "$1"; then
        ip -n "$1" link set dev "$3" master bridge0
    fi
    if printf '%s\n' "$nets" | grep -q -w "$2"; then
        ip -n "$2" link set dev "$4" master bridge0
    fi
}

link_ns trunk  fakebridge fakebridge trunk0
link_ns vlan10 fakebridge fakebridge vlan10
link_ns vlan20 fakebridge fakebridge vlan20
link_ns vlan30 fakebridge fakebridge vlan30

link_ns trunk  router  router  trunk0
link_ns vlan10 host10  host10  eth0
link_ns vlan10 host10b host10b eth0
link_ns vlan20 host20  host20  eth0
link_ns vlan30 host30  host30  eth0


ip netns exec fakebridge tc qdisc add dev trunk0 ingress
for vlan in 10 20 30; do
    ip netns exec fakebridge tc qdisc add dev vlan$vlan ingress
    ip netns exec fakebridge tc filter add dev vlan$vlan parent ffff: matchall action vlan push id $vlan action mirred egress redirect dev trunk0
    ip netns exec fakebridge tc filter add dev trunk0 parent ffff: basic match "meta(vlan mask 0xfff eq $vlan)" action vlan pop action mirred egress redirect dev vlan$vlan
done

for vlan in 10 20 30; do
    ip -n router link add link trunk0 name trunk.$vlan type vlan id $vlan
    ip -n router link set dev trunk.$vlan up
    ip netns exec router sysctl -q -w net.ipv4.conf.trunk/$vlan.forwarding=1
done
ip -n router address add 192.0.2.110/24 dev trunk.10
ip -n router address add 198.51.100.120/24 dev trunk.20
ip -n router address add 203.0.113.130/24 dev trunk.30

ip -n host10 address add 192.0.2.10/24 dev eth0
ip -n host10b address add 192.0.2.11/24 dev eth0
ip -n host20 address add 198.51.100.20/24 dev eth0
ip -n host30 address add 203.0.113.30/24 dev eth0

ip -n host10 route add default via 192.0.2.110
ip -n host10b route add default via 192.0.2.110
ip -n host20 route add default via 198.51.100.120
ip -n host30 route add default via 203.0.113.130
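
Once done experimenting, the mockup can be removed by deleting the namespaces, which also destroys the veth and bridge devices living inside them. This is a minimal sketch reusing the namespace lists from the setup script; the teardown function and the DRY_RUN switch are hypothetical additions, not part of the original script.

```shell
#!/bin/sh
# Teardown sketch for the mockup: delete every namespace created by the
# setup script. DRY_RUN is a hypothetical switch added here: when it is
# set, the commands are only printed instead of being run.
hosts='router fakebridge host10 host10b host20 host30'
nets='trunk vlan10 vlan20 vlan30'

teardown() {
    for ns in $hosts $nets; do
        ${DRY_RUN:+echo} ip netns del "$ns" 2>/dev/null || :
    done
}
```

Run teardown as root to clean up, or DRY_RUN=1 teardown to list what would be deleted.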

thanks a lot. I will need to take a deeper look into tc. We do not use it since all QoS related stuff are done in hardware, but apparently, it can be used not only for QoS. — Ilya, May 08 '19 at 08:49
afaik tc can even be hardware accelerated in datacenter-oriented "switchdev" devices like those from mellanox which documents some of this and contributes to it (eg: matchall is from mellanox). https://netdevconf.org/1.2/slides/oct5/07_tcws_Mlxsw_TC_Offloads.pdf — A.B, May 08 '19 at 17:31
