
I have a server (Ubuntu/Debian) with two ISP connections. Both of these WAN connections have multiple public IP addresses.

(big pipe)----eth0-->\
                      > server ---eth2--(internal)
(cable pipe)--eth1-->/

On eth0 I have 4 IPs assigned to me that are part of a broader /24 subnet (24.xxx.xxx.xxx/24). On eth1 I have 5 IPs assigned to me, but there I am the only one on a /29 (the 6th IP is the gateway I hit): 71.xxx.xxx.xxx/29.

My goal is to set up source/policy-based routing so that VMs/clients on the various internal subnets (there are multiple actual VLANs on eth2) can be routed out to the internet on any specified WAN IP.

Here's what I've done so far.

First, I have eth0 and eth1 configured in /etc/network/interfaces.

auto eth0
iface eth0 inet static
        address 24.xxx.xxx.66
        netmask 255.255.255.0
        network 24.xxx.xxx.0
        broadcast 24.xxx.xxx.255
        gateway 24.xxx.xxx.1
        dns-nameservers 8.8.8.8
        up /etc/network/rt_scripts/i_eth0

auto eth1
iface eth1 inet static
        address 71.xxx.xxx.107
        netmask 255.255.255.248
        network 71.xxx.xxx.105
        broadcast 71.xxx.xxx.111
        up /etc/network/rt_scripts/i_eth1

Then I bring up macvlan devices on the BigPipe (this is /etc/network/rt_scripts/i_eth0):

#!/bin/sh

#iface BigPipe67
ip link add mac0 link eth0 address xx:xx:xx:xx:xx:3c type macvlan
ip link set mac0 up
ip address add 24.xxx.xxx.67/24 dev mac0

#iface BigPipe135
ip link add mac1 link eth0 address xx:xx:xx:xx:xx:3d type macvlan
ip link set mac1 up
ip address add 24.xxx.xxx.135/24 dev mac1

#iface BigPipe136
ip link add mac2 link eth0 address xx:xx:xx:xx:xx:3e type macvlan
ip link set mac2 up
ip address add 24.xxx.xxx.136/24 dev mac2

# Set up the per-service rules and tables (the "t_" scripts, described below)
/etc/network/rt_scripts/t_frontdesk
/etc/network/rt_scripts/t_pubwifi
/etc/network/rt_scripts/t_mail1
/etc/network/rt_scripts/t_scansrvc

Then the CBL connection (/etc/network/rt_scripts/i_eth1). The missing 5th IP (71.xxx.xxx.106) belongs to a different router sitting in the building.

#!/bin/sh
# Routes to the other routers on the same cable provider (they VPN back to this server)
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1

#iface CBL108
ip link add mac3 link eth1 address xx:xx:xx:xx:xx:c5 type macvlan
ip link set mac3 up
ip address add 71.xxx.xxx.108/29 dev mac3

#iface CBL109
ip link add mac4 link eth1 address xx:xx:xx:xx:xx:c6 type macvlan
ip link set mac4 up
ip address add 71.xxx.xxx.109/29 dev mac4

#iface CBL110
ip link add mac5 link eth1 address xx:xx:xx:xx:xx:c7 type macvlan
ip link set mac5 up
ip address add 71.xxx.xxx.110/29 dev mac5

/etc/network/rt_scripts/t_jenkins4
/etc/network/rt_scripts/t_skynet
/etc/network/rt_scripts/t_lappy386

You'll probably notice I have a couple of routes specified in the main table when I set up the macvlan interfaces on eth1. I have a couple of other routers on the same cable provider as my main server. They VPN back to the main server, while the BigPipe is used for everything else (on the main table).

The "t_" scripts are used to setup the individual rules and tables for the various services/clients that used the IPs setup by the macvlan interfaces.

Simplified, they look a little like this.

#!/bin/sh
# Everything sourced from the scansrvc VM uses table "scansrvc"
ip rule add from 172.23.1.6 table scansrvc
# Default route out the BigPipe .67 address (mac0)...
ip route add default via 24.xxx.xxx.1 dev mac0 table scansrvc
# ...plus the directly connected routes the table still needs
ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc
ip route add 172.23.0.0/20 dev br1 table scansrvc
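
For completeness, those named tables are declared in /etc/iproute2/rt_tables. Something like this (the numeric IDs here are arbitrary placeholders, not necessarily what I used):

# /etc/iproute2/rt_tables additions: map table names to IDs
100     frontdesk
101     pubwifi
102     mail1
103     scansrvc
104     jenkins4
105     skynet
106     lappy386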

So, putting that all together as a quick recap: the main server is using 8 public IPs (4 on BigPipe and 4 on CBL). One of the BigPipe IPs and one of the CBL IPs are used for VPN services, effectively creating a "ghetto internet exchange" if you will. That routing configuration lives in the main table.

The remaining 6 IPs are used by various services or clients, and those tables are frontdesk, pubwifi, mail1, scansrvc, jenkins4, skynet, and lappy386.

I am masquerading the various internal subnets out all of the public IPs.
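
Simplified, the NAT side looks something like this (a sketch, not my exact ruleset; 172.23.0.0/20 is the internal range from above):

#!/bin/sh
# Masquerade internal clients out whichever WAN/macvlan interface
# the routing tables pick for them
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o eth1 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac0 -j MASQUERADE
# ...and likewise for mac1 through mac5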

Here's where I'm just dumbfounded... it all works until it doesn't. That is, when I start up the server, everything gets set up correctly and I can see that the routing policies are doing what they're supposed to do.

So, on scansrvc, which is a VM on the main server with an internal IP (172.23.1.6/20):

waffle@scansrvc:~$ dig +short myip.opendns.com @resolver1.opendns.com
24.xxx.xxx.67

However, after a while packets stop making it back to the VM behind the main server. I could see in the iptables firewall stats that they'd leave my network but not make it back.

When it's working and I scan from the outside, I can see the service port; but after it dies, iptables doesn't even see the packets come in.

Also, through my searching I started reading about martian packets, so I turned on logging for those through sysctl. Wow. I'm logging a ton of martians from the BigPipe but none from the CBL, perhaps because on BigPipe I'm not the only one on that subnet?
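
For reference, turning that logging on is just:

# Log packets that fail source validation ("martians")
sysctl -w net.ipv4.conf.all.log_martians=1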

Here's a snippet

Nov 22 08:59:03 srv3 kernel: [  271.747016] net_ratelimit: 497 callbacks suppressed
Nov 22 08:59:03 srv3 kernel: [  271.747027] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [  271.747035] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.747046] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac2
Nov 22 08:59:03 srv3 kernel: [  271.747052] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.747061] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac1
Nov 22 08:59:03 srv3 kernel: [  271.747066] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.796429] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [  271.796440] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06        .......N$.....
Nov 22 08:59:03 srv3 kernel: [  271.796450] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac2

From what I understand so far about martians, my hypothesis is that having multiple interfaces on the same subnet can cause packets not meant for an interface to be delivered to that interface... somehow... (I thought that since they've got different MAC addresses, that would be alleviated.)

What would cause this? Why does the setup work when I freshly boot the system and the VMs, then suddenly die after a while? (E.g., if I leave a ping running to 8.8.8.8 on the scansrvc VM, I'll get 100-1000 responses back before it dies.) Could this be something with the ARP cache? It's not like I'm reassigning any IPs to different MAC addresses mid-flight.
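
If it is ARP-related, it should show up in the neighbor cache; these are the commands I know to check:

# Dump the neighbor/ARP cache for the WAN-facing interfaces
ip neigh show dev eth0
ip neigh show dev mac0
# Watch neighbor-cache changes live while the ping is running
ip monitor neigh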

I'm stuck. I'm going to start learning some tcpdump skills to try to shed light on whatever I'm missing. If anyone better versed in networking setups can point out anything I've missed, it'd be a huge help! :)
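
As a starting point, I'm thinking something like this to see which MAC address the return traffic is actually addressed to (-e prints the link-level headers):

tcpdump -eni eth0 host 24.xxx.xxx.67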

  • Minor fix I found when working on my interfaces file. On eth1 the "network" field should've been 71.xxx.xxx.104 not 71.xxx.xxx.105 – wafflemann Nov 24 '17 at 21:32

2 Answers


The error messages are caused by validation of the packet's source address (see the kernel code).

I assume there is only one route for the directly connected, overlapping subnet in the main routing table in your setup. When a packet from that directly connected subnet arrives through a different interface (not the one in the main routing table), the packet is recognized as a martian.

How to troubleshoot:

  • Look up the route for the packet's source with the 'ip route get 24.xxx.xxx.1' command and compare the route's interface with the interface the packet actually arrived on. They are likely different (see the sketch below).
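
For example, the lookup might return (an illustrative sketch, not real output from this setup):

24.xxx.xxx.1 dev eth0 src 24.xxx.xxx.66
    cache

If the martian log shows the packet arriving on mac0 while the lookup points at eth0, strict source validation drops it.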

How to solve the issue:

  • If you're using PBR with multiple routing tables, add the directly connected route through the corresponding interface into every one of these routing tables. You may also need to rework your PBR rules to avoid route mismatches.
  • Check rp_filter and disable it, or better, switch it to loose mode (see the sysctl variable and the sketch after this list).
  • Discard the macvlan interfaces and use multiple addresses on a single interface (this is the hard way, but more ideologically correct, I think).
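
Loose mode is value 2 (0 = off, 1 = strict, 2 = loose). For example:

# Reverse-path filtering in loose mode; note the kernel uses the max of
# the 'all' and per-interface values, so per-interface settings matter too
sysctl -w net.ipv4.conf.all.rp_filter=2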
Anton Danilov

Thanks Anton for the insight! I really appreciate the links.

Posting for the record:

I ended up setting rp_filter to loose mode on all interfaces as suggested (net.ipv4.conf.all.rp_filter), and instantly the clients using their own routing tables behaved as expected. However, the routes using the main table would no longer communicate outside of the 24.xxx.xxx.0/24 on eth0, even with a default route set. Using .default instead of .all, along with enabling arp_filter on .default (which I could've sworn I had enabled previously), yielded the desired results, including eliminating the martians.
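
Roughly, this is what it amounts to in sysctl terms (reconstructed from memory; since these are .default values, they apply to interfaces created after they're set, like the macvlans from my boot scripts):

# Loose-mode reverse-path filtering for newly created interfaces
net.ipv4.conf.default.rp_filter = 2
# Answer ARP on an interface only if the kernel would route packets
# to that source out the same interface
net.ipv4.conf.default.arp_filter = 1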

The .default vs .all behavior was peculiar to me; I'll have to look into it for a clearer understanding. I'm just so glad it's working!

I originally went with macvlan because of the way I've perceived the kernel to handle virtual interfaces. The way I was familiar with setting up multiple IPs on a single interface was the alias style, declaring eth0:0 in the interfaces file. Bringing up a completely separate interface made it easier to work with iproute2. But perhaps there's a cleaner way, which I'd be interested in knowing.
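
If the cleaner way is plain secondary addresses, I gather it would look something like this with iproute2 (a sketch; untested in my setup):

# Stack additional public IPs directly on eth0 as secondary addresses
ip address add 24.xxx.xxx.67/24 dev eth0
ip address add 24.xxx.xxx.135/24 dev eth0
# Old-style alias labels still work if tooling expects eth0:N names
ip address add 24.xxx.xxx.136/24 dev eth0 label eth0:2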

Once again, thanks so much for your help!

  • There is a non-obvious detail about sysctl variables in the network subsystem: the default value is used only for dynamically created interfaces (like ppp), and it is copied into the interface's sysctl variable. – Anton Danilov Nov 29 '17 at 07:04