I have a server (Ubuntu/Debian) with two ISP connections, and both WAN connections have multiple public IP addresses.
(big pipe)----eth0-->\
                      > server ---eth2--(internal)
(cable pipe)--eth1-->/
On eth0 I have 4 IPs assigned to me that are part of a broader /24 subnet (24.xxx.xxx.xxx/24). On eth1 I have 5 IPs, but there I am the only one on the /29 (the 6th IP is the gateway I hit): 71.xxx.xxx.xxx/29.
My goal is to set up source/policy-based routing so that VMs/clients on the various internal subnets (there are multiple actual VLANs on eth2) can be routed out to the internet via any specified WAN IP.
Here's what I've done so far.
First I have eth0 and eth1 configured in the interfaces file.
auto eth0
iface eth0 inet static
    address 24.xxx.xxx.66
    netmask 255.255.255.0
    network 24.xxx.xxx.0
    broadcast 24.xxx.xxx.255
    gateway 24.xxx.xxx.1
    dns-nameservers 8.8.8.8
    up /etc/network/rt_scripts/i_eth0
auto eth1
iface eth1 inet static
    address 71.xxx.xxx.107
    netmask 255.255.255.248
    network 71.xxx.xxx.104
    broadcast 71.xxx.xxx.111
    up /etc/network/rt_scripts/i_eth1
Then I set up macvlan devices on the BigPipe:
#!/bin/sh
#iface BigPipe67
ip link add mac0 link eth0 address xx:xx:xx:xx:xx:3c type macvlan
ip link set mac0 up
ip address add 24.xxx.xxx.67/24 dev mac0
#iface BigPipe135
ip link add mac1 link eth0 address xx:xx:xx:xx:xx:3d type macvlan
ip link set mac1 up
ip address add 24.xxx.xxx.135/24 dev mac1
#iface BigPipe136
ip link add mac2 link eth0 address xx:xx:xx:xx:xx:3e type macvlan
ip link set mac2 up
ip address add 24.xxx.xxx.136/24 dev mac2
/etc/network/rt_scripts/t_frontdesk
/etc/network/rt_scripts/t_pubwifi
/etc/network/rt_scripts/t_mail1
/etc/network/rt_scripts/t_scansrvc
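(As a sanity check after boot, separate from the scripts above, I can confirm the macvlan devices and their addresses came up with something like:)

```shell
# Brief listing of all macvlan devices: name, state, assigned addresses
ip -br addr show type macvlan

# Detailed view of one device, including its macvlan mode and parent link
ip -d link show mac0
```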
Next, the CBL connection. The missing 5th IP (71.xxx.xxx.106) belongs to a different router sitting in the building.
#!/bin/sh
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1
ip route add xxx.xxx.xxx.xxx/20 via 71.xxx.xxx.105 dev eth1
#iface CBL108
ip link add mac3 link eth1 address xx:xx:xx:xx:xx:c5 type macvlan
ip link set mac3 up
ip address add 71.xxx.xxx.108/29 dev mac3
#iface CBL109
ip link add mac4 link eth1 address xx:xx:xx:xx:xx:c6 type macvlan
ip link set mac4 up
ip address add 71.xxx.xxx.109/29 dev mac4
#iface CBL110
ip link add mac5 link eth1 address xx:xx:xx:xx:xx:c7 type macvlan
ip link set mac5 up
ip address add 71.xxx.xxx.110/29 dev mac5
/etc/network/rt_scripts/t_jenkins4
/etc/network/rt_scripts/t_skynet
/etc/network/rt_scripts/t_lappy386
You'll probably notice I have a couple of routes added to the main table when I set up the macvlan interfaces on eth1. I have a couple of other routers on the same cable provider as my main server. They VPN back to the main server, while the BigPipe is used for everything else (on the main table).
The "t_" scripts set up the individual rules and tables for the various services/clients that use the IPs created by the macvlan interfaces.
Simplified, they look a little like this:
#!/bin/sh
ip rule add from 172.23.1.6 table scansrvc
ip route add default via 24.xxx.xxx.1 dev mac0 table scansrvc
ip route add 24.xxx.xxx.0/24 dev mac0 table scansrvc
ip route add 172.23.0.0/20 dev br1 table scansrvc
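With that in place, the resulting policy can be inspected like so (this assumes the table names are registered in /etc/iproute2/rt_tables, which mine are):

```shell
# List all policy rules; a "from 172.23.1.6 lookup scansrvc" entry should appear
ip rule show

# Dump the per-service table to confirm its default route goes out mac0
ip route show table scansrvc
```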
So, putting that all together as a quick recap: the main server uses 8 public IPs (4 on BigPipe and 4 on CBL). One of the BigPipe IPs and one of the CBL IPs are used for VPN services, effectively creating a "ghetto internet exchange" if you will. That routing configuration lives on the main table.
Then the remaining IPs are used by various services or clients, and those tables are frontdesk, pubwifi, mail1, scansrvc, jenkins4, skynet, and lappy386.
I am masquerading on all public IPs to the various internal subnets.
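The masquerading is roughly this shape (a simplified sketch, not my exact ruleset; the subnet and devices here match the scansrvc example above):

```shell
# NAT the internal /20 out each public-facing macvlan device
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 172.23.0.0/20 -o mac3 -j MASQUERADE
```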
Here's where I'm just dumbfounded: it all works until it doesn't. When I start up the server, everything gets set up correctly, and I can see that the routing policies are doing what they're supposed to do.
For example, on scansrvc, which is a VM on the main server with an internal IP (172.23.1.6/20):
waffle@scansrvc:~$ dig +short myip.opendns.com @resolver1.opendns.com
24.xxx.xxx.67
However, after a while, packets stop making it back to the VM behind the main server. I can see in the iptables firewall stats that they leave my network but never make it back.
When it's working, I can see the service port when I scan from the outside; after it dies, iptables doesn't even see the packets come in.
Also, in my searching I started reading about martian packets, so I turned on logging of those through sysctl. Wow. I'm logging a ton of martians from the BigPipe but none from the CBL; perhaps because I'm not the only one on the BigPipe subnet?
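For reference, the sysctl knobs I flipped were along these lines:

```shell
# Log packets whose source address is impossible on the receiving interface
sysctl -w net.ipv4.conf.all.log_martians=1
sysctl -w net.ipv4.conf.default.log_martians=1
```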
Here's a snippet
Nov 22 08:59:03 srv3 kernel: [ 271.747016] net_ratelimit: 497 callbacks suppressed
Nov 22 08:59:03 srv3 kernel: [ 271.747027] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [ 271.747035] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
Nov 22 08:59:03 srv3 kernel: [ 271.747046] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac2
Nov 22 08:59:03 srv3 kernel: [ 271.747052] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
Nov 22 08:59:03 srv3 kernel: [ 271.747061] IPv4: martian source 24.xxx.xxx.43 from 24.xxx.xxx.1, on dev mac1
Nov 22 08:59:03 srv3 kernel: [ 271.747066] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
Nov 22 08:59:03 srv3 kernel: [ 271.796429] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac0
Nov 22 08:59:03 srv3 kernel: [ 271.796440] ll header: 00000000: ff ff ff ff ff ff cc 4e 24 9c 1d 00 08 06 .......N$.....
Nov 22 08:59:03 srv3 kernel: [ 271.796450] IPv4: martian source 24.xxx.xxx.211 from 24.xxx.xxx.1, on dev mac2
From what I understand so far about martians, my hypothesis is that having multiple interfaces on the same subnet could cause packets not meant for an interface to be delivered to that interface... somehow. (I thought their different MAC addresses would prevent that.)
What would cause this? Why does the setup work when I freshly boot the system and the VMs, then suddenly die after a while? (For example, if I leave a ping running to 8.8.8.8 on the scansrvc VM, I'll get 100-1000 responses back before it dies.) Could this be something with the ARP cache? It's not like I'm reassigning any IPs to different MAC addresses mid-flight.
I'm stuck. I'm going to start learning some tcpdump to try to shed light on whatever I'm missing. If anyone better versed in networking setups could point out anything I've missed, it'd be a huge help! :)
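In case it helps anyone spot the issue, here's the kind of inspection I'm planning to start with (an assumption on useful flags, not output I've captured yet):

```shell
# Dump the kernel neighbor (ARP) cache for the upstream side,
# to compare a working state against a dead one
ip neigh show dev eth0
ip neigh show dev mac0

# Watch ARP traffic on the BigPipe interface with link-level
# headers (-e) and no name resolution (-n)
tcpdump -eni eth0 arp
```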