
Background

I have a Debian server with 3 network interfaces:

  • eno1 (10.0.0.35/24)
  • eno1.10 (10.0.10.65/24)
  • eno1.40 (10.0.40.40/24)

A firewall sits between these networks. Because the server has a route on each of them, its traffic was routed asymmetrically, and the firewall blocked it as invalid.

To fix that, I added policy-based routing rules so that traffic sourced from each address leaves through the interface that owns that address. I did this by editing /etc/network/interfaces like this:

# The primary network interface
allow-hotplug eno1
iface eno1 inet dhcp
  post-up ip route add 10.0.0.0/24 dev eno1 table 1
  post-up ip route add default via 10.0.0.1 table 1
  post-up ip rule add from 10.0.0.35/32 table 1 priority 100
  post-up ip route flush cache
  pre-down ip rule del from 10.0.0.35/32 table 1 priority 100
  pre-down ip route flush table 1
  pre-down ip route flush cache

# VLANS
auto eno1.10
iface eno1.10 inet dhcp
  post-up ip route add 10.0.10.0/24 dev eno1.10 table 2
  post-up ip route add default via 10.0.10.1 table 2
  post-up ip rule add from 10.0.10.65/32 table 2 priority 110
  post-up ip route flush cache
  pre-down ip rule del from 10.0.10.65/32 table 2 priority 110
  pre-down ip route flush table 2
  pre-down ip route flush cache

auto eno1.40
iface eno1.40 inet dhcp
  post-up ip route add 10.0.40.0/24 dev eno1.40 table 3
  post-up ip route add default via 10.0.40.1 table 3
  post-up ip rule add from 10.0.40.40/32 table 3 priority 120
  post-up ip route flush cache
  pre-down ip rule del from 10.0.40.40/32 table 3 priority 120
  pre-down ip route flush table 3
  pre-down ip route flush cache
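The three stanzas above repeat one pattern per interface: a separate table holding the connected subnet and that interface's default gateway, plus a source rule that sends packets from the interface's address to that table. A minimal sketch of the same pattern as a loop, assuming the addresses stay as shown (they come from DHCP, so they may change); the `run` helper and `DRY_RUN` switch are hypothetical and only print the commands unless `DRY_RUN=0` is set:

```shell
#!/bin/sh
# Sketch: the per-interface policy routing from /etc/network/interfaces, as a loop.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 as root applies them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# interface, address, connected subnet, gateway, table, rule priority
# (values copied from the question; DHCP may hand out different ones)
UPLINKS='eno1 10.0.0.35 10.0.0.0/24 10.0.0.1 1 100
eno1.10 10.0.10.65 10.0.10.0/24 10.0.10.1 2 110
eno1.40 10.0.40.40 10.0.40.0/24 10.0.40.1 3 120'

setup_source_routing() {
    echo "$UPLINKS" | while read -r iface addr subnet gw table prio; do
        # Each table holds only its own subnet and its own default gateway...
        run ip route add "$subnet" dev "$iface" table "$table"
        run ip route add default via "$gw" table "$table"
        # ...and packets sourced from this interface's address look it up.
        run ip rule add from "$addr/32" table "$table" priority "$prio"
    done
}

setup_source_routing
```

After applying, `ip rule show` should list the three `from` rules, and `ip route get <external-address> from 10.0.10.65` should resolve via `eno1.10`.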

After that, all the services running on the server worked as they should.

Additionally, the server is a Docker host running some containers that are bound to the different interfaces on the server.

Problem

The problem is that the rules I created apparently don't apply to traffic coming from the Docker containers, so I can't reach the containers: their traffic is blocked as invalid.

What do I need to do so that the Docker containers' traffic uses the right route according to the source IP?

1 Answer


The quick solution:

  • Add routing rules that match on the firewall mark. Packets carrying a given mark are routed through the corresponding routing table.
ip rule add fwmark 0x1 lookup 1 pref 10001
ip rule add fwmark 0x2 lookup 2 pref 10002
ip rule add fwmark 0x3 lookup 3 pref 10003
  • Mark incoming connections according to their input interface. The CONNMARK target stores the mark value in the connection's conntrack entry.
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -i eno1 -j CONNMARK --set-mark 0x1
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -i eno1.10 -j CONNMARK --set-mark 0x2
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW -i eno1.40 -j CONNMARK --set-mark 0x3
  • Copy the mark value from the conntrack entry back to the packet's firewall mark. Reply packets are then routed by the fwmark rules added above. Restrict this rule with an -i match (or match by source address); otherwise you need to add the directly connected routes to the additional tables as well.
iptables -t mangle -A PREROUTING -i docker0 -j CONNMARK --restore-mark
  • Alternatively, you can match by source address instead of by input interface:
iptables -t mangle -A PREROUTING --src <container-subnet> -j CONNMARK --restore-mark
  • This approach also works with DNAT (e.g. Docker's published ports).
  • Use tcpdump and the conntrack tool to troubleshoot issues.
  • Also check rp_filter: strict reverse path filtering can drop these packets in some cases. It's better to set it to loose mode (sysctl -w net.ipv4.conf.all.rp_filter=2).
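The quick solution's steps can be collected into one script. This is a sketch, assuming tables 1 to 3 are already populated by the question's post-up lines and that the containers sit behind the default docker0 bridge (override with the hypothetical `CONTAINER_IF` variable). The `run` helper and `DRY_RUN` switch are also hypothetical: by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the quick solution: mark new connections by input interface,
# restore the mark on packets coming back from containers, route by mark.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 as root applies them.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: containers are attached to the default docker0 bridge.
CONTAINER_IF="${CONTAINER_IF:-docker0}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_quick_solution() {
    # interface:N pairs, where mark 0xN is routed via table N
    for pair in eno1:1 eno1.10:2 eno1.40:3; do
        iface="${pair%%:*}"
        n="${pair##*:}"
        run ip rule add fwmark "0x$n" lookup "$n" pref "1000$n"
        # Store the mark in the conntrack entry of each new incoming connection.
        run iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW \
            -i "$iface" -j CONNMARK --set-mark "0x$n"
    done
    # Copy the stored mark onto packets arriving from the containers.
    run iptables -t mangle -A PREROUTING -i "$CONTAINER_IF" -j CONNMARK --restore-mark
    # Loose reverse path filtering, so marked replies are not dropped.
    run sysctl -w net.ipv4.conf.all.rp_filter=2
}

apply_quick_solution
```

Running it twice with `DRY_RUN=0` appends duplicate iptables rules and fails on the existing `ip rule` entries, so it is meant to run once at boot (for example from a post-up hook).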

Update

After some tests in the lab I've found a better rule set. It requires only one mark value and one additional routing rule per uplink, and it also handles complex cases, such as using public addresses on several interfaces.

  • For every uplink, create an additional routing table and assign it a firewall mark:
ip route add <uplink-subnet> dev <uplink-iface> table <uplink-table>
ip route add 0/0 via <uplink-gw> dev <uplink-iface> table <uplink-table>

ip rule add fwmark <uplink-mark> table <uplink-table>
  • For every uplink interface, add a single rule to mark incoming connections:
iptables -t mangle -A PREROUTING -i <uplink-iface> -m conntrack --ctstate NEW --ctdir ORIGINAL -j CONNMARK --set-mark <uplink-mark>
...
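  • Add two rules, shared by all uplinks, to restore the mark on reply packets. The PREROUTING rule covers forwarded replies (e.g. from containers); the OUTPUT rule covers replies generated by local processes: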
iptables -t mangle -A PREROUTING -m conntrack ! --ctstate NEW --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark

iptables -t mangle -A OUTPUT -m conntrack ! --ctstate NEW --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark
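Filled in for the three uplinks from the question, the updated rule set looks like the sketch below. The uplink values are copied from the question; the `run` helper and `DRY_RUN` switch are hypothetical and only print the commands by default. It assumes tables 1 to 3 are not already populated by the question's post-up lines, otherwise the `ip route add` calls fail because the routes already exist.

```shell
#!/bin/sh
# Sketch of the updated rule set: one mark, one table and one fwmark rule per
# uplink, plus two shared rules that restore the mark on reply packets.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 as root applies them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# uplink interface, subnet, gateway, table, mark (values from the question)
UPLINKS='eno1 10.0.0.0/24 10.0.0.1 1 0x1
eno1.10 10.0.10.0/24 10.0.10.1 2 0x2
eno1.40 10.0.40.0/24 10.0.40.1 3 0x3'

apply_update_ruleset() {
    echo "$UPLINKS" | while read -r iface subnet gw table mark; do
        run ip route add "$subnet" dev "$iface" table "$table"
        run ip route add 0/0 via "$gw" dev "$iface" table "$table"
        run ip rule add fwmark "$mark" table "$table"
        # Only the original direction of a new connection sets the mark.
        run iptables -t mangle -A PREROUTING -i "$iface" -m conntrack \
            --ctstate NEW --ctdir ORIGINAL -j CONNMARK --set-mark "$mark"
    done
    # Replies (forwarded in PREROUTING, local in OUTPUT) get the stored mark back.
    for chain in PREROUTING OUTPUT; do
        run iptables -t mangle -A "$chain" -m conntrack ! --ctstate NEW \
            --ctdir REPLY -m connmark ! --mark 0x0 -j CONNMARK --restore-mark
    done
}

apply_update_ruleset
```

Because only `--ctdir REPLY` packets get their mark restored, packets arriving from an uplink are still routed through the main table to the containers, so no container routes are needed in the custom tables.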
Anton Danilov
  • Did you mean `--set-mark` instead of `--save-mark`? I tried everything and it works. Regarding the last point, is there a nicer way to do this? I don't use the docker0 bridge but user-defined bridges (created with Compose), which have odd names that change every time I recreate the network. – Claypenguin Jul 02 '19 at 10:52
  • Yep, sure. The `--set-mark` is what you need, not `--save-mark`. I'll fix the answer. – Anton Danilov Jul 02 '19 at 11:00
  • Use a match by source address. Alternatively, you can add the directly connected routes to the custom routing tables (`ip route add dev table 1` etc.). In that case you can use a simple rule (`iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark`) without additional matches. – Anton Danilov Jul 02 '19 at 11:05
  • Well, the source address also changes when I recreate a Docker network. I guess I could just create the networks manually and never touch them again. By adding routes to the custom routing tables, do you mean adding the container subnet? Wouldn't that mean I don't need the marks anymore? – Claypenguin Jul 02 '19 at 11:32
  • The marks are still required to route replies from containers through the same interface on which the original packets were received. Adding the container subnets to the custom routing tables avoids modifying the iptables rules every time you create or remove a container. – Anton Danilov Jul 02 '19 at 12:28
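The approach from the last comments (copy the container subnets into the custom tables, then use the unrestricted restore rule) can be sketched as follows. It assumes Docker's default bridge naming (docker0, and `br-` plus a prefix of the network ID for user-defined networks), the custom tables 1, 2 and 3 from the question, and that the restore rule is appended after the answer's per-interface `--set-mark` rules, since the marks are still needed. `ip route replace` is used instead of `add` so the script can simply be rerun after a network is recreated. The `run` helper, `DRY_RUN` and `TABLES` are hypothetical names; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: copy the Docker bridges' directly connected routes into the custom
# tables, so a single unrestricted restore-mark rule is enough.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 as root applies them.
DRY_RUN="${DRY_RUN:-1}"
TABLES="${TABLES:-1 2 3}"   # custom tables from the question (assumption)

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reads `ip -4 route show proto kernel` lines on stdin, e.g.
#   172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
# and copies every route on docker0 or a br-* bridge into each custom table.
copy_bridge_routes() {
    while read -r prefix _dev ifname _rest; do
        case "$ifname" in
            docker0|br-*)
                for t in $TABLES; do
                    # "replace" does not fail if the route already exists.
                    run ip route replace "$prefix" dev "$ifname" table "$t"
                done
                ;;
        esac
    done
}

if command -v ip >/dev/null 2>&1; then
    ip -4 route show proto kernel | copy_bridge_routes
fi
# With the bridge routes in every table, no -i or --src match is needed.
run iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
```

The route copy has to be rerun whenever a Docker network is recreated with a new subnet or bridge name; the iptables rules stay unchanged.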