First of all, here is what my infrastructure looks like and how it works:

[Image: infrastructure diagram]

Controller1/2 and Compute1/2 all run VMs and are linked to each other via a VPN. On each server, the ext interface (the VPN one) is plugged into the br-ext bridge. All servers can communicate with each other, and so can the VMs on their private interfaces.

I have two Ubuntu 16.04 routers (the two boxes with ETH3 and BR-ext). Only one is active at a time (the second is a failover managed by keepalived), and the active one holds both the public subnet (51.38.X.Y/27) and the IP 10.38.166.190 (which acts as the gateway for all VMs).

I use iptables and iproute2 so that traffic to, say, 51.38.X.YYA reaches 10.38.X.YYA, and traffic from 10.38.X.YYA goes out through 51.38.X.YYA.

From any of the VMs, I can reach the outside without issue, and if I run curl ifconfig.co I get the public IP back, which is the behavior I want.

My issue:

If I try to reach VM2 from VM1 using its public IP, it doesn't work at all.

I will use two VMs to illustrate my issue and give all the configuration for them:

VM1: 10.38.166.167 / 51.38.166.167
VM2: 10.38.166.166 / 51.38.166.166

What I've done so far:

On router1:

ETH1 = main interface (management)
ETH3 = interface that holds all the public IPs and NATs to the VMs
br-ext = bridge that contains the VPN interface
ext = VPN interface (plugged into the bridge br-ext)

[root@network3] ~# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:19:3e:41 brd ff:ff:ff:ff:ff:ff
    inet 51.38.166.162/32 brd 51.38.x.162 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe19:3e41/64 scope link
       valid_lft forever preferred_lft forever

5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:72:94:cb brd ff:ff:ff:ff:ff:ff
    inet 51.38.166.163/32 brd 51.38.x.163 scope global eth3
       valid_lft forever preferred_lft forever
    inet 51.38.166.166/32 scope global eth3
       valid_lft forever preferred_lft forever
    inet 51.38.166.167/32 scope global eth3
       valid_lft forever preferred_lft forever


7: br-ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d2:f8:64:36:64:f2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.103/9 brd 10.127.255.255 scope global br-ext
       valid_lft forever preferred_lft forever
    inet 10.0.0.120/32 scope global br-ext
       valid_lft forever preferred_lft forever
    inet 10.38.166.190/32 scope global br-ext
       valid_lft forever preferred_lft forever

10: ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br-ext state UNKNOWN group default qlen 1000
    link/ether d2:f8:64:36:64:f2 brd ff:ff:ff:ff:ff:ff

I've set up a number of rules and routes so that packets coming from outside to 51.38.x.160/27 are routed to 10.38.x.y/27:

[root@network3] ~# ip ru l | grep "lookup 103"
9997:   from 10.38.x.167 lookup 103
9998:   from 10.38.x.166 lookup 103

# rules telling each IP of the /27 to use table 103
10301:  from 51.38.166.163 lookup 103
10302:  from all to 51.38.166.163 lookup 103
10307:  from 51.38.166.166 lookup 103
10308:  from all to 51.38.166.166 lookup 103
10309:  from 51.38.166.167 lookup 103
10310:  from all to 51.38.166.167 lookup 103

[root@network3] ~# ip r s table 103
default via 51.38.166.190 dev eth3
51.38.166.160/27 dev eth3  scope link

[root@network3] ~# ip r s
default via 51.38.166.190 dev eth1 onlink
10.0.0.0/9 dev br-ext  proto kernel  scope link  src 10.0.0.103
172.16.0.0/16 dev br-manag  proto kernel  scope link  src 172.16.0.103
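
For reference, rules and routes like the ones listed above could be recreated with commands along these lines. This is only a sketch: rule_cmds is a hypothetical helper name, the table number and priorities are copied from the listing, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) the policy-routing rules for one public address.
# rule_cmds is a hypothetical helper, not an iproute2 command.
rule_cmds() {
    addr=$1             # public address, e.g. 51.38.166.166
    prio=$2             # priority of the "from" rule
    table=${3:-103}     # routing table to look up (103 in the listing above)
    echo "ip rule add from $addr lookup $table priority $prio"
    echo "ip rule add to $addr lookup $table priority $((prio + 1))"
}

rule_cmds 51.38.166.166 10307
rule_cmds 51.38.166.167 10309

# Routes inside table 103, mirroring "ip r s table 103".
echo "ip route add 51.38.166.160/27 dev eth3 scope link table 103"
echo "ip route add default via 51.38.166.190 dev eth3 table 103"
```

Piping the output to sh as root would apply the commands; printing them first makes it easier to spot a wrong priority or table before touching the live router.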

My iptables rules look as follows:

[root@network3] ~# iptables -nvL
Chain INPUT (policy ACCEPT 21334 packets, 1015K bytes)
 pkts bytes target     prot opt in     out     source               destination
91877 4376K ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            /* 000 accept all icmp */
   18  1564 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0            /* 001 accept all to lo interface */
    0     0 REJECT     all  --  !lo    *       0.0.0.0/0            127.0.0.0/8          /* 002 reject local traffic not on loopback interface */ reject-with icmp-port-unreachable
 343K  123M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state ESTABLISHED /* 003 accept related established rules */
  243 14472 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 1022 /* 030 allow SSH */
 481M   42G ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 3210:3213 /* 031 allow VPNtunnel */
 4155  241K DROP       all  --  eth0   *       0.0.0.0/0            0.0.0.0/0            /* 999 drop all */

Chain FORWARD (policy ACCEPT 98325 packets, 8874K bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 964M packets, 93G bytes)
 pkts bytes target     prot opt in     out     source               destination

Iptables NAT rules

[root@network3] ~# iptables -t nat -nvL --line
Chain PREROUTING (policy ACCEPT 156K packets, 6455K bytes)
num   pkts bytes target     prot opt in     out     source               destination
31   11228  771K DNAT       all  --  *      *       0.0.0.0/0            51.38.166.166        /* 112 NAT for 10.38.166.166 */ to:10.38.166.166
32   11624  809K DNAT       all  --  *      *       0.0.0.0/0            51.38.166.167        /* 112 NAT for 10.38.166.167 */ to:10.38.166.167

Chain INPUT (policy ACCEPT 85077 packets, 3527K bytes)
num   pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 16505 packets, 1294K bytes)
num   pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 105K packets, 4357K bytes)
num   pkts bytes target     prot opt in     out     source               destination
31      17  1196 SNAT       all  --  *      *       10.38.166.166        0.0.0.0/0             to:51.38.166.166
32       8   549 SNAT       all  --  *      *       10.38.166.167        0.0.0.0/0             to:51.38.166.167
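
The 1:1 NAT shown above (one DNAT rule in PREROUTING and one SNAT rule in POSTROUTING per VM) could be expressed roughly as follows. This is a sketch under assumptions: nat_pair_cmds is a hypothetical helper name, the rule comments from the listing are omitted, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Print (do not run) the iptables commands for one public/private pair.
# nat_pair_cmds is a hypothetical helper, not part of iptables.
nat_pair_cmds() {
    pub=$1     # public address, e.g. 51.38.166.166
    priv=$2    # private address, e.g. 10.38.166.166
    # Inbound: rewrite the destination from the public to the private address.
    echo "iptables -t nat -A PREROUTING -d $pub -j DNAT --to-destination $priv"
    # Outbound: rewrite the source from the private to the public address.
    echo "iptables -t nat -A POSTROUTING -s $priv -j SNAT --to-source $pub"
}

nat_pair_cmds 51.38.166.166 10.38.166.166
nat_pair_cmds 51.38.166.167 10.38.166.167
```

Note that neither rule is restricted to an interface, so a packet from one VM to the other VM's public address is DNATed on its way in through br-ext as well.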

I also inserted some rules in the raw table to help me trace packets:

[root@network3] ~# iptables -t raw -nvL
Chain PREROUTING (policy ACCEPT 3765 packets, 227K bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 TRACE      all  --  *      *       51.38.166.167        0.0.0.0/0
  185 12988 TRACE      all  --  *      *       0.0.0.0/0            51.38.166.167

Chain OUTPUT (policy ACCEPT 7941 packets, 837K bytes)
 pkts bytes target     prot opt in     out     source               destination

Testing from VM1:

ubuntu@test-1:~$ ip a l dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:51:0a:0b brd ff:ff:ff:ff:ff:ff
    inet 10.38.166.167/24 brd 10.38.166.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe51:a0b/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@test-1:~$ curl ifconfig.co
51.38.166.167

ubuntu@test-1:~$ ping 51.38.166.166 -c 4
PING 51.38.166.166 (51.38.166.166) 56(84) bytes of data.

--- 51.38.166.166 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3031ms

Testing from VM2:

ubuntu@test-2:~$ ip a l dev ens3
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:3e:9d:79:ce brd ff:ff:ff:ff:ff:ff
    inet 10.38.166.166/24 brd 10.38.166.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe9d:79ce/64 scope link
       valid_lft forever preferred_lft forever

ubuntu@test-2:~$ curl ifconfig.co
51.38.166.166

ubuntu@test-2:~$ ping 51.38.166.167 -c 4
PING 51.38.166.167 (51.38.166.167) 56(84) bytes of data.

--- 51.38.166.167 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3023ms

Logs from network3:

[root@network3] ~# tail -f /var/log/kern.log | grep "SRC=10.38.166.166 DST=51.38.166.167"
Jul  5 11:58:12 network3 kernel: [79540.314496] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49094 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=57
Jul  5 11:58:13 network3 kernel: [79541.322501] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul  5 11:58:13 network3 kernel: [79541.322543] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul  5 11:58:13 network3 kernel: [79541.322574] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49203 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=58
Jul  5 11:58:14 network3 kernel: [79542.330582] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330615] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330639] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
^C

Since the IP ID does not change for a given SEQ, I can search the log for everything related to this ID/SEQ:

[root@network3] ~# grep "ID=49367" /var/log/kern.log
Jul  5 11:58:14 network3 kernel: [79542.330582] TRACE: raw:PREROUTING:policy:3 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330615] TRACE: mangle:PREROUTING:policy:1 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
Jul  5 11:58:14 network3 kernel: [79542.330639] TRACE: nat:PREROUTING:rule:32 IN=br-ext OUT= MAC=de:01:31:2d:47:18:fa:16:3e:9d:79:ce:08:00 SRC=10.38.166.166 DST=51.38.166.167 LEN=84 TOS=0x00 PREC=0x00 TTL=64 ID=49367 DF PROTO=ICMP TYPE=8 CODE=0 ID=4992 SEQ=59
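
The same lookup can be condensed into the list of table:chain hops a single packet went through, which makes the last hook it reached easier to spot. A minimal sketch, assuming the kern.log TRACE format shown above; trace_path is a hypothetical helper name. It matches only the first ID= field of each line (the IP ID), so the ICMP ID that appears later in the line is ignored.

```shell
#!/bin/sh
# Print the table:chain:type:num hop of every TRACE line for one IP ID.
# Usage (hypothetical helper): trace_path 49367 < /var/log/kern.log
trace_path() {
    awk -v id="ID=$1" '
        /TRACE: / {
            found = 0
            for (i = 1; i <= NF; i++) {
                # The field right after "TRACE:" names the hop.
                if ($i == "TRACE:") hop = $(i + 1)
                # Stop at the first ID= field: that one is the IP ID.
                if ($i == id) { found = 1; break }
                else if ($i ~ /^ID=/) break
            }
            if (found) print hop
        }'
}
```

For the packet above, the last hop printed is nat:PREROUTING:rule:32; nothing from the FORWARD or POSTROUTING chains shows up.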

If I refer to this diagram: http://inai.de/images/nf-packet-flow.png

It seems to be stuck at the routing decision. (I've ruled out the possibility of it being stuck at the bridging decision, because the behavior is exactly the same if I do the same thing without any bridge involved.)

The other possibility would be that it matches NAT PREROUTING rule 32 but the rule isn't applied, though I can't figure out why.

Any clue as to what I'm missing here?

mitsugoya

1 Answer

The most frequent cause of packets being dropped at the routing decision is rp_filter (reverse path filtering).

Check the output of the command ip route get 51.38.166.167 from 10.38.166.166 iif br-ext. Normally it should return a valid route. An "invalid cross-device link" result means that the packets will be dropped by rp_filter. Also check the output of nstat -az TcpExtIPReversePathFilter, which is a counter of such dropped packets.

To check the current rp_filter mode, use the command ip netconf show dev br-ext.
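
When reading the per-interface setting, keep in mind that the kernel uses the higher of the "all" value and the interface value when validating the source on that interface. A minimal sketch of that calculation, reading the values straight from /proc; effective_rp_filter is a hypothetical helper name, and the optional second argument (an alternative base directory) exists only to make the function easy to try out.

```shell
#!/bin/sh
# Print the rp_filter mode actually applied to an interface:
# the maximum of conf/all/rp_filter and conf/<iface>/rp_filter.
# Usage (hypothetical helper): effective_rp_filter br-ext
effective_rp_filter() {
    iface=$1
    base=${2:-/proc/sys/net/ipv4/conf}
    # Bail out if the interface (or /proc) is not available.
    [ -r "$base/all/rp_filter" ] && [ -r "$base/$iface/rp_filter" ] || return 1
    all=$(cat "$base/all/rp_filter")
    dev=$(cat "$base/$iface/rp_filter")
    if [ "$all" -gt "$dev" ]; then echo "$all"; else echo "$dev"; fi
}
```

A result of 1 means strict mode is in effect on that interface even if its own value is 0, which is easy to overlook when only the per-interface value is checked.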

Use the sysctl command to tune this parameter (net.ipv4.conf.*.rp_filter).
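
To make such a change persistent, the settings can be written to a sysctl configuration file. The sketch below only prints the lines; rp_filter_conf is a hypothetical helper name, and the mode (2, loose) and the interface names are assumptions taken from this question, not a general recommendation.

```shell
#!/bin/sh
# Print sysctl.d lines that set rp_filter to the given mode on
# "all", "default" and each interface named on the command line.
# rp_filter_conf is a hypothetical helper, not part of procps.
rp_filter_conf() {
    mode=$1     # 0 = off, 1 = strict, 2 = loose
    shift
    for iface in all default "$@"; do
        echo "net.ipv4.conf.$iface.rp_filter = $mode"
    done
}

rp_filter_conf 2 br-ext eth3
# To apply (as root), the output could be saved under /etc/sysctl.d/
# and loaded with: sysctl --system
```

A single value can also be changed at runtime with sysctl -w, for example sysctl -w net.ipv4.conf.br-ext.rp_filter=2, which does not survive a reboot.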

Anton Danilov
  • Many thanks for your clues and tips; they gave me enough information to solve my issue. I had to remove all the rules for the private IPs and set the net.ipv4.conf.*.rp_filter parameters to 2 to get it working (0 disables the check completely, 1 enables strict checking, and 2 still verifies the source but against all interfaces; source: https://www.slashroot.in/linux-kernel-rpfilter-settings-reverse-path-filtering). – mitsugoya Jul 05 '19 at 14:22