Network Description

Virtual hosting environment (KVM):

Guest:

Ubuntu 14.04.5 LTS \n \l
Linux ari 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 15:31:16 UTC 2013 i686 i686 i686 GNU/Linux

Host:

Ubuntu 14.04.3 LTS \n \l
Linux host 3.13.0-74-generic #118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Network:

          eth0 |----------| virbr63                    eth0 |----------|
---------------|   HOST   |---------------------------------|  ari     |
  11.22.33.44  |----------| 192.168.63.1       192.168.63.2 |----------|
  • 11.22.33.44 is the public IP address
  • ari is a virtual machine (guest)
  • HOST is a physical machine (virtual machine host)
  • eth0 is a physical network card in HOST
  • virbr63 is a virtual network adapter

There is an iptables rule on HOST:

-I PREROUTING -p tcp -d 11.22.33.44 --dport 80 -j DNAT --to 192.168.63.2:8888

Let's say mydomain.com resolves to 11.22.33.44. ari is serving all the HTTP requests incoming on 11.22.33.44. Curl-ing mydomain.com works from anywhere on the Internet.

Problem

When I try to access mydomain.com through HTTP from ari, it does not work (curl hangs).

This is what the unsuccessful curl attempt looks like in tcpdump (on the host):

host$ sudo tcpdump -i virbr63 port 8888
22:03:15.541155 IP 192.168.63.2.42740 > 192.168.63.2.8888: Flags [S], seq 786111635, win 14600, options [mss 1460,sackOK,TS val 1662005624 ecr 0,nop,wscale 5], length 0
22:03:15.541173 IP 192.168.63.2.42740 > 192.168.63.2.8888: Flags [S], seq 786111635, win 14600, options [mss 1460,sackOK,TS val 1662005624 ecr 0,nop,wscale 5], length 0

This keeps repeating every few seconds.
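In the failing capture the source and destination addresses are identical (192.168.63.2 on both sides), whereas in a working capture they differ. As a rough diagnostic sketch, the filter below flags tcpdump lines where the two addresses match. The function name `same_src_dst` and the `hairpin?` label are made up for illustration, and the parsing assumes the default one-line tcpdump IPv4 format shown here.

```shell
# Flag tcpdump lines whose source and destination addresses are equal.
# Assumes lines like: "<time> IP <src>.<port> > <dst>.<port>: Flags ..."
same_src_dst() {
    awk '$2 == "IP" {
        src = $3; dst = $5
        sub(/\.[0-9]+$/, "", src)     # strip the source port
        sub(/\.[0-9]+:$/, "", dst)    # strip the destination port and the colon
        if (src == dst) print "hairpin? " src " -> " dst
    }'
}
```

It could be fed from a live capture on the host, for example `sudo tcpdump -l -n -i virbr63 port 8888 | same_src_dst`, where `-l` makes the output line-buffered and `-n` disables name resolution.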

This is what a successful curl attempt (from outside) looks like in tcpdump (on the host):

host$ sudo tcpdump -i virbr63 port 8888
21:59:10.924031 IP external.xxx.47812 > 192.168.63.2.8888: Flags [S], seq 2881442181, win 29200, options [mss 1420,sackOK,TS val 4022859071 ecr 0,nop,wscale 7], length 0
21:59:10.924339 IP 192.168.63.2.8888 > external.xxx.47812: Flags [S.], seq 1044842547, ack 2881442182, win 14480, options [mss 1460,sackOK,TS val 1661944471 ecr 4022859071,nop,wscale 5], length 0
21:59:10.968371 IP external.xxx.47812 > 192.168.63.2.8888: Flags [.], ack 1, win 229, options [nop,nop,TS val 4022859117 ecr 1661944471], length 0
21:59:10.976415 IP external.xxx.47812 > 192.168.63.2.8888: Flags [P.], seq 1:72, ack 1, win 229, options [nop,nop,TS val 4022859117 ecr 1661944471], length 71
21:59:10.976683 IP 192.168.63.2.8888 > external.xxx.47812: Flags [.], ack 72, win 453, options [nop,nop,TS val 1661944484 ecr 4022859117], length 0
21:59:10.977985 IP 192.168.63.2.8888 > external.xxx.47812: Flags [P.], seq 1:909, ack 72, win 453, options [nop,nop,TS val 1661944484 ecr 4022859117], length 908
21:59:11.025271 IP external.xxx.47812 > 192.168.63.2.8888: Flags [.], ack 909, win 243, options [nop,nop,TS val 4022859175 ecr 1661944484], length 0
21:59:11.030033 IP external.xxx.47812 > 192.168.63.2.8888: Flags [F.], seq 72, ack 909, win 243, options [nop,nop,TS val 4022859175 ecr 1661944484], length 0
21:59:11.030375 IP 192.168.63.2.8888 > external.xxx.47812: Flags [F.], seq 909, ack 73, win 453, options [nop,nop,TS val 1661944497 ecr 4022859175], length 0
21:59:11.075205 IP external.xxx.47812 > 192.168.63.2.8888: Flags [.], ack 910, win 243, options [nop,nop,TS val 4022859223 ecr 1661944497], length 0
  • external.xxx is the reverse DNS of the IP address that is making the request

This is a monitoring setup, so I'm looking for a solution with as few changes on the host as possible. Preferably there would be no changes on the host at all, just convincing ari (the guest) to accept the packets and route the responses through the network.

Non-Solutions

What Doesn't Solve My Problem

Access ari:8888 directly

This does not help, because this is a monitoring setup and the whole purpose is to test if 11.22.33.44:80 works.

Access from a different guest (virtual machine)

It does work (also from the same network, 192.168.63.0/24), but it doesn't solve the problem.

What I've Tried That Doesn't Work

Similar Questions

I've looked at the following questions and the answers:

DNAT from localhost (127.0.0.1)

KVM guest cannot connect to host, but works vice versa

accept_local

ari$ sudo sysctl -w net.ipv4.conf.eth0.accept_local=1

This neither solves the problem nor changes the tcpdump output.

route_localnet

ari$ sudo sysctl -w net.ipv4.conf.eth0.route_localnet=1

This neither solves the problem nor changes the tcpdump output.

rp_filter

ari$ sudo sysctl -w net.ipv4.conf.eth0.rp_filter=0

This neither solves the problem nor changes the tcpdump output.

Second (Virtual) Network Interface on ari

I've tried adding a second network interface eth0:1 with IP address 192.168.63.200. Accessing 192.168.63.2:8888 from 192.168.63.200 fails the same way:

17:42:08.746328 IP 192.168.63.200.41676 > 192.168.63.2.8888: Flags [S], seq 3211292625, win 14600, options [mss 1460,sackOK,TS val 1744488483 ecr 0,nop,wscale 5], length 0
17:42:08.746351 IP 192.168.63.200.41676 > 192.168.63.2.8888: Flags [S], seq 3211292625, win 14600, options [mss 1460,sackOK,TS val 1744488483 ecr 0,nop,wscale 5], length 0

EDIT: Similar Solution

After kupson's answer I've found a very similar solution here:

http://idallen.com/dnat.txt (search for "Many clients - too much SNAT"). It has:

iptables -t nat -A POSTROUTING -s 172.16.0.0/24 -d 172.16.0.0/24 -m conntrack --ctstate DNAT -j SNAT --to 172.16.0.254
Mate

1 Answer

One possible solution is to use SNAT on HOST: rewrite the source address of the DNAT-ed packets before they are forwarded back to the "ari" VM, so that ari no longer sees its own address as the source. It's not the most performant solution, but it's simple and good enough for many setups.

# fixup chain
iptables -t nat -N fixup-snat
iptables -t nat -A fixup-snat -m conntrack --ctstate DNAT -j MASQUERADE

# please select proper network ranges and NIC names below
iptables -t nat -I POSTROUTING -s 192.168.63.0/24 -d 192.168.63.0/24 -o virbr63 -j fixup-snat

You can merge this into a single iptables rule; I prefer a separate chain for clarity.
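As a sketch of what the merged form might look like, the helper below prints (it does not execute) a single POSTROUTING rule that combines the source, destination, and interface matches with the `--ctstate DNAT` match from the chain. The function name `merged_hairpin_rule` is hypothetical, and the addresses and interface name are simply the ones used in this question.

```shell
# Print (rather than run) a single-rule equivalent of the fixup-snat chain.
# Arguments: source range, destination range, outgoing interface.
merged_hairpin_rule() {
    printf 'iptables -t nat -I POSTROUTING -s %s -d %s -o %s -m conntrack --ctstate DNAT -j MASQUERADE\n' \
        "$1" "$2" "$3"
}

# Values taken from the rules above; adjust them to your own network.
merged_hairpin_rule 192.168.63.0/24 192.168.63.0/24 virbr63
```

Printing the command first makes it easy to review before running it as root. Passing `192.168.63.2` as the first argument would restrict the rule to traffic originating from ari only.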

You should also stop iptables from processing packets a second time on the bridge interface:

sysctl -w net.bridge.bridge-nf-call-arptables=0
sysctl -w net.bridge.bridge-nf-call-ip6tables=0
sysctl -w net.bridge.bridge-nf-call-iptables=0

To make these settings survive a reboot on Debian-based systems, add them to /etc/sysctl.conf or create a new file in the /etc/sysctl.d/ directory.
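A minimal sketch of the /etc/sysctl.d/ variant follows. The helper name `bridge_nf_conf` and the file name `99-bridge-nf.conf` are assumptions chosen for illustration; any `.conf` file in that directory should work.

```shell
# Print the three bridge-nf settings in sysctl.conf syntax.
bridge_nf_conf() {
    for key in arptables ip6tables iptables; do
        printf 'net.bridge.bridge-nf-call-%s = 0\n' "$key"
    done
}

bridge_nf_conf
# To install and apply it (hypothetical file name), something like:
#   bridge_nf_conf | sudo tee /etc/sysctl.d/99-bridge-nf.conf
#   sudo sysctl -p /etc/sysctl.d/99-bridge-nf.conf
```

One caveat worth checking on your own system: these keys only exist once the bridge code is loaded, so if the sysctl files are applied earlier in the boot sequence the settings may not take effect until they are reapplied.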

kupson
  • 1
    If I understand correctly, then because of the "-s 192.168.63.0/24" it will only MASQUERADE for requests originating from 192.168.63.0/24 and the traffic originating from outside (Internet) will hit ari as before (this is good). As it works from other machines on 192.168.63.0/24 , I can probably change it to "-s 192.168.63.2". Right? Thanks – Mate May 21 '18 at 16:27
  • 1
    Yes you can. It's just my preference to setup it on whole network. Please note that there is additional requirement there -- packets needs to be DNAT-ed (--ctstate DNAT) so it won't trigger on normal traffic from HOST to "ari". – kupson May 21 '18 at 16:34
  • 1
    Thanks, @kupson. I've tried it. Interestingly, it only works when tcpdump is running on the host! I've added the details to the original question. – Mate May 21 '18 at 19:42
  • 1
    What's the output of `sysctl -a | grep bridge-nf` command? – kupson May 22 '18 at 09:47
  • 1
    ari:empty. Host: `net.bridge.bridge-nf-call-arptables = 1` `net.bridge.bridge-nf-call-ip6tables = 1` `net.bridge.bridge-nf-call-iptables = 1` `net.bridge.bridge-nf-filter-pppoe-tagged = 0` `net.bridge.bridge-nf-filter-vlan-tagged = 0` `net.bridge.bridge-nf-pass-vlan-input-dev = 0` – Mate May 22 '18 at 20:13
  • 1
    Please try to disable `bridge-nf-call-*` - processing packets by iptables rules twice (+1 on the bridge interface) can be tricky: `sysctl -w net.bridge.bridge-nf-call-arptables=0` `sysctl -w net.bridge.bridge-nf-call-ip6tables=0` `sysctl -w net.bridge.bridge-nf-call-iptables=0` – kupson May 24 '18 at 12:16
  • 1
    Yes, doing the 3 sysctl settings solves the problem. When I tcpdump it, there are no duplicate packets, and it works fine without a tcpdump. I clean up my question and accept your answer. Can you please edit your answer and add the 3 sysctls? Thank you very much, @kupson! – Mate May 26 '18 at 19:14