Odd packet loss through NAT with two WAN interfaces


I have a Linux server that I also use as a network gateway / "router". It has three active network interfaces: two connected to the internet through different ISPs, and a third providing internet access to my local machines through NAT. Traffic is load-balanced between the two WAN links.

From the server itself, the network is accessible just fine: everything works, load balancing works, and there is generally no packet loss. Connections between the server and the local machines work fine, too. But if I access the internet / WAN from a local machine through the server, I see a constant packet loss of ~40%, which makes connections very unstable. After a bit of investigating, I could see that I receive (and lose) packets coming through both interfaces more or less equally, so it's not as if one interface were dragging everything down by losing all of its packets.

If I disable either of the two WAN links, this packet loss instantly disappears. It instantly reappears if I enable both WAN links again.

What could cause this? Any hints on how to troubleshoot this problem without having to give up one of the WAN links?

my iptables filter table:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination     

my iptables nat table:

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
MASQUERADE  all  --  10.42.0.0/24        !10.42.0.0/24 

my iptables mangle table:

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            mark match ! 0x0
MARK       all  --  0.0.0.0/0            0.0.0.0/0            state NEW MARK set 0x2
MARK       all  --  0.0.0.0/0            0.0.0.0/0            state NEW statistic mode random probability 0.50000000000 MARK set 0x1

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination    

ip route show output:

default 
    nexthop via 10.7.0.254  dev eth0 weight 1
    nexthop via 78.62.255.254  dev eth2 weight 1
10.7.0.0/16 dev eth0  proto kernel  scope link  src 10.7.5.102 
10.42.0.0/24 dev eth1  proto kernel  scope link  src 10.42.0.254 
78.62.192.0/18 dev eth2  proto kernel  scope link  src 78.62.239.10 
169.254.0.0/16 dev eth1  scope link  metric 1000 

everything's unedited as-is – don't care about "privacy" much in this case

librin.so.1

Posted 2014-03-21T18:49:23.200


How do you have the load balancing set up? Is the NAT smart enough to map all of a given client's traffic through a single WAN link? – Spiff – 2014-03-21T19:01:31.390

@Spiff it splits all the traffic across both WAN links. But that shouldn't be a problem, though. – librin.so.1 – 2014-03-21T19:05:04.510

So your NAT has two public IP addresses, one on WAN link A for ISP A and one on WAN link B for ISP B. So when a LAN-side client starts a TCP connection to a public server, your NAT is smart enough to keep all of that TCP connection's packets routed out the same WAN link as the first TCP-Syn in that connection, correct? – Spiff – 2014-03-21T19:17:20.477

@Spiff if you read my post (I hope you did), it states that no, it distributes the traffic from one connection across both interfaces. But as stated in the post, that in itself is not a problem: traffic split between two WAN links like this flows between an arbitrary remote host and the server itself with no problems whatsoever. So the problem has to be in the server itself. – librin.so.1 – 2014-03-21T19:23:03.820

I'm trying to tell you that a single TCP connection must always go out the same WAN link, because its outgoing packets must always have the same IP source address and port. I can imagine ways of enabling load balancing on a host with multiple WAN connections in such a way that its own local TCP/IP stack is smart enough to keep a single TCP connection on a single WAN interface, but a NAT engine running on the same box, if not aware of the WAN-link load balancing, might do the wrong thing with the packets it translates/forwards on behalf of the private LAN clients. Show us your iptables config. – Spiff – 2014-03-21T20:07:47.823

@Spiff as I said, a single connection from the server itself gets split over both interfaces, and that works fine. When a single connection comes from the LAN, it also gets split across both interfaces: the data is sent from, and replied to on, both interfaces at the same time. But the packets are lost after reaching the server itself. What reaches a LAN host is a more or less even mixture of packets that came back through both interfaces. And the packets sent from, and replied to on, the interface that initiated the connection are lost just as much. iptables in a sec. – librin.so.1 – 2014-03-21T20:25:13.827

You've only posted your "filter" table. Figuring this out will probably require seeing your "nat" and "mangle" tables as well, and probably your iproute2 ("ip route" and "ip rule") setup. Feel free to anonymize it somewhat, as long as it doesn't become ambiguous. – Spiff – 2014-03-21T21:22:44.040

That's all I get with iptables --list. I shall also post my file for iptables-restore, then. Everything else in a min. – librin.so.1 – 2014-03-21T21:47:29.993

By default, -L (--list) lists the default table, which is the "filter" table. To see the other tables, you have to ask for them by name like this: iptables -t nat -nL, iptables -t mangle -nL. – Spiff – 2014-03-21T22:00:03.470

@Spiff I added all the tables for iptables now – librin.so.1 – 2014-03-25T17:11:50.547

Answers


Based on the tables you've shown, you're not doing anything to make sure that the NAT keeps flows going out the same interface they started on, which means that roughly half of your outgoing packets are probably getting mistranslated.

To do NAT load balancing right, you need three things: a PREROUTING rule in the mangle table that randomly marks each new flow with either 1 or 2; ip rule entries that route packets marked 1 to WAN interface 1 and packets marked 2 to WAN interface 2; and separate SNAT rules in the iptables nat table, one for each WAN interface.
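The pieces described above could be sketched roughly as follows. This is a minimal sketch, not a tested configuration: the interface names, gateways, and addresses are taken from the `ip route show` output in the question, while the routing table numbers (101 and 102), the `run` dry-run helper, the `-i eth1` restriction, and the CONNMARK save/restore rules are assumptions added for illustration. CONNMARK is one common way to let later packets of a flow inherit the mark chosen for its first packet. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: flow-pinned load balancing across two WAN links.
# eth0/eth2 (WAN), eth1 (LAN), gateways and addresses come from the question;
# table numbers 101/102 and the CONNMARK rules are illustrative assumptions.

# Print each command by default; set APPLY=1 (as root) to execute it instead.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

setup_wan_balancing() {
    # 1. Mangle, LAN-side packets only: reuse the mark stored for a known flow.
    run iptables -t mangle -A PREROUTING -i eth1 -j CONNMARK --restore-mark
    run iptables -t mangle -A PREROUTING -i eth1 -m mark ! --mark 0 -j ACCEPT
    # New flows: default to mark 2, then overwrite with mark 1 half the time.
    run iptables -t mangle -A PREROUTING -i eth1 -m conntrack --ctstate NEW -j MARK --set-mark 2
    run iptables -t mangle -A PREROUTING -i eth1 -m conntrack --ctstate NEW -m statistic --mode random --probability 0.5 -j MARK --set-mark 1
    # Store the chosen mark on the conntrack entry for later packets.
    run iptables -t mangle -A PREROUTING -i eth1 -j CONNMARK --save-mark

    # 2. Policy routing: one table per WAN link, selected by the packet mark.
    run ip route add default via 10.7.0.254 dev eth0 table 101
    run ip route add default via 78.62.255.254 dev eth2 table 102
    run ip rule add fwmark 1 table 101
    run ip rule add fwmark 2 table 102

    # 3. NAT: one SNAT rule per WAN interface, using that link's own address.
    run iptables -t nat -A POSTROUTING -s 10.42.0.0/24 -o eth0 -j SNAT --to-source 10.7.5.102
    run iptables -t nat -A POSTROUTING -s 10.42.0.0/24 -o eth2 -j SNAT --to-source 78.62.239.10
}

setup_wan_balancing
```

Running the script as-is prints the eleven commands for review; running it with `APPLY=1` as root would apply them. Matching only on `-i eth1` means reply packets arriving from the WAN are never marked, so they are still routed back to the LAN through the main table.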

For a more detailed description, see Diego Lima's "Iptables Load Balancing in a Nutshell".

Spiff

Posted 2014-03-21T18:49:23.200


I experimented with this for quite a while, but no matter what I tried, it either had no effect on the number of packets dropped or made things even worse. So, sorry, but I just can't accept this answer, although I am very grateful for all the attempts to help. BTW, I updated the question post. – librin.so.1 – 2014-04-10T16:32:01.720