Why do some iptables DNAT rules not work until reboot?

Question

My iptables DNAT rules don't work until reboot. If I reboot my server, all of the rules work.

Description of the architecture :

Tens of hosts (senders) send UDP packets (one-way, on a specific port, 9999) to my Linux router. This Linux router uses iptables to forward those packets to several hosts (receivers).

senderX 10.0.0.X ====> Linux router with iptables ====> receiverY 10.0.1.Y

The Linux router has two network cards: eth1 10.0.0.1/24 (senders side) and eth0 10.0.1.1/24 (receivers side).

Iptables setup :

ip_forwarding is activated
all of the default policies are set to ACCEPT
one iptables rule exists per sender; here is an example :

iptables -t nat -A PREROUTING -s 10.0.0.2 -i eth1 -j DNAT --to-destination 10.0.1.123

Network setup :

ip addr show :

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 54:9f:35:0a:16:38 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.1/24 brd 10.0.1.255 scope global eth0
    inet6 fe80::569f:35ff:fe0a:1638/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 54:9f:35:0a:16:3a brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 brd 10.0.0.255 scope global eth1
    inet6 fe80::569f:35ff:fe0a:163a/64 scope link
       valid_lft forever preferred_lft forever

Symptom :

After adding a set of rules, some of the rules don't work. With tcpdump I can see that the UDP packets are no longer routed; they are rejected instead.

tcpdump -n -i eth1 host 10.0.0.2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:12:58.241225 IP 10.0.0.2.56859 > 10.0.0.1.9999: UDP, length 1464
16:12:58.241285 IP 10.0.0.1 > 10.0.0.2: ICMP 10.0.0.1 udp port 9999 unreachable, length 556

If I flush all the rules and reinject them into iptables, the rules that were not working still don't work.
If I reboot my server, all the rules work fine.

Analysis done :

I have added a rule to log packets from a specific sender whose rule is not working :

iptables -t nat -A PREROUTING -s 10.0.0.2 -i eth1 -j LOG --log-prefix='PREROUTING LOG :'

But this rule doesn't log anything. The packets are arriving, because I see them in tcpdump, but they are not logged. Also, with the -v option in iptables, I don't see the counters increasing for this rule.

If I add the same rule before forwarding stops working, I do get log entries.

Question :

Are there any limits on UDP forwarding in iptables ?
How can I troubleshoot this issue ?

Can you confirm, by a simultaneous `tcpdump` on *both* interfaces, that those packets really aren't making it through the router? – MadHatter Mar 05 '15 at 09:35
@MadHatter Yes, I confirm it. I've just done another test, which is the opposite: I deleted a working DNAT rule -> the rule is no longer in `iptables -L -t nat` -> but the redirection is still working (confirmed by tcpdump). I can still see the entry in conntrack : `cat /proc/net/nf_conntrack | grep 10.0.0.2`. `ipv4 2 udp 17 28 src=10.0.0.2 dst=10.0.0.1 sport=64149 dport=9999 [UNREPLIED] src=10.0.1.123 dst=10.0.0.2 sport=9999 dport=64149 mark=0 zone=0 use=2` – kranteg Mar 05 '15 at 13:36
Can you include the complete output of `iptables -L -n -v; iptables -t nat -L -n -v; ip addr show` in your question? – MadHatter Mar 05 '15 at 14:05
I have about 4000 rules, so the export will be big. I can grep for one of the faulty IPs. – kranteg Mar 05 '15 at 16:29
I doubt we will be able to make any sense of that, and it may be the core of the problem. Why on earth do you have 4000 rules on an internal firewall, if you don't mind me asking? I've run `iptables` firewalls for big companies with complex network setups, and never needed more than a couple of hundred. – MadHatter Mar 05 '15 at 16:37
The senders export data to this router (one unique IP to manage) and the router forwards each sender's data to a specific receiver for processing. Regarding the trouble, it's as if my changes in iptables are only applied on reboot (but I know iptables doesn't work like this). It seems close to this http://serverfault.com/questions/673000/iptables-keeps-using-old-nat-rules to me. – kranteg Mar 05 '15 at 16:52
Can we at least see the `ip addr show`? I'd like to confirm that the firewall possesses the address as an interface alias. – MadHatter Mar 06 '15 at 07:14
@MadHatter I have added the `ip addr show` information. Next time I have the trouble on my router, I will post `iptables -L -n -v; iptables -t nat -L -n -v` and `tcpdump` information with only one rule on the router. – kranteg Mar 06 '15 at 15:51
I'm quite surprised to see `10.0.0.2` not showing up on `eth1`; could you add it as an address? My experience of DNAT to addresses that are not owned by an interface has been quite hit-and-miss as well - sometimes it works, and sometimes it doesn't. – MadHatter Mar 06 '15 at 16:53
@MadHatter Why would you expect `10.0.0.2` on `eth1`? The IP of the router on `eth1` is `10.0.0.1`; `10.0.0.2` is a remote host on the `eth1` side. As the answer posted below suggests, it's related to `conntrack`; check my comment. – kranteg Mar 09 '15 at 17:18
My bad, sorry, I misread `-s 10.0.0.2` as `-d 10.0.0.2`. Glad you found an answer. – MadHatter Mar 09 '15 at 18:52

score 5 · Accepted Answer · answered Mar 06 '15 at 16:31

The symptoms you describe match those seen when there is a conflict between a NAT rule and a connection tracking entry.

For example when a packet is matched by

-A PREROUTING -s 10.0.0.2 -i eth1 -j DNAT --to-destination 10.0.1.123

a new connection tracking entry needs to be created. This will map a tuple of source and destination IP and port on the incoming side to a similar tuple on the outgoing side.

There cannot be an existing connection tracking entry matching the incoming side, because if there were, it would have been used instead of the rule. However, once the destination IP of the tuple has been replaced to construct the tuple for the outgoing side, the new tuple may conflict with an existing connection tracking entry.

If you install the conntrack utility, you can type conntrack -L to see a list of existing connection tracking entries. That utility also has features to list only connection tracking entries matching specific criteria as well as remove selected entries.
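As a rough aid for reading that list, here is a minimal sketch that classifies entries by comparing the original destination with the reply source. It relies on the fact that the nat table is only consulted for the first packet of a flow, so an entry created before the DNAT rule existed keeps its un-NATed mapping. The function name `classify_conntrack` is hypothetical, and the parsing assumes the `src=`/`dst=` field layout shown in the conntrack entry quoted in the comments above.

```shell
#!/bin/sh
# classify_conntrack (hypothetical helper): read conntrack entries on stdin
# and report, for each one, whether the reply source differs from the
# original destination (DNAT applied) or not (no DNAT recorded).
classify_conntrack() {
    awk '
    {
        osrc = ""; odst = ""; rsrc = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) {
                # first src= is the original source, second is the reply source
                if (osrc == "") osrc = substr($i, 5)
                else if (rsrc == "") rsrc = substr($i, 5)
            } else if (($i ~ /^dst=/) && (odst == "")) {
                # first dst= is the original destination
                odst = substr($i, 5)
            }
        }
        if (osrc == "") next    # not a conntrack entry line
        if (rsrc == odst)
            printf "%s -> %s: no DNAT recorded\n", osrc, odst
        else
            printf "%s -> %s: DNAT to %s\n", osrc, odst, rsrc
    }'
}
```

It could be fed with `classify_conntrack < /proc/net/nf_conntrack` or `conntrack -L | classify_conntrack`. A sender whose entry shows "no DNAT recorded" while a DNAT rule for it exists is a likely candidate for a stale entry.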

If this is indeed the problem you are facing, then removing the offending connection tracking entry will make the problem go away. A permanent fix usually involves configuring relevant NAT rules for packets in both directions, such that you always get the desired connection tracking entry, even if the first packet happens to be sent in the opposite direction to the usual one.
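A minimal sketch of the "remove the offending entry" step, assuming the rule layout from the question: it adds the DNAT rule for one sender and then deletes that sender's existing conntrack entries so the next packet is evaluated against the new rule. The names `run`, `add_dnat` and `DRY_RUN` are hypothetical; the `conntrack -D -s` form is the one reported as working in the comments below. By default the script only prints the commands.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# add_dnat SENDER RECEIVER [IFACE]
# Adds the per-sender DNAT rule, then drops conntrack entries whose original
# source is SENDER, since an existing entry would otherwise keep being used
# instead of the new rule.
add_dnat() {
    sender="$1"
    receiver="$2"
    iface="${3:-eth1}"   # senders-side interface from the question
    run iptables -t nat -A PREROUTING -s "$sender" -i "$iface" \
        -j DNAT --to-destination "$receiver"
    run conntrack -D -s "$sender"
}
```

For example, `add_dnat 10.0.0.2 10.0.1.123` prints the two commands, and `DRY_RUN=0 add_dnat 10.0.0.2 10.0.1.123` (as root) would execute them. Deleting after adding the rule matters: if the entry were removed first, a packet arriving in between could recreate it without the DNAT mapping.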

answered Mar 06 '15 at 16:31

kasperd

As you expected, if I use `conntrack -D -s 10.0.0.2` to drop the conntrack entry, my iptables rule works again. But I don't understand the issue, and I cannot add a NAT rule in the other direction because it's a one-way UDP flow. – kranteg Mar 09 '15 at 17:21
@kranteg If you are only sending packets in one direction, you probably have a flaw in your protocol design. For example, it is very prone to accidentally flooding an unintended target with packets. Connection tracking of UDP packets hardly makes sense if you are only sending packets in one direction, since the primary purpose of connection tracking is to ensure the packets in the other direction make it to the proper destination. But iptables doesn't support NAT without connection tracking. – kasperd Mar 09 '15 at 17:34
@kranteg But actually I doubt the communication really is one-way only. What I think is happening is that some packets are coming in the other direction, and they don't look the way you expect. A packet trace showing the packet which broke the communication would help, or a list of the connection tracking entries. Without those, I cannot give you a more detailed answer than I already have. – kasperd Mar 09 '15 at 17:37
Thank you. Due to the protocol design, there is no pause in the flows and the source port is always the same, so the conntrack entry never times out. The solution for me is to drop the conntrack entry after creating the iptables rule. – kranteg Mar 10 '15 at 13:58