
The diagram below shows a setup that aggregates the throughput of three slow channels over a WAN.
A fast host on a WAN (@ 54.239.98.8) is communicating with a host on a LAN (@ 192.168.0.100) which is connected via three slow channels to the WAN through three routers running Linux v4.14.151 and netfilter/iptables firewalls:

[Image: network diagram showing the fast WAN host, the three routers, and the LAN host]

The IP traffic from the fast host arrives fragmented and randomized at the three routers (but always from 54.239.98.8). I have no control over this fragmentation (corporate politics, go figure) - I suspect the fragmentation is done on purpose by the fast host.

THE PROBLEM: Each router attempts to reassemble the fragmented IP packets which leads to data loss because the fragments take random paths through the three routers and often one router cannot collect all of the packet fragments for successful reassembly.
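As a rough illustration of why reassembly on a single router so often fails, here is a minimal sketch in C. It assumes that each fragment independently picks one of the routers with equal probability; that is a simplifying assumption for the sake of the arithmetic, not something measured on this network. The function name `same_router_probability` is hypothetical.

```c
#include <assert.h>

/*
 * Probability that all `fragments` pieces of one datagram arrive at the same
 * router, ASSUMING each fragment independently picks one of `routers` links
 * with equal probability (a simplifying assumption, not a measured property).
 *
 * Any one of the routers may be the one that collects everything, so:
 *   P = routers * (1/routers)^fragments = (1/routers)^(fragments - 1)
 */
static double same_router_probability(unsigned routers, unsigned fragments)
{
    assert(routers > 0 && fragments > 0);

    double p = 1.0;
    /* Each fragment after the first must follow the first one's router. */
    for (unsigned i = 1; i < fragments; i++)
        p /= (double)routers;
    return p;
}
```

Under that assumption, with three routers a datagram split into two fragments can be reassembled by one router only 1/3 of the time, and a datagram split into three fragments only 1/9 of the time.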

Upon analyzing the iptables / netfilter diagram below, I can see that the offending reassembly is occurring in the PREROUTING netfilter hook, before the rule chains in the raw table are processed.

[Image: netfilter/iptables packet flow diagram]

THE ATTEMPTED SOLUTION: I have modified the kernel module nf_defrag_ipv4 to disable the offending defragmentation in the PREROUTING hook as follows:

static const struct nf_hook_ops ipv4_defrag_ops[] = {
    {
        .hook       = ipv4_conntrack_defrag, /* I changed this to point to: return NF_ACCEPT; */
        .pf         = NFPROTO_IPV4,
        .hooknum    = NF_INET_PRE_ROUTING,
        .priority   = NF_IP_PRI_CONNTRACK_DEFRAG,
    },
    {
        .hook       = ipv4_conntrack_defrag,
        .pf         = NFPROTO_IPV4,
        .hooknum    = NF_INET_LOCAL_OUT,
        .priority   = NF_IP_PRI_CONNTRACK_DEFRAG,
    },
};

The complete source code of this module can be viewed here.

This change disables reassembly of all incoming packets and lets the unaltered IP fragments pass through to the destination host on the LAN (@ 192.168.0.100), which performs its own reassembly from the fragments arriving via all three routers. This solution works, but it is ugly: it modifies kernel code and disables defragmentation for ALL forwarded packets, regardless of their source.

THE QUESTION: Is there a better solution than making this code change in the kernel?
In particular, is there a way to selectively disable IP defragmentation only for packets coming from the fast host on the WAN (ip.src == 54.239.98.8)?
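To make "selectively" concrete, here is a minimal sketch of the decision such a hook would have to make, written as plain user-space C over a raw IPv4 header so it can be compiled and tested outside the kernel. The names `ipv4_is_fragment`, `should_skip_defrag`, and `exempt_src` are hypothetical. The check mirrors the usual "MF flag set or non-zero fragment offset" test; wiring it into `nf_defrag_ipv4` is not shown here and has not been tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u
#define IPV4_MF_FLAG     0x2000u /* "more fragments" bit in the flags/offset field */
#define IPV4_OFFSET_MASK 0x1FFFu /* 13-bit fragment offset, in units of 8 bytes */

/* Read big-endian (network order) values from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* True if the buffer holds an IPv4 header that describes a fragment. */
static bool ipv4_is_fragment(const uint8_t *hdr, size_t len)
{
    assert(hdr != NULL);
    if (len < IPV4_MIN_HDR_LEN || (hdr[0] >> 4) != 4)
        return false;
    /* Bytes 6-7 carry the flags and the fragment offset. */
    return (read_be16(hdr + 6) & (IPV4_MF_FLAG | IPV4_OFFSET_MASK)) != 0;
}

/*
 * Hypothetical policy: leave fragments alone only when they come from
 * `exempt_src` (host byte order); everything else would still be handed
 * to the normal defragmentation path.
 */
static bool should_skip_defrag(const uint8_t *hdr, size_t len, uint32_t exempt_src)
{
    if (!ipv4_is_fragment(hdr, len))
        return false;
    /* Bytes 12-15 carry the source address. */
    return read_be32(hdr + 12) == exempt_src;
}
```

In a modified hook, the idea would be to return `NF_ACCEPT` early when this predicate is true and to fall through to the existing `ipv4_conntrack_defrag` logic otherwise, so that fragments from other sources are still reassembled.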

  • Routers are _not_ supposed to try to reassemble fragments. What you have is backwards. Routers fragment packets that will not fit the next MTU, but only the destination host is supposed to try to reassemble the fragments into the original packet. Fragmentation and reassembly are resource intensive, so routers do not try to reassemble packets. RFC 791, Internet Protocol: "_The basic internet service is datagram oriented and provides for the fragmentation of datagrams at gateways, with reassembly taking place at the destination internet protocol module in the destination host._" – Ron Maupin Dec 07 '19 at 13:04
  • Also, most companies will block packet fragments at the firewall because of fragmentation attacks. Allowing fragments from outside your network can invite such attacks. The modern solution is to use PMTUD to adjust the MTU of the source host to match the smallest MTU of the path so that fragmentation is unnecessary. – Ron Maupin Dec 07 '19 at 13:12
  • @RonMaupin Indeed, routers are not supposed to reassemble fragments, but here it's a firewall, and connection tracking needs to get whole packets. But since the traffic is split across three links, I doubt any useful connection tracking can be performed. – JeanPierre Dec 07 '19 at 16:39
  • @JeanPierre I think conntrack could still do full OSI L3 tracking on fragmented IP packets, because it can identify and associate the fragments by `ip.src`, `ip.dst`, `ip.proto`, `ip.ID`. Of course, L4 tracking would be nearly impossible without reassembly or some store-and-forward-later scheme. – George Robinson Dec 08 '19 at 21:19
  • @Ron Maupin I agree that routers are not supposed to reassemble packets. Are the rules any different for firewalling routers because of their need for L4 connection tracking? – George Robinson Dec 08 '19 at 21:20
  • @Ron Maupin The "modern solution" of using PMTUD to avoid fragmentation is irrelevant in this scenario because the **fast host** (@ 54.239.98.8) fragments the IP traffic on purpose so it can scatter it across the 3 slow links...which are aggregated later. – George Robinson Dec 08 '19 at 21:37
  • That is not the way to use fragmentation. In fact, as I suggested above, you should disallow fragments at the firewall. Fragmentation and reassembly are resource-intensive tasks that slow traffic considerably, and splitting the traffic of a single flow across links can cause lost and out-of-order packets, which can make TCP lose a lot of speed. If this is done on purpose, it is done by someone who does not really understand what is happening. – Ron Maupin Dec 08 '19 at 22:08
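One of the comments above suggests that fragments can be associated at L3 by `ip.src`, `ip.dst`, `ip.proto`, and `ip.ID` without reassembling them. A minimal user-space sketch of that idea in C follows; the names `frag_key`, `frag_key_from_header`, and `frag_key_equal` are hypothetical and are not taken from the kernel's conntrack code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The four IPv4 header fields that tie fragments of one datagram together. */
struct frag_key {
    uint32_t src;   /* source address, host byte order */
    uint32_t dst;   /* destination address, host byte order */
    uint16_t id;    /* IP identification field */
    uint8_t  proto; /* upper-layer protocol number */
};

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fill `out` from a raw IPv4 header; returns false if the buffer is not IPv4. */
static bool frag_key_from_header(const uint8_t *hdr, size_t len, struct frag_key *out)
{
    assert(hdr != NULL && out != NULL);
    if (len < 20 || (hdr[0] >> 4) != 4)
        return false;
    out->id    = (uint16_t)((hdr[4] << 8) | hdr[5]); /* bytes 4-5 */
    out->proto = hdr[9];                             /* byte 9 */
    out->src   = read_be32(hdr + 12);                /* bytes 12-15 */
    out->dst   = read_be32(hdr + 16);                /* bytes 16-19 */
    return true;
}

/* Two fragments belong to the same original datagram if all four fields match. */
static bool frag_key_equal(const struct frag_key *a, const struct frag_key *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->id == b->id && a->proto == b->proto;
}
```

Note that this only groups fragments seen by the same machine. It does not help a router whose sibling routers received the remaining fragments, and the L4 port numbers are present only in the first fragment.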

0 Answers