
As I understand IPv6 fragmentation, routers do not perform fragmentation; only the end nodes do. When any router along the path receives a packet larger than the MTU of the link to its next hop, it discards the packet and replies to the source IP with an ICMPv6 "Packet too big".

Following is what I observe in my setup:

Right after my local ethernet link comes up, I visit an HTTP page with a request that causes a large (1965 bytes) packet to be sent out. My router replies with ICMPv6 saying the packet's too big and that my MTU is 1492 (that of the ADSL ATM link). My machine then splits the TCP packet into two smaller ones (1492 and 545 bytes) and tries again, instead of adding extension headers to IPv6 for fragmentation, which is what I expected to happen.

So far so good. What puzzles me is that from then on, the router no longer sends back "Packet too big" responses despite some outgoing packets being larger than 2K (e.g. 2399 bytes), and everything seems to be fine (i.e. no retransmission using smaller packets).

Any idea what's happening here?

I'm on Linux 3.14.23 and my router's Tomato based. I don't have packet monitoring on my router at the moment.

Mansour
  • A broken router? Tomato is old and unmaintained, you know. – Michael Hampton Jan 04 '15 at 15:08
  • It's actually Shibby's fork, and I updated it last 3 weeks ago. However, even if it was old and unmaintained, I'd expect less from it. Right now, it's doing magic (obviously I'm missing something), passing packets larger than its MTU! – Mansour Jan 04 '15 at 15:10
  • It may be useful to include the exact size of the packet which triggered the too big response as well as the size of the later large packets which did not. – kasperd Jan 04 '15 at 16:04
  • I embedded the exact sizes in the text. – Mansour Jan 04 '15 at 16:12
  • Is the 2399 byte packet part of the same TCP connection for which you received the too big message? The output of `ethtool --show-offload eth0` on the client may also provide some hints. (Replace eth0 with the name of the interface on which the client is actually sending the 2399 byte packet.) – kasperd Jan 04 '15 at 16:39
  • It was the offloading feature at work - see my comment on your answer. – Mansour Jan 04 '15 at 16:40

2 Answers


Having a layer between IPv6 and the physical layer do hop-by-hop fragmentation is permitted by the standard. In fact, if the MTU of the physical layer is smaller than 1280 bytes, such hop-by-hop fragmentation is even mandatory. The exact workings of such fragmentation below the IPv6 layer are outside the scope of the IPv6 standard. The exact wording in RFC 2460 is this:

On any link that cannot convey a 1280-octet packet in one piece, link-specific fragmentation and reassembly must be provided at a layer below IPv6.

The fragmentation you have in mind is the end-to-end fragmentation in IPv6. And that kind of fragmentation can only be performed by the node originating the packet in the first place. No intermediate router is allowed to perform this kind of fragmentation on a packet which they are forwarding.

As far as I can tell from your question, neither kind of fragmentation is happening in your case.

If you were able to send a 2KB packet from the HTTP client to the router in the first place, that would imply that your LAN is configured to use jumbo frames. Another possibility is that the HTTP client is running on a host with support for TCP segmentation offloading (TSO). In that case, the first packet may appear to be 2KB when observed with tcpdump on the sending host, but on the wire it might actually be one packet with the first 1500 bytes and another packet with the rest.
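To make the offloading case concrete, here is a rough sketch of the arithmetic the NIC would be doing. It assumes the sizes reported are IP packet lengths and that each packet carries a 40-byte IPv6 header plus a 32-byte TCP header (20 bytes plus 12 bytes of timestamp options). The 32-byte figure and the name `wire_packets` are my assumptions, not something taken from a capture.

```python
IPV6_HEADER = 40  # fixed IPv6 header size in bytes
TCP_HEADER = 32   # assumption: 20-byte TCP header + 12 bytes of options

def wire_packets(captured_len, mtu):
    """Split one oversized TCP/IPv6 packet, as seen by tcpdump on the
    sender, into the packet sizes that would actually go on the wire."""
    headers = IPV6_HEADER + TCP_HEADER
    payload = captured_len - headers   # TCP payload in the captured packet
    mss = mtu - headers                # largest payload that fits in one packet
    sizes = []
    while payload > 0:
        chunk = min(payload, mss)
        sizes.append(headers + chunk)  # every wire packet repeats the headers
        payload -= chunk
    return sizes

print(wire_packets(1965, 1500))  # [1500, 537]
print(wire_packets(1965, 1492))  # [1492, 545]
print(wire_packets(2399, 1492))  # [1492, 979]
```

Under those assumptions the second line reproduces the 1492 and 545 byte packets from the question, and the third shows how a 2399-byte packet in a capture can still fit a 1492-byte path MTU once it is segmented.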

Even 1500 bytes would still be too large for the MTU on the ADSL link. You can find the actual size of the packet that triggered the too big error on the client machine by inspecting the error message, which carries the start of the offending packet, with a tool such as Wireshark.

What happens once the client's TCP stack receives the too big error depends on which TCP stack is being used. Some will retransmit the same TCP segment using IPv6 fragmentation, others will split the TCP segment into two smaller TCP segments. The IPv6 standard says this:

In order to send a packet larger than a path's MTU, a node may use the IPv6 Fragment header to fragment the packet at the source and have it reassembled at the destination(s). However, the use of such fragmentation is discouraged in any application that is able to adjust its packets to fit the measured path MTU (i.e., down to 1280 octets).
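For comparison, this is roughly what the sizes would look like if the sender did use the Fragment header. It is only a sketch: it assumes no other extension headers, so the unfragmentable part is just the 40-byte IPv6 header, and `fragment_sizes` is a name I made up for illustration.

```python
IPV6_HEADER = 40     # unfragmentable part (assumes no other extension headers)
FRAGMENT_HEADER = 8  # IPv6 Fragment extension header

def fragment_sizes(packet_len, mtu):
    """Return the sizes of the IPv6 fragments a source would emit."""
    if packet_len <= mtu:
        return [packet_len]            # fits as-is, no Fragment header needed
    remaining = packet_len - IPV6_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes of data.
    max_data = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    sizes = []
    while remaining > 0:
        data = min(remaining, max_data)
        sizes.append(IPV6_HEADER + FRAGMENT_HEADER + data)
        remaining -= data
    return sizes

print(fragment_sizes(1965, 1492))  # [1488, 533]
```

So a fragmented retransmission of the 1965-byte packet would show up as 1488 and 533 bytes, which is different from the 1492 and 545 bytes reported in the question.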

I read this as recommending that the TCP segment is retransmitted as two smaller segments rather than using IPv6 fragmentation. There are multiple reasons for TCP segmentation being preferred over fragmentation.

The router may rate limit the too big error messages. So if you send multiple TCP segments, each of which is larger than 2KB, you might only receive an error message for the first. The TCP stack should be able to deal with this by using the smaller MTU once it retransmits the packets which exceeded the MTU the first time around.

What you are seeing might simply be a rate limit which is lower than you were expecting. You can try to measure what rate limit is actually being used and then only take further action if you find it to be unreasonably low.
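As an illustration of how such a limit behaves, here is a toy token bucket, which is one common way to rate limit ICMPv6 errors. The rate of one error per second and the burst of one are hypothetical values picked for the example; I don't know what your router actually uses.

```python
class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec    # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous check

    def allow(self, now):
        """Return True if an error message may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def errors_sent(arrival_times, rate_per_sec=1, burst=1):
    """For each oversized packet arrival, say whether an error is returned."""
    bucket = TokenBucket(rate_per_sec, burst)
    return [bucket.allow(t) for t in arrival_times]

# Three oversized packets within 2 ms, then one more 1.5 s later.
print(errors_sent([0.0, 0.001, 0.002, 1.5]))  # [True, False, False, True]
```

With these made-up numbers only the first packet of a burst gets a "Packet too big" reply, which is the pattern you would see if the limit were the explanation.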

kasperd
  • Superb detail! The use of smaller TCP segments over IP frag is now clear. I also had the MTU wrong when I wrote the question, it's actually 1492. So if I understand correctly, the router's free to do its own frag at link layer, and that's fine. But if that's the case, why do I get "Packet too big" messages in the first place? It's also the case that the subsequent >2K packets actually go through without a problem (ie they're not retransmitted as two smaller ones). – Mansour Jan 04 '15 at 15:57
  • @Mansour I did notice your edit, but no part of my answer actually depends on whether the MTU on that link is 1280, 1492, or anything in between. I don't know for sure if link layer fragmentation is happening in your case, but I don't think it is. – kasperd Jan 04 '15 at 16:03
  • If it's not link layer, and the sender doesn't retransmit smaller segments, then where is the frag happening? maybe over the PPP? (I need to find a way to monitor packet traffic on my router). – Mansour Jan 04 '15 at 16:08
  • YES! Your edit about TCP Segmentation Offloading (TSO) is spot on. I disabled TSO on my ethernet using `ethtool -K eth0 tso off` and no packets are sent that are larger than 1492. So as you said, wireshark wasn't showing the whole picture. – Mansour Jan 04 '15 at 16:37

Once the initial fragmentation size has been identified, it should stick to the link. The first packet will fail. Further packets will be fragmented in the IPv6 stack before being sent. Dump the traffic at the client and check the packet sizes. You should see the larger packets (responses) being fragmented before transmission.

You indicate that everything works after the first fragmentation. This indicates that the packets are being fragmented appropriately, as otherwise they would likely fail with fragmentation further down the route.

BillThor