Can a home PPPoE connection access all websites, regardless of the site infrastructure's MTU settings?

1

I'm just trying to get my head round MTU, MRU and MSS.

My interest initially came from the answer in this post: Security risk of PING?:

Some ICMP packet types MUST NOT be blocked, in particular the "destination unreachable" ICMP message, because blocking that one breaks path MTU discovery, symptoms being that DSL users (behind a PPPoE layer which restricts MTU to 1492 bytes) cannot access Web sites which block those packets (unless they use the Web proxy provided by their ISP).

I've since found this article which backs this up:

Some people running web servers (notably some banks) set up their network so that they block the error message that is sent back when a packet is too big. This would not be too bad if they did not also try and send 1,500 byte packets with the DF bit set. The result is the packet gets dropped when it hits a sub 1,500 MTU link and has to retry. Eventually it may try a smaller packet size but this could be 20 seconds later. This is a stupid network setup on the part of the person running the web server.

My question is: is this actually a real problem? As far as I know, I have never seen this happen on my BT Infinity connection that uses PPPoE. Presumably this has the same restrictions as mentioned above (my router MTU is set at 1492).

Could it be my router silently employing MSS clamping?

SilverlightFox

Posted 2014-12-23T14:13:03.470

Reputation: 140

How old are these articles? This was a pretty hot issue back in late 90s, early 00s; not so much today. – Nevin Williams – 2014-12-23T16:41:44.523

@NevinWilliams: The Sec.SE post is from 2011 and the one mentioning the bank example is 2009. I'd be interested to know why it is not an issue today? – SilverlightFox – 2014-12-23T16:53:03.837

Not so much an issue today.

Corporate policies, common practices, and even IP stacks do eventually adapt, although often at glacial speed, to rectify these sorts of issues. Remedies to a particularly widespread issue can come from one or several levels: kernel defaults, software package defaults, configuration templates, ISP intervention, router software features, alternative protocols, reduced reliance on deprecated services. – Nevin Williams – 2014-12-25T11:42:56.493

Answers

5

MTU stands for maximum transmission unit, i.e. the IP datagram size limit (in bytes). The default and maximum MTU for standard Ethernet is 1500.

Let's imagine we have a network like the one below. C is a client; S is a server; X and Y are routers.

 ___          ___          ___          ___
| C |        | X |        | Y |        | S |
|___|========|___|--------|___|========|___|

There are three networks between C and S. Two of them have an MTU of 1500 and one has a lower MTU of 1200 (just an example). The low-MTU network is marked with dashes.

C tries Path MTU Discovery on the path to S. It sends an IP datagram with 20 B header and 1480 B payload, 1500 B in total. The Don't fragment (DF) flag is set in IP header.

The datagram reaches X. X cannot forward it to Y, because the MTU of the X-Y link is too low and the DF flag is set, so X drops it and replies with a Fragmentation needed ICMP message. C receives that message and learns that the path MTU is lower than 1500. It then tries again with a smaller payload, each time receiving Fragmentation needed, until the payload size reaches 1180 B. 1180 + 20 = 1200, so the datagram finally reaches S and the path MTU is discovered. (Routers that follow RFC 1191 include the next-hop MTU in the message, so C can usually drop to the right size straight away.)

PMTUD works only if the router that drops the datagram replies with Fragmentation needed ICMP messages and those messages actually reach C. Otherwise C won't know that the datagram was dropped.
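
The exchange above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path is modelled as a plain list of link MTUs, the function names are made up for this example, and the router that drops the datagram is assumed to report its next-hop MTU in the ICMP message (as RFC 1191 describes), so the sender converges in one step instead of probing downwards.

```python
from typing import List, Optional


def first_blocking_mtu(datagram_size: int, link_mtus: List[int]) -> Optional[int]:
    """Return the MTU of the first link the datagram does not fit, or None."""
    for mtu in link_mtus:
        if datagram_size > mtu:
            return mtu
    return None


def discover_path_mtu(link_mtus: List[int], icmp_delivered: bool = True) -> Optional[int]:
    """Simulate PMTUD with DF set on every datagram (hypothetical model).

    Returns the discovered path MTU, or None if the sender never learns it
    because the Fragmentation needed messages are filtered somewhere.
    """
    size = link_mtus[0]  # start with the sender's own link MTU
    while True:
        blocking = first_blocking_mtu(size, link_mtus)
        if blocking is None:
            return size  # datagram reached the destination
        if not icmp_delivered:
            return None  # silently dropped: the sender learns nothing
        size = blocking  # assumption: ICMP carries the next-hop MTU


# C === X --- Y === S from the diagram: 1500, 1200, 1500
print(discover_path_mtu([1500, 1200, 1500]))
print(discover_path_mtu([1500, 1200, 1500], icmp_delivered=False))
```

The second call models the misconfigured firewall case: the oversized datagram is dropped and no error ever reaches the sender.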

Your router sends proper ICMP messages, so everything works as intended. There's no reason for the Internet to be broken for you because of a lower MTU.


What happens if PMTUD doesn't work (e.g. because ICMP is blocked in either direction)?

Neither end of the connection actually has to know the path MTU. IP can handle that. It may be sub-optimal, but it will work.

IP is capable of transmitting a payload larger than the path MTU (up to the 65,535-byte IPv4 datagram limit). This follows from layering: IP works in layer 3, and layer 4 shouldn't have to care about the underlying links, so the link MTU places no size limit on the payload.

The basic IP header is 20 bytes long. It includes two interesting flags, Don't fragment (DF) and More fragments (MF), and also the Fragment offset (FO) field. I have already mentioned the 20 B header size and the DF flag. (I'm talking about the IPv4 header; IPv6 is different.)

IP is capable of splitting a big payload into fragments and reassembling it at the destination.

Let's say we want to transmit a 5000 B payload from C to D, which are both in the same network (i.e. connected through a switch or hub). The MTU of C's and D's NICs is 1500. Each fragment's header is 20 B long, so the max data size for a single datagram is 1480 B (1500 - 20). The payload will be sent in 4 datagrams (MF: More fragments, FO: Fragment offset):

  1. Bytes 1-1480, MF: 1, FO: 0
  2. Bytes 1481-2960, MF: 1, FO: 1480
  3. Bytes 2961-4440, MF: 1, FO: 2960
  4. Bytes 4441-5000, MF: 0, FO: 4440

The DF flag doesn't matter in this case. MF is 0 for the last fragment, 1 otherwise. FO is the offset of the first byte in a fragment (offsets are indexed starting with 0). These datagrams will be automatically reassembled by the IP stack of the destination host.
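
As a rough sketch, the MF and FO values for any payload and MTU can be computed as below. The function name and tuple layout are made up for this illustration, and it follows the simplified byte offsets used in this answer; real IPv4 stores the offset in 8-byte units and requires every non-final fragment to carry a multiple of 8 data bytes.

```python
from typing import List, Tuple

IP_HEADER = 20  # bytes, basic IPv4 header without options


def fragment(payload_len: int, mtu: int) -> List[Tuple[int, int, int, int]]:
    """Return (first_byte, last_byte, MF, FO) for each fragment.

    Bytes are numbered from 1 as in the lists in this answer, while FO
    is the 0-based offset of the fragment's first byte.
    """
    max_data = mtu - IP_HEADER  # data bytes that fit in one datagram
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = 1 if offset + size < payload_len else 0  # 0 only on the last one
        fragments.append((offset + 1, offset + size, more, offset))
        offset += size
    return fragments


for first, last, mf, fo in fragment(5000, 1500):
    print(f"Bytes {first}-{last}, MF: {mf}, FO: {fo}")
```

Calling `fragment(5000, 1200)` instead gives the fragmentation a sender would produce if it already knew about a 1200-byte path MTU.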

Now C wants to send a 5000 B payload to S. Let's assume it magically knows path MTU (or C's NIC is configured with MTU=1200, so C sends 1200 B datagrams). It will fragment the payload like this:

  1. Bytes 1-1180, MF: 1, FO: 0
  2. Bytes 1181-2360, MF: 1, FO: 1180
  3. Bytes 2361-3540, MF: 1, FO: 2360
  4. Bytes 3541-4720, MF: 1, FO: 3540
  5. Bytes 4721-5000, MF: 0, FO: 4720

This is the optimal fragmentation of the payload.

If C doesn't know and cannot determine path MTU, it must rely on intermediate nodes to fragment the payload correctly. C has MTU=1500, so it sends 4 datagrams as shown in the C→D example above. However, those datagrams will have to be fragmented again to be transmitted through the X—Y connection. Each of the datagrams received by Y has to be at most 1200 B long, so 1500 B-long datagrams will be split into two: 1200 B and 320 B (20 B extra for second header). This fragmentation results in 7 datagrams (and thus 7 headers) being transmitted from X to S instead of optimal 5:

  1. Bytes 1-1180, MF: 1, FO: 0
  2. Bytes 1181-1480, MF: 1, FO: 1180
  3. Bytes 1481-2660, MF: 1, FO: 1480
  4. Bytes 2661-2960, MF: 1, FO: 2660
  5. Bytes 2961-4140, MF: 1, FO: 2960
  6. Bytes 4141-4440, MF: 1, FO: 4140
  7. Bytes 4441-5000, MF: 0, FO: 4440

Note that this time the fragments aren't equal in size. Intermediate nodes don't recombine datagrams and fragment them again optimally; they only split the datagrams that are too big.
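
A small Python sketch of this two-stage fragmentation, for comparing the fragment counts. The names (`Fragment`, `split`, `fragments_on_path`) are hypothetical, and byte offsets are used instead of the 8-byte units of real IPv4, as elsewhere in this answer.

```python
from typing import List, NamedTuple

IP_HEADER = 20  # bytes, basic IPv4 header without options


class Fragment(NamedTuple):
    offset: int  # 0-based offset of the first data byte in the original payload
    length: int  # number of data bytes carried
    mf: int      # More fragments flag


def split(frag: Fragment, mtu: int) -> List[Fragment]:
    """Split one datagram so that every piece fits the given MTU.

    Pieces are never merged with neighbouring datagrams; a datagram that
    already fits is returned unchanged.
    """
    max_data = mtu - IP_HEADER
    pieces = []
    done = 0
    while done < frag.length:
        size = min(max_data, frag.length - done)
        is_last_piece = done + size == frag.length
        # Only the last piece inherits the original MF; earlier pieces get MF=1.
        pieces.append(Fragment(frag.offset + done, size, frag.mf if is_last_piece else 1))
        done += size
    return pieces


def fragments_on_path(payload_len: int, sender_mtu: int, link_mtu: int) -> List[Fragment]:
    """Fragment at the sender first, then again at a router with a smaller MTU."""
    result = []
    for datagram in split(Fragment(0, payload_len, 0), sender_mtu):
        result.extend(split(datagram, link_mtu))
    return result


naive = fragments_on_path(5000, 1500, 1200)    # sender unaware of the 1200 B link
optimal = fragments_on_path(5000, 1200, 1200)  # sender already uses MTU 1200
print(len(naive), len(optimal))  # 7 5
print((len(naive) - len(optimal)) * IP_HEADER, "extra header bytes")
```

The extra cost is two more 20-byte headers for this payload, plus the work the router has to do to split each full-sized datagram.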

In practice intermediate routers may be configured not to perform fragmentation themselves and to require the transmission endpoints to use a suitable MTU, so intermediate node fragmentation shouldn't be relied upon. PMTUD is preferred.

gronostaj

Posted 2014-12-23T14:13:03.470

Reputation: 33 047

Thanks. In your example does S need to try PMTUD for the reply to C? What if the firewall (in the bank example) is set not to allow any type of ICMP packet either way (say this is between X and Y)? I guess it would break things, however I don't tend to hear of this type of problem very often. – SilverlightFox – 2014-12-23T15:01:40.037

@SilverlightFox If PMTUD fails in either direction, sender can still try to rely on intermediate routers to perform fragmentation. Please see my edit for details. – gronostaj – 2014-12-23T21:32:40.283

that looks very impressive, can you provide any sources for learning that stuff? – barlop – 2014-12-23T21:38:03.203

@barlop I can recommend The TCP/IP Guide, it's available online for free and covers a lot of networking-related stuff. – gronostaj – 2014-12-23T21:56:59.893

@gronostaj: Cheers - my question seems to have been closed. I've now edited it. Any chance you could vote to reopen it if you think it's now on topic? – SilverlightFox – 2015-01-12T10:52:13.083