MTU stands for maximum transmission unit, i.e. the IP datagram size limit (in bytes). The default and maximum MTU allowed by standard Ethernet is 1500 (jumbo frames aside).
Let's imagine we have a network like the one below. C is a client; S is a server; X and Y are routers.
 ___          ___          ___          ___
| C |        | X |        | Y |        | S |
|___|========|___|--------|___|========|___|
There are three networks between C and S. Two of them have an MTU of 1500 and one has a lower MTU of 1200 (just an example). The low-MTU network is marked with dashes.
C tries Path MTU Discovery (PMTUD) on the path to S. It sends an IP datagram with a 20 B header and a 1480 B payload, 1500 B in total. The Don't fragment (DF) flag is set in the IP header.
The datagram reaches X. X tries to pass it on to Y, but the link to Y has an MTU of only 1200 and the DF flag is set, so X drops the datagram and responds with a Fragmentation needed ICMP message. C receives that message and learns that the path MTU is lower than 1500. It then tries again with a smaller payload, each time receiving Fragmentation needed, until the payload size reaches 1180 B. (Routers that implement RFC 1191 report the next-hop MTU in the message, so in practice C can usually drop to the right size in one step.) 1180 + 20 = 1200, so the datagram finally reaches S and the path MTU is discovered.
PMTUD works only if X actually sends the Fragmentation needed ICMP messages and they reach C. Otherwise C won't know that the datagram was dropped.
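The discovery loop can be sketched as a toy simulation. Nothing here touches the network: the list of link MTUs stands in for the C-X, X-Y and Y-S networks, and the function names are made up for illustration. It also assumes the router reports its next-hop MTU in the ICMP message (the RFC 1191 behaviour), so the sender retries at exactly that size instead of guessing.

```python
IP_HEADER = 20  # bytes, basic IPv4 header without options


def send_with_df(datagram_size, link_mtus):
    """Simulate sending a DF datagram along the path.

    Returns None if the datagram fits every link, otherwise the MTU of the
    first link that is too small (what the ICMP message would report).
    """
    for mtu in link_mtus:
        if datagram_size > mtu:
            return mtu
    return None


def discover_path_mtu(first_hop_mtu, link_mtus):
    """Return (path_mtu, number_of_probes_sent)."""
    size = first_hop_mtu
    probes = 0
    while True:
        probes += 1
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size, probes
        size = reported  # retry at the size the router says it can forward


path = [1500, 1200, 1500]  # C-X, X-Y, Y-S
mtu, probes = discover_path_mtu(1500, path)
print(mtu, mtu - IP_HEADER, probes)  # 1200 1180 2
```

If `send_with_df` never reported anything (ICMP blocked), the sender would have no signal at all, which is the failure case discussed next.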
Your router sends proper ICMP messages, so everything works as intended; there's no reason for the Internet to be broken for you because of the lower MTU.
What happens if PMTUD doesn't work (e.g. because ICMP is blocked in either direction)?
Neither end of the connection actually has to know the path MTU. The IP protocol can handle that. It may be sub-optimal, but it will work.
IP is capable of transmitting a payload much larger than the path MTU (up to the 65,535-byte limit of an IPv4 datagram). This follows from the layering in the OSI model: IP works in layer 3, and layer 4 shouldn't have to care about the underlying link, so the link's MTU cannot be allowed to limit the payload.
The basic IP header is 20 bytes long. It includes two interesting flags, Don't fragment (DF) and More fragments (MF), and also the Fragment offset field (FO). I have already mentioned the 20 B header size and the DF flag. (I'm talking about the IPv4 header; IPv6 is different.)
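To make the header fields concrete, here is a small sketch that reads DF, MF and the fragment offset out of a raw IPv4 header. The function name and the hand-built header are illustrative only. One detail the rest of this answer glosses over: the on-wire offset field counts 8-byte units, so the code multiplies by 8 to get a byte offset.

```python
import struct


def parse_fragment_fields(header: bytes):
    """Extract (DF, MF, fragment offset in bytes) from an IPv4 header."""
    # Bytes 6-7 hold 3 flag bits followed by a 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", header[6:8])
    df = bool(flags_and_offset & 0x4000)  # Don't fragment
    mf = bool(flags_and_offset & 0x2000)  # More fragments
    offset_bytes = (flags_and_offset & 0x1FFF) * 8  # field counts 8-byte units
    return df, mf, offset_bytes


# Hypothetical header: only bytes 6-7 are filled in, everything else is zero.
header = bytearray(20)
header[6:8] = struct.pack("!H", 0x2000 | (1480 // 8))  # MF=1, offset 1480 B
print(parse_fragment_fields(bytes(header)))  # (False, True, 1480)
```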
IP is capable of splitting a big payload into fragments and reassembling it at the destination.
Let's say we want to transmit a 5000 B payload from C to D, which are both in the same network (i.e. connected through a switch or hub). The MTU of C's and D's NICs is 1500. Each fragment's header is 20 B long, so the maximum data size for a single datagram is 1480 B (1500 - 20). The payload will be sent in 4 datagrams (MF: More fragments, FO: Fragment offset):
- Bytes 1-1480, MF: 1, FO: 0
- Bytes 1481-2960, MF: 1, FO: 1480
- Bytes 2961-4440, MF: 1, FO: 2960
- Bytes 4441-5000, MF: 0, FO: 4440
The DF flag doesn't matter in this case. MF is 0 for the last fragment, 1 otherwise. FO is the offset of the first byte in a fragment (offsets are indexed starting with 0). These datagrams will be automatically reassembled by the IP stack of the destination host.
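The arithmetic behind such a fragment list can be sketched in a few lines. The function below is a hypothetical helper, not part of any real stack; it follows this answer's conventions (1-based byte ranges, 0-based byte offsets) and, like the text, ignores the rule that real fragments carry a multiple of 8 data bytes.

```python
IP_HEADER = 20  # bytes, basic IPv4 header without options


def fragment(payload_len, mtu):
    """Return a list of (first_byte, last_byte, mf, fo) tuples."""
    max_data = mtu - IP_HEADER
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        mf = 1 if offset + size < payload_len else 0  # 0 only on the last one
        fragments.append((offset + 1, offset + size, mf, offset))
        offset += size
    return fragments


for first, last, mf, fo in fragment(5000, 1500):
    print(f"Bytes {first}-{last}, MF: {mf}, FO: {fo}")
```

Calling `fragment(5000, 1200)` instead gives the five-fragment split for a sender that already uses the lower MTU.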
Now C wants to send a 5000 B payload to S. Let's assume it magically knows path MTU (or C's NIC is configured with MTU=1200, so C sends 1200 B datagrams). It will fragment the payload like this:
- Bytes 1-1180, MF: 1, FO: 0
- Bytes 1181-2360, MF: 1, FO: 1180
- Bytes 2361-3540, MF: 1, FO: 2360
- Bytes 3541-4720, MF: 1, FO: 3540
- Bytes 4721-5000, MF: 0, FO: 4720
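This is the optimal fragmentation of the payload: the fewest datagrams that fit the path MTU. (Strictly speaking, IPv4 requires every fragment except the last to carry a multiple of 8 data bytes, so a real stack would send 1176 B per fragment rather than 1180 B. The count is still 5; the numbers here ignore that detail for simplicity.)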
If C doesn't know and cannot determine the path MTU, it must rely on intermediate nodes to fragment the payload correctly. C has MTU=1500, so it sends 4 datagrams as shown in the C→D example above. However, those datagrams will have to be fragmented again to be transmitted through the X-Y connection. Each of the datagrams received by Y has to be at most 1200 B long, so each 1500 B datagram will be split into two: 1200 B and 320 B (20 B extra for the second header). This fragmentation results in 7 datagrams (and thus 7 headers) being transmitted from X to S instead of the optimal 5:
- Bytes 1-1180, MF: 1, FO: 0
- Bytes 1181-1480, MF: 1, FO: 1180
- Bytes 1481-2660, MF: 1, FO: 1480
- Bytes 2661-2960, MF: 1, FO: 2660
- Bytes 2961-4140, MF: 1, FO: 2960
- Bytes 4141-4440, MF: 1, FO: 4140
- Bytes 4441-5000, MF: 0, FO: 4440
Note that this time the fragments aren't equal in size. Intermediate nodes don't recombine datagrams and re-fragment them optimally; they only fragment further.
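Two-stage fragmentation is easy to get wrong by hand, so here is a rough sketch of it. The helper names are invented for illustration, and the code keeps the simplifications used in this answer (byte offsets, no 8-byte alignment). The key point is that a piece cut by the router keeps MF set unless it is the tail of the original last fragment.

```python
IP_HEADER = 20  # bytes, basic IPv4 header without options


def split(first_offset, data_len, more_follow, mtu):
    """Split one datagram's data into pieces that fit `mtu`.

    Returns (fo, length, mf) tuples; offsets are byte offsets, 0-based.
    """
    max_data = mtu - IP_HEADER
    pieces = []
    done = 0
    while done < data_len:
        size = min(max_data, data_len - done)
        last_piece = done + size == data_len
        # MF stays 1 unless this is the final piece of the final fragment.
        mf = 0 if (last_piece and not more_follow) else 1
        pieces.append((first_offset + done, size, mf))
        done += size
    return pieces


def send_via_router(payload_len, source_mtu, link_mtu):
    """Fragment at the source, then again at the low-MTU link."""
    at_source = split(0, payload_len, False, source_mtu)
    on_link = []
    for fo, length, mf in at_source:
        on_link.extend(split(fo, length, bool(mf), link_mtu))
    return on_link


frags = send_via_router(5000, 1500, 1200)
for fo, length, mf in frags:
    print(f"Bytes {fo + 1}-{fo + length}, MF: {mf}, FO: {fo}")
print(len(frags), "datagrams,", len(frags) * IP_HEADER, "header bytes")
```

The last line prints `7 datagrams, 140 header bytes`, against 5 datagrams and 100 header bytes when the sender fragments for the path MTU itself.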
In practice, intermediate routers may be configured to refuse to fragment and instead require the endpoints to use a suitable MTU, so fragmentation by intermediate nodes shouldn't be relied upon. PMTUD is preferred.
How old are these articles? This was a pretty hot issue back in late 90s, early 00s; not so much today. – Nevin Williams – 2014-12-23T16:41:44.523
@NevinWilliams: The Sec.SE post is from 2011 and the one mentioning the bank example is 2009. I'd be interested to know why it is not an issue today? – SilverlightFox – 2014-12-23T16:53:03.837
Not so much an issue today.
Corporate policies, common practices, and even IP stacks do eventually adapt, although often at glacial speed, to rectify these sorts of issues. Remedies to a particularly widespread issue can come from one or several levels: kernel defaults, software package defaults, configuration templates, ISP intervention, router software features, alternative protocols, reduced reliance on deprecated services. – Nevin Williams – 2014-12-25T11:42:56.493