20

I am using 100 Mbps Fast Ethernet, whose frame size is less than 1500 bytes (1472 bytes of payload, as per my textbook). Over this link I was able to send and receive a UDP message of 65507 bytes, which means the packet size was 65507 + 20 (IP header) + 8 (UDP header) = 65535 bytes.

If the frame's payload size itself is maximum of 1472 bytes (as per my textbook), how can the packet size of IP be greater than that which here is 65535?

I used sender code as

char buffer[100000];
/* Try every message size in turn; send() takes a flags argument (0 here). */
for (int i = 1; i < 100000; i++)
{
    ssize_t len = send(socket_id, buffer, i, 0);
    printf("%zd\n", len);
}

Receiver code as

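ssize_t len;
/* Print the size of every datagram received; stop on error or a zero-length read. */
while ((len = recv(socket_id, buffer, 100000, 0)) > 0)
{
    printf("%zd\n", len);
}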

I observed that send returns -1 for i > 65507, and that recv never receives a packet longer than 65507 bytes.

sysadmin1138

6 Answers

19

UDP datagrams have little to do with the MTU size; you can make them as big as you like, up to the 64K maximum mentioned above. You can even send one in a single frame, as long as you are using jumbo frames with a size larger than the datagram.

However, jumbo frames have to be supported by all the equipment the frame passes through, and this is a problem. For practical purposes, Ethernet frames are the most common transport unit, and their MTU is around 1500 bytes; I will say 1500 going forward, but it isn't always. When you create a UDP datagram larger than the underlying MTU (which, as indicated, is most often Ethernet's), it is quietly broken up into a number of frames of up to 1500 bytes each. If you tcpdump this traffic, you will see a number of packets broken at the MTU boundary, which carry the more fragments flag along with a fragment offset. The first packet has a fragment offset of 0 and more fragments set; the last one has a non-zero fragment offset and more fragments not set.
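As a rough illustration of the layout described above, here is a minimal sketch that predicts what a capture would show for a given UDP payload. It assumes IPv4 with a 20-byte header and no options; the names `max_fragment_data`, `fragment_count` and `describe_fragment` are made up for this example, and a real stack may split the data slightly differently.

```c
#include <stddef.h>

#define IPV4_HEADER_LEN 20u /* assumption: minimum IPv4 header, no options */
#define UDP_HEADER_LEN   8u

/* One IPv4 fragment as it would appear in a capture. */
struct fragment {
    size_t offset;         /* byte offset of this fragment's data in the IP payload */
    size_t data_len;       /* bytes of IP payload carried by this fragment */
    int    more_fragments; /* 1 if the more fragments flag is set, 0 on the last one */
};

/* Largest IP payload per fragment: what fits after the IP header, rounded
 * down to a multiple of 8 because offsets are counted in 8-byte units. */
static size_t max_fragment_data(size_t mtu)
{
    return ((mtu - IPV4_HEADER_LEN) / 8u) * 8u;
}

/* Number of IP packets needed to carry a UDP payload of udp_payload bytes. */
static size_t fragment_count(size_t udp_payload, size_t mtu)
{
    size_t ip_payload = udp_payload + UDP_HEADER_LEN;
    size_t per_frag = max_fragment_data(mtu);

    if (ip_payload <= mtu - IPV4_HEADER_LEN)
        return 1; /* fits in one packet, no fragmentation */
    return (ip_payload + per_frag - 1u) / per_frag;
}

/* Describe the fragment with the given 0-based index. */
static struct fragment describe_fragment(size_t udp_payload, size_t mtu, size_t index)
{
    size_t ip_payload = udp_payload + UDP_HEADER_LEN;
    size_t per_frag = max_fragment_data(mtu);
    struct fragment f;

    if (ip_payload <= mtu - IPV4_HEADER_LEN) {
        /* Unfragmented packet: offset 0, more fragments not set. */
        f.offset = 0;
        f.data_len = ip_payload;
        f.more_fragments = 0;
        return f;
    }
    f.offset = index * per_frag;
    f.more_fragments = (ip_payload - f.offset) > per_frag;
    f.data_len = f.more_fragments ? per_frag : ip_payload - f.offset;
    return f;
}
```

With these assumptions, the 65507-byte message from the question and an MTU of 1500 come out as 45 fragments: 44 carrying 1480 bytes each with more fragments set, and a last one carrying 395 bytes at offset 65120 with the flag clear.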

So why care? The implementation detail actually matters. Fragmentation can hurt network performance; this is not as big an issue as it once was, but it is one to be aware of. If a huge datagram is used and any fragment is lost, the whole datagram has to be resent. Equally, at high volumes (and today these are perfectly achievable volumes), mis-association of fragments at reassembly is possible. There can also be problems getting fragmented UDP packets through enterprise firewall configurations where load balancers spread the packets out: if one fragment lands on one firewall and another on a different one, the traffic will be dropped as incomplete.

So don't create UDP datagrams bigger than the MTU, and thereby force fragmentation, unless you have to. If you have to, specify that the communicating infrastructure be close (same-subnet close), at which point jumbo frames would likely be a good option.

Martyn A
  • Good information about the 'more fragments flag'. Is that a flag in the UDP header or in the IP header? – John Jesus Nov 03 '15 at 21:44
  • Note: Some OSes will NOT transmit UDP if the data would be fragmented. For example, from the Linux docs: `By default, Linux UDP does path MTU (Maximum Transmission Unit) discovery. This means the kernel will keep track of the MTU to a specific target IP address and return EMSGSIZE when a UDP packet write exceeds it.` – Rahly Sep 26 '16 at 04:45
10

UDP doesn't know anything about MTU. UDP packets can have any size from 8 to 65535 bytes. The protocol layers below UDP either can send a packet of a specific size or will refuse to send it, with an error, if it is too big.

The layer below UDP is usually IP, either IPv4 or IPv6. An IP packet can have any size from 20 (IPv4) or 40 (IPv6) to 65535 bytes, which is the same maximum as UDP. However, IP supports a mechanism called fragmentation. If an IP packet is larger than what the layer below can transport, IP can split the single packet into multiple packets called fragments. Every fragment is in fact an IP packet of its own (it has its own IP header) and is also sent on its own to the destination; it is then the task of the destination to collect all fragments and rebuild the full packet out of them before passing the received data on to the next higher layer (e.g. UDP).

The Ethernet protocol can only transport frames with a payload between 46 and 1500 bytes (there are exceptions, but those are beyond the scope of this reply). If the payload data is less than 46 bytes, it is padded to be exactly 46 bytes. If the payload data is beyond 1500 bytes, the interface will refuse to accept it. If that happens, it's up to the IP layer to decide either to fragment the packet, so that no fragment is larger than 1500 bytes, or to report an error to the next higher layer if fragmentation has been disabled or forbidden for this particular connection.
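The numbers in the question fall straight out of these limits. A minimal sketch of the arithmetic, assuming IPv4 with a 20-byte header and no options (the helper names `max_udp_payload` and `ethernet_padding` are invented for this example):

```c
#include <stddef.h>

#define IPV4_HEADER_LEN    20u /* assumption: minimum IPv4 header, no options */
#define UDP_HEADER_LEN      8u
#define ETHERNET_MIN_DATA  46u /* shorter frame payloads are padded up to this */

/* Largest UDP payload that fits in one IPv4 packet of ip_packet_size bytes.
 * The caller must pass at least 28 (IP header plus UDP header). */
static size_t max_udp_payload(size_t ip_packet_size)
{
    return ip_packet_size - IPV4_HEADER_LEN - UDP_HEADER_LEN;
}

/* Padding bytes Ethernet adds when the IP packet is shorter than 46 bytes. */
static size_t ethernet_padding(size_t ip_packet_size)
{
    return ip_packet_size < ETHERNET_MIN_DATA
               ? ETHERNET_MIN_DATA - ip_packet_size
               : 0u;
}
```

`max_udp_payload(1500)` gives the textbook's 1472 bytes, which is the most UDP data a single unfragmented Ethernet frame can carry, while `max_udp_payload(65535)` gives the 65507 bytes at which `send` started failing in the question.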

Fragmentation is generally to be avoided, as

  • it wastes resources at the sender side.
  • it wastes resources at the receiver side.
  • it increases the protocol overhead for the same amount of payload data.
  • if a single fragment is lost, the entire packet is lost.
  • if a single fragment is corrupted, the entire packet is corrupted.
  • in case of a resend, all fragments must be resent.

That's why TCP intelligently adapts its segment size so that the packets never require IP to fragment them. This can be done by forbidding IP to fragment packets; if IP reports that a packet is too big to be sent, TCP reduces the segment size and tries again, until no error is reported anymore.

For UDP, though, this would be the task of the application itself, as UDP is a "dumb" protocol: it has no management logic of its own, which makes it very flexible, fast, and simple.
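What "the application's task" could look like is sketched below. This is only an illustration: it relies on the Linux-specific `IP_MTU_DISCOVER` socket option, assumes an IPv4 UDP socket that is already connected, and uses the made-up helper names `forbid_fragmentation` and `probe_datagram_size`. A real application would split its message into several datagrams rather than send a truncated one.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the kernel never to fragment datagrams sent on this IPv4 UDP socket.
 * Returns 0 on success, -1 on failure (including platforms without the option). */
static int forbid_fragmentation(int fd)
{
#ifdef IP_MTU_DISCOVER
    int mode = IP_PMTUDISC_DO; /* always set DF; oversized sends fail with EMSGSIZE */
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof mode);
#else
    (void)fd;
    errno = ENOPROTOOPT; /* assumption: treat a missing option as unsupported */
    return -1;
#endif
}

/* Try to send len bytes of buf on a connected UDP socket. Whenever the kernel
 * reports EMSGSIZE, halve the size and retry. Returns the number of bytes the
 * kernel finally accepted, or -1 on any other error. */
static ssize_t probe_datagram_size(int fd, const void *buf, size_t len)
{
    while (len > 0) {
        ssize_t sent = send(fd, buf, len, 0);
        if (sent >= 0 || errno != EMSGSIZE)
            return sent;
        len /= 2; /* too big for the known path MTU: shrink and try again */
    }
    errno = EMSGSIZE;
    return -1;
}
```

Halving is a deliberately crude search; it only mirrors the "reduce the size and try again" idea described for TCP above.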

The only UDP payload size you can rely on to always be transportable is 576 bytes minus the 8-byte UDP header and the 20-byte IPv4 header, i.e. 548 bytes when the IP header carries no options, as the IPv4 standard requires every host to be able to receive IP packets with a total size of 576 bytes. (The 576-byte figure is IPv4's; IPv6 hosts must be able to accept packets of up to 1500 bytes after reassembly.) Your protocol implementation would not be standards-conformant if it could not accept packets of at least that size. Note, however, that the standard doesn't say 576 without fragmentation, so even a 576-byte IP packet may get fragmented between two hosts.

The only packet size you can rely on to be transportable without fragmentation is 28 bytes for IPv4 and 56 bytes for IPv6, as the smallest IP headers for a fragment are 20/48 bytes (v4/v6) and a fragment that is not the last one must carry at least 8 bytes of payload data. Thus a transport system below the IP layer that cannot transport packets of at least these sizes cannot be used to transport IP traffic. (The standards themselves set higher floors: IPv4 requires that a 68-byte packet can be forwarded without further fragmentation, and IPv6 requires every link to support an MTU of at least 1280 bytes.)

And before anyone comments that an IPv6 header only has 40 bytes: that is correct, but unlike an IPv4 header, a standard IPv6 header has no header fields for fragmentation. If a packet has to be fragmented, a fragmentation extension header must be added after the IPv6 base header, and this extension header is 8 bytes long. As in IPv4, fragment offsets in IPv6 are counted in 8-byte units, so every fragment except the last can only carry a payload that is a multiple of 8 bytes.

Mecki
2

The IP layer will fragment your packet on the sending end, and then reassemble it back on the receiving end, before passing it up to UDP. From the UDP layer, you can't really tell that the packet has been fragmented. If you use a packet capture tool like Wireshark, you should be able to see that your computer is receiving IP packets limited to the MTU.

Jeff
1

Turns out that allowing the TCP/IP stack to fragment packets as needed is a lot lower overhead than sending individual packets.

geekosaur
  • Do you mean that TCP/IP is fragmenting and reassembling by itself? If yes, then why do people say all the time that your code should take care of reassembly at the receiver end? I have not observed fragmentation so far, but I have seen many forums that say this, and even people accepting it. –  Mar 12 '11 at 05:36
  • For those of us that are OSI model challenged, could you add a bit more detail to your answer please? – Robert Harvey Mar 12 '11 at 05:41
  • I was being a bit cagey because I can't tell if this is homework or not. It's a tradeoff: since UDP makes no delivery guarantees, if any packet fragment is dropped the whole packet is lost. If you want a reliable transport atop UDP, you need to handle all of this yourself; but if you're doing (say) streaming protocols (or NFS over UDP, which took the streaming-like path) it's lower overhead to either simply drop those packets or retransmit the larger packet after a longish delay if needed. You need to balance your needs against protocol features and protocol overhead. – geekosaur Mar 12 '11 at 05:49
0

To answer your question, "If the frame's payload size itself is maximum of 1472 bytes (as per my textbook), how can the packet size of IP be greater than that which here is 65535?"

It is due to an offloading feature called UFO (UDP Fragmentation Offload). Please refer to this link.

You can verify and toggle offloading features via ethtool -k ethX and ethtool -K ethX respectively.

Nehal Dattani
0

If you're monitoring outgoing frames, it's possible that your network adapter supports segmentation offloading and that it is enabled. With segmentation offloading enabled, the network card itself, rather than the network stack, splits the packet into frames of the appropriate size. This frees the computer's CPU to perform other tasks, improving performance. On Linux, "ethtool -k [device]" will show the offload flags.

Andrew Bowers