About MTU settings in machines and switch

3

2

Suppose I have two machines and one switch.

M1--Switch--M2.

The settings are:

  • M1 has MTU set to 100
  • Switch has MTU set to 1000
  • M2 has MTU set to 1000.

Questions:

  1. When M1 tries to send a 100-byte packet to M2, there should be no problem, right?

  2. When M2 tries to send a 1000-byte packet to M1, is there any problem?

  3. M2 can send a 1000-byte packet to Switch, but when Switch tries to send the packet to M1, it needs to fragment the packet into 10 small packets. Is that right?

Update:

To be more realistic: M1, Switch and M2 are all running on a 10G network, and we use IPv4.

The settings are:

  • M1 has MTU set to 1500

  • Switch has MTU set to 9000

  • M2 has MTU set to 9000

Does that help to answer the question?

performanceuser

Posted 2011-04-13T19:06:50.543

Reputation: 221

To be more specific and more realistic: – performanceuser – 2011-04-14T17:51:03.290

Answers

7

You didn't specify what networking technologies you were talking about, so I'm going to assume Ethernet and IP[v4].

Ethernet has always defined its range of acceptable payload lengths to be from 46 to 1500 bytes, and requires all devices (hosts and switches) on the LAN to be able to receive frames with 1500-byte payloads. Because of this, Ethernet does not provide a fragmentation mechanism, nor does it provide a mechanism for communicating or negotiating MTUs (or, more importantly, MRUs -- Maximum Receive Units) between devices. In fact the term "MTU" or "maximum transmission unit" does not appear anywhere in the IEEE 802.3 specification.

So let's add IP into the picture. IP has a concept of an MTU, and most modern IP stacks let you set MTUs on a per-interface basis (and more). But your question as stated doesn't quite work out in the context of IP either, because 100 bytes is well below the 576-byte datagram size that every IPv4 host must be able to accept (the hard lower limit on an IPv4 link MTU is 68 bytes, but values that small are impractical). So allow me to restate your question as "M1 has an MTU of 600, and M2 has an MTU of 1200". But what MTU shall we say that "Switch" has? Well, if Switch is just a Layer 2 Ethernet switch, it doesn't have a concept of a settable MTU. So to make your question work out in the context of IP, we'll have to turn that switch into a router. So let's call it "Router" and say it has two Ethernet interfaces, one attached to M1 and one attached to M2. Let's also say it has MTUs of 1200 set on both of its interfaces.

  1. When M1 sends a frame with a 600-byte payload to M2, there would be no problem.
  2. When M2 sends a frame with a 1200-byte payload to M1, there still would be no problem. Why not? Because setting M1's MTU didn't necessarily change its MRU, and in my experience MTUs and MRUs are separate, and implementations don't give you a way to change your MRU. So M1's MRU on that interface would be 1500 since it's Ethernet.
  3. Router wouldn't know it needs to fragment the frames from M2, because it believes all hosts on the Ethernet LAN that M1 is on are able to receive frames with 1200-byte payloads, because it was configured for a 1200-byte MTU on that interface. Luckily this would still probably work out fine, as I discussed in (2).
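The MTU/MRU distinction in the three cases above can be sketched with a toy model. This is only an illustration: the `Interface` class and `can_deliver` function are hypothetical names, not part of any real networking API, and the model assumes every Ethernet interface keeps the standard 1500-byte receive limit no matter what MTU is configured.

```python
from dataclasses import dataclass

ETHERNET_MAX_PAYLOAD = 1500  # standard Ethernet payload limit, in bytes


@dataclass
class Interface:
    name: str
    mtu: int                         # largest payload this side will transmit
    mru: int = ETHERNET_MAX_PAYLOAD  # largest payload this side can receive (assumed fixed)


def can_deliver(sender, receiver, payload_len):
    """True if the sender will transmit the payload in one frame and the receiver accepts it."""
    return payload_len <= sender.mtu and payload_len <= receiver.mru


m1 = Interface("M1", mtu=600)
router_to_m1 = Interface("Router (M1 side)", mtu=1200)

print(can_deliver(m1, router_to_m1, 600))    # True: case 1, M1 sends 600 bytes
print(can_deliver(router_to_m1, m1, 1200))   # True: cases 2 and 3, M1's MRU is still 1500
print(can_deliver(m1, router_to_m1, 1200))   # False: M1 itself will not send more than its MTU
```

The point of the model is that lowering `mtu` only restricts what an interface sends; what it can receive is governed by a separate limit.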

Okay, still trying to find and answer the true spirit of your question, let's say the link between M1 and Router is actually PPP instead of Ethernet. The PPP protocol allows hosts to communicate/negotiate their MRUs. Let's say that M1 told Router that M1 has a 600-byte MRU limitation, so Router has set its MTU for that link to 600 bytes.

Now, in this case, if M2 sends a 1200-byte IP datagram to M1 (without setting the "Don't Fragment" bit in the IP header), Router will receive it just fine and realize it needs to fragment it before sending it on to M1. So does Router fragment it into two 600-byte fragments? Well, no, it's not that simple, for a couple of reasons.

One reason is that every fragment has to have its own IP header, which adds 20 bytes to the size of each fragment after the first. The other reason is that IP's fragmentation offset field counts in 8-byte chunks instead of individual bytes.

So let's say the 1200-byte datagram was specifically 1172 bytes of application data in a UDP datagram (8 bytes of UDP headers, 20 bytes of IP headers). Each fragment can carry at most 600 - 20 = 580 bytes after its IP header, rounded down to a multiple of 8, which gives 576 bytes. After fragmentation, the first fragment would contain a 20-byte IP header, the 8-byte UDP header, and the first 568 bytes of the application data, for a total of 596 bytes. The second fragment would contain another 20-byte IP header, no UDP header, and the next 576 bytes of the application data, for a total of 596 bytes. That leaves 28 bytes of application data left over for the final fragment, which, with its IP header added, would be 48 bytes.

Update based on Kavin's update that he was talking about Jumbo frames:
Jumbo frames are something that some Gigabit Ethernet product vendors created independently around the time GigE was created. They were (I believe) subsequently rejected or ignored by the IEEE and seem unlikely to ever become part of the 802.3 Ethernet standard. Even IEEE 802.3-2008, which covers not just 1000BASE-T but also 10GBASE-T, does not contain anything about 9000-byte frame payloads.

The vendors that came up with jumbo frames did not provide any kind of autonegotiation or communication mechanism for jumbo frame support, nor did they create an Ethernet-layer fragmentation method to handle the (very common) case you illustrated. If you want to run your Ethernet LAN in this nonstandard mode, you have to ensure that all hosts and switches on your LAN support jumbo frames.

If M1's NIC is not capable of receiving jumbo frames, it will treat a jumbo frame as "Ethernet jabber" -- the output of a broken device that "keeps jabbering on and on", sending bits well beyond the end of a maximum-length frame (1500 bytes of payload, 1518 bytes including the Ethernet header and FCS). Note that this meaning of jabber is a term for a kind of Ethernet malfunction and is not to be confused with the similarly named "Jabber" Internet chat system. You'll have to decide whether to stop using jumbo frames on this network or to upgrade M1 to a NIC that supports jumbo frames.
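The 1500 versus 1518 figures come from the fixed Ethernet framing overhead. A rough sketch of the arithmetic, assuming untagged frames (no 802.1Q VLAN tag) and not counting the preamble; the constant and function names are made up for this example.

```python
ETH_HEADER = 14              # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4                  # frame check sequence (CRC)
MAX_STANDARD_PAYLOAD = 1500  # largest payload allowed by standard Ethernet


def frame_length(payload_len):
    """Length in bytes of an untagged Ethernet frame carrying payload_len bytes."""
    return ETH_HEADER + payload_len + ETH_FCS


def exceeds_standard_limit(payload_len):
    """True if a NIC without jumbo frame support would see this frame as too long."""
    return frame_length(payload_len) > frame_length(MAX_STANDARD_PAYLOAD)


print(frame_length(1500))            # 1518
print(frame_length(9000))            # 9018
print(exceeds_standard_limit(9000))  # True
```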

If M1's NIC is capable of receiving jumbo frames, I suspect that setting its IPv4 MTU for that interface down to 1500 will ensure it doesn't transmit any jumbo sized IP datagrams in a single jumbo Ethernet frame, but it will most likely be able to receive large IP datagrams in single jumbo Ethernet frames no problem, because again, MTU is not MRU, and setting an IP-layer MTU doesn't affect what size frame buffers the NIC allows. Now, if you're tweaking a NIC/driver setting to tell the NIC to only use 1500-byte buffers instead of 9000-byte buffers, that's an Ethernet-layer change, and would probably make your NIC act as if it didn't support 9000-byte buffers.

Spiff

Posted 2011-04-13T19:06:50.543

Reputation: 84 656

Thank you for your answer. You said that the switch doesn't have any MTU setting? Can you take a look at my original post. I have changed a bit, and the question is more realistic. – performanceuser – 2011-04-14T18:00:26.370

In my case, all the machines are in a big 10G network, the communication between doesn't need a router. They are all connected with a switch. And M1 set to 1500, M2 set to jumbo frame 9000 – performanceuser – 2011-04-14T18:07:53.307

Okay @Kavin, I've updated my Answer with information about jumbo frames. – Spiff – 2011-04-15T00:52:10.973

Thank you so much. I really appreciate your help and your very informative answer. I am encountering a problem: all the machines in a subnet enabled jumbo frames, and this change affected one of the performance test results. The CPU utilization dropped from 70% to 20%. Can you provide some suggestions? – performanceuser – 2011-04-15T01:28:26.340

Please refer to this question: http://superuser.com/questions/271080/why-jumbo-frames-affects-the-performance-of-the-server

– performanceuser – 2011-04-15T01:43:19.930

1

I honestly don't think you can set an MTU to 100 and still establish any form of an IP connection. I believe IPv4 REQUIRES a minimum of 576... and possibly more. That's CRAAAZY SMALL... typically 10/100 switches built in the last 20 years have a 1492 or 1500 MTU... and in more demanding networks with better equipment it goes all the way to 9000.

TheCompWiz

Posted 2011-04-13T19:06:50.543

Reputation: 9 161

What about when it is OVER 9000! :P – Supercereal – 2011-04-13T19:16:28.450

I must confess... I LOL'd. – TheCompWiz – 2011-04-13T19:17:48.907

I just tried to abstract my problem. – performanceuser – 2011-04-13T23:04:26.317

I just need to know the mechanism. The first word is "suppose". I am not saying this is a real work case – performanceuser – 2011-04-13T23:05:27.423

Then use realistic values for your example. You asked if a packet trying to leave an interface with a MTU set at 100 would be able to reach the other machine. The answer is no. Simply because the window is too small to contain the needed information to get to the other side. – TheCompWiz – 2011-04-14T13:14:04.170

0

There is a mechanism called Path MTU (PMTU) discovery, whereby one end discovers the largest packet it can reliably send to the other and sizes its packets down to the smallest MTU along the path.

Packets bigger than this are fragmented, unless the "DF" (Don't Fragment) flag is set in the IP header, in which case the packet will be dropped en route.
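The idea can be sketched as a small simulation. This is a simplified model, not how any real stack implements it: `forward` and `discover_path_mtu` are hypothetical names, the path is just a list of per-link MTUs, and the "reported" MTU stands in for the ICMP "fragmentation needed" message that a router sends back when it drops a DF packet.

```python
def forward(packet_len, link_mtus, dont_fragment):
    """Walk a packet along the path, stopping at the first link that is too small.

    Returns ("delivered", None), ("fragmented", mtu) or ("dropped", mtu),
    where mtu is the MTU of the first link the packet did not fit.
    """
    for mtu in link_mtus:
        if packet_len > mtu:
            if dont_fragment:
                return ("dropped", mtu)
            return ("fragmented", mtu)
    return ("delivered", None)


def discover_path_mtu(start_len, link_mtus):
    """Shrink the packet size until a packet with DF set gets through."""
    size = start_len
    while True:
        status, reported_mtu = forward(size, link_mtus, dont_fragment=True)
        if status == "delivered":
            return size
        size = reported_mtu  # retry at the size the dropping hop reported


path = [9000, 1500, 1400]  # made-up link MTUs between sender and receiver
print(discover_path_mtu(9000, path))                # 1400
print(forward(1200, [1500, 600], False))            # ('fragmented', 600)
print(forward(1200, [1500, 600], True))             # ('dropped', 600)
```

The discovered value ends up being the smallest MTU on the path, which is what the sender then uses to size its packets.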

On a peer-to-peer connection like you are describing it should use PMTU quite happily. It only becomes a problem when you are routing through a number of networks and one of the routers between you and the destination doesn't support PMTU properly and doesn't report the correct MTU size to use.

Majenko

Posted 2011-04-13T19:06:50.543

Reputation: 29 007