
I am trying to implement a 9000 byte MTU for storage communication between KVM guests and the host system. The host has a bridge (br1) with a 9000 byte MTU:

host# ip link show br1
8: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP 
    link/ether fe:54:00:50:f3:55 brd ff:ff:ff:ff:ff:ff
    inet 172.16.64.1/24 brd 172.16.64.255 scope global br1
    inet6 fe80::21b:21ff:fe0e:ee39/64 scope link 
       valid_lft forever preferred_lft forever
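For reference, an MTU like this can be set at runtime with the usual command (making it persistent is handled separately in the distribution's network configuration):

host# ip link set dev br1 mtu 9000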

The guests have an interface attached to this bridge that also has a 9000 byte MTU:

guest# ip addr show eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:50:f3:55 brd ff:ff:ff:ff:ff:ff
    inet 172.16.64.10/24 brd 172.16.64.255 scope global eth2
    inet6 fe80::5054:ff:fe50:f355/64 scope link 
       valid_lft forever preferred_lft forever

I can ping from the host to the guest:

host# ping -c4 172.16.64.10
PING 172.16.64.10 (172.16.64.10) 56(84) bytes of data.
64 bytes from 172.16.64.10: icmp_seq=1 ttl=64 time=1.15 ms
64 bytes from 172.16.64.10: icmp_seq=2 ttl=64 time=0.558 ms
64 bytes from 172.16.64.10: icmp_seq=3 ttl=64 time=0.566 ms
64 bytes from 172.16.64.10: icmp_seq=4 ttl=64 time=0.631 ms

--- 172.16.64.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.558/0.727/1.153/0.247 ms

But if I increase the ping payload size beyond 1490 bytes, I no longer have connectivity:

host# ping -c4 -s 1491 172.16.64.10
PING 172.16.64.10 (172.16.64.10) 1491(1519) bytes of data.

--- 172.16.64.10 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3000ms

A packet trace shows that these packets never reach the guest. Everything I've read indicates that both the Linux bridge interface and the virtio network drivers support jumbo frames, but this sure looks like an MTU problem to me.
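(A capture along the following lines is enough to see where the packets disappear; br1 is the bridge above, and vnet2 is the guest's tap device shown in the update below.)

host# tcpdump -ni br1 icmp
host# tcpdump -ni vnet2 icmp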

Am I missing something really obvious?

Update

Showing the host-side of the guest interface:

host# brctl show
bridge name bridge id       STP enabled interfaces
br1     8000.fe540050f355   no      vnet2

host# ip addr show vnet2
11: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast master br1 state UNKNOWN qlen 500
    link/ether fe:54:00:50:f3:55 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe50:f355/64 scope link 
       valid_lft forever preferred_lft forever
larsks

2 Answers


While this was an MTU problem, it turns out that it had nothing to do with the MTU settings on any of the component devices. As I showed in the original question, the host bridge, host tun interface, and guest interface all had the same MTU setting (9000 bytes).

The actual problem was a libvirt/KVM configuration issue. By default, libvirt does not use virtio devices; absent an explicit configuration, you end up with an emulated Realtek RTL-8139 NIC, and this virtual NIC does not support jumbo frames.
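One quick way to check which model a guest actually ended up with is to look at the domain XML on the host and at the driver bound to the interface inside the guest (mydomain and eth2 below are placeholders; substitute your own names):

host# virsh dumpxml mydomain | grep -A3 '<interface'
guest# ethtool -i eth2

If the <interface> element has no <model type="virtio"/> child, or ethtool reports the 8139cp/8139too driver rather than virtio_net, the guest is running on the emulated RTL-8139.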

To use virtio devices, you need to specify an explicit model. When using virt-install:

virt-install ... -w bridge=br1,model=virtio

Or after the fact by adding a <model> tag to the appropriate <interface> element in the domain XML:

<interface type="bridge">
  <model type="virtio"/>
  <source bridge="br1"/>
  <target dev="vnet2"/>
</interface>
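Note that a change made with virsh edit only takes effect the next time the domain is started from a full shutdown; rebooting from inside the guest is not enough (mydomain is a placeholder):

host# virsh edit mydomain
host# virsh shutdown mydomain
host# virsh start mydomain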

With this change in place, everything works as intended.
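A quick way to verify jumbo frames end to end is to ping with the don't-fragment flag at the largest payload that fits a 9000-byte MTU (9000 minus 20 bytes of IP header and 8 bytes of ICMP header = 8972):

host# ping -c4 -M do -s 8972 172.16.64.10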

larsks

For a larger MTU to work, the entire stack has to carry the higher MTU: that includes the guests, the tap devices, and the physical NICs the bridge is attached to (and, if you have bonds and VLANs along the way, those too).
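For example, a minimal runtime sketch covering each layer (the device names here are placeholders for a typical NIC + bond + VLAN + bridge + tap setup; adjust to your own):

host# ip link set dev eth0 mtu 9000        # physical NIC
host# ip link set dev bond0 mtu 9000       # bond, if present
host# ip link set dev bond0.100 mtu 9000   # VLAN, if present
host# ip link set dev br1 mtu 9000         # bridge
host# ip link set dev vnet2 mtu 9000       # guest's tap device
guest# ip link set dev eth2 mtu 9000       # NIC inside the guest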

dyasny
  • Do you know of specific examples, like GigaEthernet and beyond, where this would be the result of auto-negotiation? This post may be a duplicate: http://serverfault.com/questions/26024/really-poor-performance-with-10gbit-ethernet-on-custom-app – ArrowInTree Dec 26 '12 at 04:35
  • No, it has to be done manually, with the whole stack set to the highest MTU of any given component – dyasny Dec 26 '12 at 05:41
  • Yes, I realize that; that is well documented all over the place. As you can see from the question, the guests, the tapdevs, and the bridge all have the higher MTU. Do you see anything misconfigured in the examples I've given? – larsks Dec 26 '12 at 12:14
  • To use non-default MTU settings, everything must adhere to the non-default MTU. From top to bottom, that means the guest NIC, the tap, the bridge, the eth (+ vlan + bond) under the bridge, and of course the switch port. I tested it just a few minutes ago and it works perfectly on RHEL with KVM – dyasny Dec 26 '12 at 14:33
  • Right, and I think I've clearly shown in the question the value at all parts of the stack. Do you see either any missing information or something that isn't configured correctly? – larsks Dec 27 '12 at 01:48
  • wow, you were not using virtio, and you pretty much said you did. Glad you worked it out – dyasny Dec 27 '12 at 04:52