We have two VPN servers at the same hosting provider. Both are virtual machines running different Linux distributions. The VPN clients establish HTTPS connections to the same Amazon EC2 server. The TCP packets from EC2 always have the "Don't Fragment" (DF) flag set.

Although the MTU on both the physical and the tun interfaces of both VPN servers is 1500, they usually receive larger packets from EC2. I'm not sure how that's possible, but maybe it has something to do with Virtio.

Anyway, when the TCP traffic is forwarded to the tun interfaces the servers behave differently:

  • On "server 1" the large packets are dropped as expected and an ICMP "Fragmentation needed" message is sent back to EC2.
  • On "server 2" the TCP traffic is re-segmented. This is not IP fragmentation; it looks like a completely new TCP stream, as if an application with two sockets were running on the VPN server. The DF flag is retained.

So I assume there's some sysctl setting which enables this behavior on "server 2". Am I right? Where is this setting?

[Wireshark screenshot]

I configured the forwarding on server 2 purely with firewall-cmd. Here's the firewall config:

external (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources: 10.8.0.0/24 10.8.1.0/24
  services: dhcpv6-client http https irc ircs openvpn smtp ssh
  ports: 1398/tcp 1194/tcp 1401/tcp 1402/tcp 65213/tcp 500/udp 501/udp
  protocols:
  forward: yes
  masquerade: yes
  forward-ports:
        port=1500:proto=tcp:toport=1500:toaddr=10.8.1.32
        port=1501:proto=tcp:toport=1501:toaddr=10.8.1.32
  source-ports:
  icmp-blocks:
  rich rules:
        rule family="ipv4" source address="10.8.1.0/24" port port="3128" protocol="tcp" accept

ethtool output

localhost:~ # ethtool -k eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: on
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
localhost:~ #
basin

1 Answer

I'm fairly certain this is related to TCP segmentation offload (TSO), which lets the kernel place packets much larger than the MTU onto the ring buffers of the Ethernet driver. The driver then has the device write the IP headers and split the packets before they leave the device.

Hence, what actually leaves the device honours the MTU, but what the host hands to the device won't make sense in a packet sniffer: as you mentioned, the packet sizes won't appear to honour the MTU you expected to see.

If you check the receiver of these packets in a packet sniffer, the packet sizes should all fit the MTU and have the DF flag set.

You can turn off TSO with ethtool. Generally that's not a good idea, but there are times when turning it off is better: I've had problems in the past with TLS connections where hashes were not calculated properly and the receiver rejected them because of it.
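
Before turning anything off, it can help to see which of the relevant offloads are currently enabled. The following is a minimal sketch, not a definitive tool: `offload_states` is a hypothetical helper name, and it only filters the top-level lines of `ethtool -k` output for the segmentation and receive offloads that appear in the listing above.

```shell
# Hypothetical helper: print "feature state" pairs for the offloads discussed here.
# It reads `ethtool -k <iface>` output on stdin, so it can be tried without a NIC.
offload_states() {
    awk -F': ' '
        /^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload|rx-gro-hw):/ {
            split($2, s, " ")   # keep only on/off, drop a trailing "[fixed]" marker
            print $1, s[1]
        }'
}

# Usage on a real host (eth0 is the interface name from the question):
#   ethtool -k eth0 | offload_states
```

Indented sub-features such as `tx-tcp-segmentation` are deliberately skipped, since the pattern only matches lines that start at column one.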

Matthew Ife
  • But I am the receiver! tcpdump shows the huge received packets, not transmitted. – basin Jul 07 '22 at 13:48
  • @basin good point! GRO is on, though. I've not experienced it myself, but I imagine it might have the same effect. What happens if you turn off GRO? Do the captures make more sense? – Matthew Ife Jul 07 '22 at 14:58
  • Running `ethtool -K eth0 rx-gro-hw off gro off` (both were needed) does get rid of the huge received packets – basin Jul 07 '22 at 16:16
  • I still don't understand who splits the huge packets when forwarding. I tried to disable gso on `tun1`, but it didn't change anything. Basically I want to reproduce the situation of "server 1" on "server 2" – basin Jul 07 '22 at 16:31
  • There is also TSO for TCP, which works at layer 4 and splits the data according to the MSS. So you might want to set all of these values to off. – Matthew Ife Jul 08 '22 at 07:29
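
The suggestion in the comments (turn off the receive offloads, then TSO and GSO as well) could be sketched as below. This is an assumption-laden sketch: `disable_offloads_cmd` is a hypothetical helper that only prints the `ethtool -K` command rather than running it, and the thread does not confirm that this reproduces the "server 1" behavior on "server 2".

```shell
# Hypothetical helper: build the ethtool command that turns off the offloads
# named in the comments. It prints the command instead of executing it, so the
# result can be reviewed before being run as root.
disable_offloads_cmd() {
    iface=$1
    printf 'ethtool -K %s rx-gro-hw off gro off tso off gso off\n' "$iface"
}

# Usage (interface names are the ones mentioned in the thread):
#   disable_offloads_cmd eth0          # review the command
#   sh -c "$(disable_offloads_cmd eth0)"   # run it, as root
```

Settings changed with `ethtool -K` do not survive a reboot, and features marked `[fixed]` in the `ethtool -k` listing cannot be changed this way.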