Fast IP Forwarding to WAN, but falls off cliff between LAN subnets


I replaced my consumer wireless router with a Linux box that has a quad-port gigabit PCIe NIC and a single gigabit NIC on the motherboard (for the WAN). After turning on IP forwarding and masquerading (via iptables) and setting up a subnet on each of the four LAN interfaces, I ran some speed tests.

$ ip route
default dev ppp0 scope link 
10.0.0.0/16 dev enp3s0f0 proto kernel scope link src 10.0.0.1 
10.64.0.0/16 dev enp3s0f1 proto kernel scope link src 10.64.0.1 
10.192.0.0/16 dev enp4s0f1 proto kernel scope link src 10.192.0.1 
aaa.bbb.ccc.ddd dev ppp0 proto kernel scope link src www.xxx.yyy.zzz 
  • From a wireless device on one of the LAN subnets to a speedtest server on the WAN I get the full 40 Mbps / 5 Mbps I pay my ISP for.

  • From the router host to a wired LAN host using iperf3 I can consistently maintain 930+ Mbps for several minutes.

  • From a wired device on one of the LAN subnets to a wired device on a different LAN subnet using iperf3 I initially get 80-95 Mbps for the first few seconds but it rapidly drops to zero.

  • From a wired device on one of the LAN subnets to a wired device on a different LAN subnet using iperf3 with a target bitrate of 20 Mbps I see similar results (see update at end), but can sustain about 10 Mbps.

Connecting to host 10.0.0.2, port 5201
[  5] local 10.192.128.3 port 35620 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  10.2 MBytes  85.9 Mbits/sec    0   73.5 KBytes       
[  5]   1.00-2.00   sec  9.01 MBytes  75.6 Mbits/sec    0   82.0 KBytes       
[  5]   2.00-3.00   sec  8.26 MBytes  69.3 Mbits/sec    0   79.2 KBytes       
[  5]   3.00-4.00   sec  9.01 MBytes  75.6 Mbits/sec    0   73.5 KBytes       
[  5]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    1   1.41 KBytes       
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
^C[  5]  10.00-13.63  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-13.63  sec  41.8 MBytes  25.7 Mbits/sec    5             sender
[  5]   0.00-13.63  sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

This suggests to me that there's some problem forwarding packets between the subnets. I first ensured that my iptables rules are as minimal as possible:

-t nat -A POSTROUTING -o ppp0 -j MASQUERADE
# WAN connection is PPPoE and VLAN tagged
-t filter -A FORWARD -o ppp0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS  --clamp-mss-to-pmtu

Dumping the iptables state, I see low packet counts for both rules.

Next I checked for packet loss. There does seem to be a small but consistent amount of packet loss / retransmits.

$ sudo netstat -s | egrep -i 'retransmit|drop'
    498 outgoing packets dropped
    25848 fast retransmits

I then thought that maybe there was a buffer or queue that was filling and packets were getting dropped. I calculated the average bandwidth-delay product and compared that against the reserved memory.

$ sudo ping -f 10.0.0.2 -s $((1500-28))               
PING 10.0.0.2 (10.0.0.2) 1472(1500) bytes of data.
.^C
--- 10.0.0.2 ping statistics ---
9036 packets transmitted, 9035 received, 0% packet loss, time 26512ms
rtt min/avg/max/mdev = 1.742/2.817/12.057/0.758 ms, pipe 2, ipg/ewma 2.934/3.091 ms

$ echo "1*(1024^3) * 0.003" | bc 
3221225.472

$ cat /proc/sys/net/ipv4/tcp_mem
18396   24529   36792

$ getconf PAGESIZE
4096
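
The comparison can be made explicit with a few lines of shell arithmetic. This is only a rough sketch using the numbers measured above: it assumes a decimal 1 Gbit/s link rate and an RTT rounded up to 3 ms, and the variable names are mine. Note that tcp_mem is counted in pages and covers TCP sockets that terminate on the router itself, so it is at best a loose sanity check for forwarded traffic.

```shell
#!/bin/sh
# Values taken from the measurements in the question.
link_bps=1000000000       # assumed 1 Gbit/s link rate (decimal)
rtt_ms=3                  # average RTT from the flood ping, rounded up
tcp_mem_low_pages=18396   # first field of /proc/sys/net/ipv4/tcp_mem
page_size=4096            # from getconf PAGESIZE

# Bandwidth-delay product: bits in flight, then converted to bytes.
bdp_bits=$(( link_bps * rtt_ms / 1000 ))
bdp_bytes=$(( bdp_bits / 8 ))

# tcp_mem is expressed in pages, so convert to bytes before comparing.
tcp_mem_low_bytes=$(( tcp_mem_low_pages * page_size ))

echo "BDP:         ${bdp_bytes} bytes"
echo "tcp_mem low: ${tcp_mem_low_bytes} bytes"
```

With these inputs the bandwidth-delay product is 375000 bytes, against roughly 75 MB for the lowest tcp_mem threshold.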

That appears to be sufficient. So now I'm a bit stuck. I ran tcpdump on the iperf3 client and can see things moving along well for a bit. Then I see a long (almost 250ms) period of silence before lots of retransmits and duplicate acknowledgements.

Since I can pull sufficient download speeds from the WAN, I don't suspect that the onboard NIC is at fault. I'm looking for help diagnosing the quad-port NIC (details below), possibly a dumb layer-2 gigabit switch (Netgear GS-108), and any kernel configuration that could be getting in the way. I doubt it's the switch, as it's never been a problem before and I can maintain full speed from the router itself to that subnet. Only inter-subnet performance appears to be affected.

  *-network:0               
       description: Ethernet interface
       product: 82571EB Gigabit Ethernet Controller (Copper)
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: enp3s0f0
       version: 06
       serial: 00:26:55:xx:xx:xx
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=5.12-2 ip=10.0.0.1 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
       resources: irq:24 memory:fe920000-fe93ffff memory:fe880000-fe8fffff ioport:d020(size=32)

UPDATE:

$ iperf3 -b 20m -c 10.0.0.2
Connecting to host 10.0.0.2, port 5201
[  5] local 10.192.128.3 port 36554 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.49 MBytes  20.9 Mbits/sec    0    158 KBytes       
[  5]   1.00-2.00   sec  2.38 MBytes  19.9 Mbits/sec    0    150 KBytes       
[  5]   2.00-3.00   sec  2.38 MBytes  19.9 Mbits/sec    1    133 KBytes       
[  5]   3.00-4.00   sec  2.38 MBytes  19.9 Mbits/sec    0   73.5 KBytes       
[  5]   4.00-5.00   sec  2.38 MBytes  19.9 Mbits/sec    0   70.7 KBytes       
[  5]   5.00-6.00   sec  1.12 MBytes  9.44 Mbits/sec    2   1.41 KBytes       
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    2   1.41 KBytes       
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
iperf3: error - control socket has closed unexpectedly

$ iperf3 -b 10m -c 10.0.0.2 
Connecting to host 10.0.0.2, port 5201
[  5] local 10.192.128.3 port 36564 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.24 MBytes  10.4 Mbits/sec    0    201 KBytes       
[  5]   1.00-2.00   sec  1.25 MBytes  10.5 Mbits/sec    0    118 KBytes       
[  5]   2.00-3.00   sec  1.12 MBytes  9.44 Mbits/sec    0    127 KBytes       
[  5]   3.00-4.00   sec  1.25 MBytes  10.5 Mbits/sec    0    107 KBytes       
[  5]   4.00-5.00   sec  1.12 MBytes  9.44 Mbits/sec    0    110 KBytes       
[  5]   5.00-6.00   sec  1.25 MBytes  10.5 Mbits/sec    0   90.0 KBytes       
[  5]   6.00-7.00   sec  1.12 MBytes  9.44 Mbits/sec    0   87.2 KBytes       
[  5]   7.00-8.00   sec  1.25 MBytes  10.5 Mbits/sec    0   81.6 KBytes       
[  5]   8.00-9.00   sec  1.12 MBytes  9.44 Mbits/sec    0   78.8 KBytes       
[  5]   9.00-10.00  sec  1.25 MBytes  10.5 Mbits/sec    0    112 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  12.0 MBytes  10.1 Mbits/sec    0             sender
[  5]   0.00-10.04  sec  12.0 MBytes  10.0 Mbits/sec                  receiver

iperf Done.

Huckle

Posted 2018-02-25T05:54:23.550

Reputation: 376

What do you mean by "setting up subnets on each of the four LAN interfaces"? Did you create four LANs or one? And if one, then with four interfaces in the same LAN, you should only be setting up one IP address and one subnet. – David Schwartz – 2018-02-25T06:45:12.250

@DavidSchwartz Four subnets, with the interfaces configured as 10.0.0.1, 10.64.0.1, 10.128.0.1, and 10.192.0.1. – Huckle – 2018-02-25T06:46:35.693

Please give an ifconfig -a or ip addr. Looking for mac addresses on the quad NIC. – Pedro – 2018-02-25T06:46:47.697

@Huckle Is there some reason you created four separate LANs? Did you need isolation for some reason? This will mean that stuff that works within a LAN (such as broadcast service discovery and name resolution) won't work for devices on different LANs. – David Schwartz – 2018-02-25T06:47:46.773

Also where is the wireless coming from? Could you do an iperf3 between wired hosts connected to different interfaces? – Pedro – 2018-02-25T06:48:47.203

@Pedro - retested from two wired devices both directly attached on different subnets (no layer 2 switch this time on the one subnet). Essentially identical results – Huckle – 2018-02-25T06:51:11.293

@DavidSchwartz Yes, the idea will be to eventually firewall one of the subnets to just WAN access and then firewall one of the others to just allow access to certain ports -- but at this point iptables is as bare as possible to reduce variables, and I'm sure I've reloaded its configuration. – Huckle – 2018-02-25T06:52:46.607

Could you run the iperf3 with --bandwidth at a couple points like 20m, 40m, and 80m and see if those eventually slow down? This will show if you're overflowing a buffer on the router. – Pedro – 2018-02-25T06:54:41.523

@Pedro OUI of one of the quad NICs is in the lshw output. The card is an HP NC364T – Huckle – 2018-02-25T06:55:45.800

@Pedro 20 Mbps and 10 Mbps tests posted. 20 is a no-go. Seems to sustain 10. Breaks down around 12. – Huckle – 2018-02-25T07:01:27.023

Can you please do a full bandwidth iperf from a host on enp3s0f0 to one on enp3s0f1, enp3s0f1/enp4s0f1, and enp3s0f0/enp4s0f1 to see if one of the ports is bad? – Pedro – 2018-02-25T07:03:59.733

On the NC364T, can you just verify that the mac addresses are different for each port? It looks like you're only using two of the ports enp3s0f0 and enp3s0f1, correct? – Pedro – 2018-02-25T07:11:20.753

Let us continue this discussion in chat. – Pedro – 2018-02-25T07:12:17.923

Answers


Thanks to @Pedro for helping me dig in. Originally I thought this was a bad piece of hardware, but after replacing it with another card I'm certain it's a driver problem. I'm still digging to find out whether this is a bug that has already been reported (and whether a fix exists). In the meantime, I located a Server Fault question linking to a bug report that suggested turning off several offloading features. This at least got me from 0 bps to a stable ~270 Mbps. That is far short of the ~940 Mbps the hardware is capable of, but better than nothing while I continue researching.

ethtool -K eth0 gso off gro off tso off
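
The command above uses eth0 as a placeholder. On this box the same settings would presumably be needed on each port of the quad NIC, so here is a small sketch that prints the command for every LAN interface listed in the `ip route` output in the question. The interface list and the offload_cmd helper are my own illustration; nothing is executed, the commands are only printed for review.

```shell
#!/bin/sh
# LAN interface names as they appear in the `ip route` output in the question.
lan_ifaces="enp3s0f0 enp3s0f1 enp4s0f1"

# Build the offload-disabling command for one interface (hypothetical helper).
offload_cmd() {
    printf 'ethtool -K %s gso off gro off tso off\n' "$1"
}

# Print one command per interface; nothing is changed on the system here.
for iface in $lan_ifaces; do
    offload_cmd "$iface"
done
```

After running the printed commands as root, `ethtool -k <interface>` shows the resulting offload state. Settings made this way do not survive a reboot, so they would need to be reapplied from whatever network configuration mechanism the distribution uses.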

Huckle

Posted 2018-02-25T05:54:23.550

Reputation: 376