
We have a Linux firewall with two outward-facing 10GbE adapters (Intel 82599EB) and one inward-facing 10GbE adapter (Intel 82598EB).

The problem I'm experiencing is that the firewall will only forward inbound traffic at a very low rate: less than about 2 Mbps. However, a direct connection from the firewall to an "inside" machine gets ~6 Gbps, while a direct connection to the firewall from an outside machine gets ~1 Gbps. There is clearly some tuning still to be done, but those direct connections at least demonstrate Gbps speeds.

We recently updated the Intel ixgbe driver from version 2.1.4 to 3.7.14 due to stability concerns with the 2.1.4 driver (lock-ups), and this seems to be when the throughput problems began.

I also tried the 3.7.17 release, but this gave similar performance to 3.7.14. On reverting to the 2.1.4 driver (re-compiled for the updated kernel, with IXGBE_NO_LRO and IXGBE_NO_NAPI) I was able to get ~Gbps throughput (around 900 Mbps with iperf over TCP with 3 threads).
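
For reference, the test was roughly along these lines (iperf2 syntax; the host name is a placeholder):

iperf -s                        # on the machine on the other side of the firewall
iperf -c <inside-host> -P 3     # from this side: 3 parallel TCP streams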

This solves the immediate problem, but I would prefer to be able to use the current version of the driver, as I'd like to keep up with bug fixes etc. So my question is:

  • How can I troubleshoot Linux router/firewall forwarding performance?

Specifically, how can I find out where the kernel / iptables / network driver, etc. are spending their time when forwarding packets?

Any relevant advice would be appreciated.

  • How are you testing it, and how are your outbound adapters configured? Two ISPs? Primary + backup? – hookenz Sep 20 '15 at 06:23

4 Answers


It's really strange that you only get 1 Gbps of routing performance (even though filtering usually means two copies in kernel space for the same device, and probably four for routing) - there was an LKML post a year ago showing that you can get 120 Gbps of routing performance on the 2.6.3X series with ixgbe devices. I mostly use Intel 10GbE NICs and usually get 1000 MByte/s+ with iperf over a switched infrastructure.

First you need to check how the system performs for plain TCP with something like iperf between your endpoints; this gives you a baseline. Remember that a lot of things come into play if you need 10 Gbps wire speed - on pre-Nehalem platforms it is essentially impossible to achieve. Also, the system load should match the NUMA layout, and the NICs have to be attached to the same PCI complex (this is important if you're stuck below 8 Gbps). The ixgbe source distribution includes an IRQ pinning script (which also disables things like power saving and the irqbalance daemon, which only messes up the caches and is not topology aware) that should lay out the RX/TX queues evenly across all cores (I haven't checked it in a while).
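
As a rough sketch of those checks (interface names are examples, and the IRQ pinning script's name and location vary between ixgbe source releases):

iperf -s                                   # on one endpoint
iperf -c <other-endpoint> -P 4 -t 60       # on the other: plain TCP baseline
cat /sys/class/net/eth2/device/numa_node   # which NUMA node the NIC hangs off
lspci -tv                                  # PCI topology: are both NICs on the same complex?
./scripts/set_irq_affinity eth2 eth3       # from the ixgbe source tree: pin RX/TX queue IRQs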

Regarding your question about timings, you need a kernel compiled with profiling support and a system-level profiler like oprofile.
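
If rebuilding the kernel for oprofile isn't convenient, perf (where available) gives a similar system-wide view of where CPU time goes while traffic is being forwarded; a minimal sketch:

perf record -a -g -- sleep 30    # sample all CPUs for 30 s while pushing traffic through
perf report                      # look for ixgbe, nf_conntrack, softirq and routing symbols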

Get your endpoint-to-endpoint performance ironed out before you enable packet filtering or routing, and post that.

pfo

Several months ago I put a bunch of effort into optimizing Linux for wire-speed Gigabit routing with lots of small packets. This was for a load balancer (IPVS), not a NAT firewall. Here are some tips based on that.

  • Upgrade the Linux kernel to at least 2.6.30 (we needed the updated Broadcom bnx2 driver)
  • Use ifconfig to look at the interfaces for any kind of errors/drops/etc.
  • Download and compile latest ethtool to make sure it fully supports your NIC driver
  • Use ethtool to look for more detailed statistics
  • Use ethtool to tune coalescing, NAPI, etc. settings to minimize interrupts
  • Look at irqbalance to make sure interrupts are balanced across CPU cores
  • Look at kernel threads like ksoftirqd... are they using a lot of CPU?
  • COMPLETELY disable iptables by unloading the kernel modules with rmmod. NAT and conntrack in particular can have a huge negative impact, even if you've flushed all the rules and have empty chains. I saw a huge performance increase when doing this. You mentioned this is a firewall, but I would still temporarily unload the NAT and conntrack modules to see if it makes any difference (example commands after this list).
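
For the ethtool and module-unloading items above, the commands look roughly like this (eth0 and the module list are examples; the exact conntrack/NAT module names differ between kernel versions):

ethtool -S eth0                  # detailed per-driver/per-queue statistics (look for drops/errors)
ethtool -c eth0                  # show current interrupt coalescing settings
ethtool -C eth0 rx-usecs 100     # example: coalesce harder to reduce the interrupt rate
iptables -F; iptables -t nat -F  # flush the rules before unloading
rmmod iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack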

I have not yet seen any breakdown of time spent per kernel networking function, such as switching vs. routing vs. firewalling vs. whatever else.

Wim Kerkhoff

iptables is a really efficient firewall for Linux systems. It can handle a huge amount of traffic without being the bottleneck, provided you have written a good ruleset.

One thing you can do is disable iptables by flushing all rules and setting the default FORWARD policy to ACCEPT. This way you eliminate any concern about your iptables configuration. After that, you can look at the network driver and try to debug the problem if it persists.
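
Roughly like this (do it from the console rather than over the network, and only on a machine you can afford to leave open for the duration of the test):

iptables -P FORWARD ACCEPT    # default-accept forwarded traffic
iptables -F                   # flush all rules in the filter table
iptables -t nat -F            # flush the nat table too, if it's in use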

A word of advice: do not disable iptables on a publicly accessible machine unless you know what you are doing.

Khaled

Poor one-way performance may be caused by issues with TCP segmentation offload and other offload settings on the NIC. It shows up in many scenarios, e.g. with VM or VPN traffic going through a physical NIC. It's easy to disable these offloads with ethtool and re-check performance, so it's worth trying (make sure you disable them on both endpoints for the test).

/usr/sbin/ethtool -K eth0 tso off    # disable TCP segmentation offload
/usr/sbin/ethtool -K eth0 lro off    # disable large receive offload
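
To see which offloads are currently enabled before and after the change, ethtool -k eth0 (lower-case -k) lists the current offload settings.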

Here is a little more background:

http://www.peerwisdom.org/2013/04/03/large-send-offload-and-network-performance/
https://social.technet.microsoft.com/Forums/windowsserver/en-US/bdc40358-45c8-4c4b-883b-a695f382e01a/very-slow-network-performance-with-intel-nic-when-tcp-large-send-offload-is-enabled?forum=winserverhyperv

Dmitriusan