
I am particularly interested in answers to the following:

  1. Does a NIC with GRO edit or create TCP ACKs or any other packets (or is this feature transparent to the receiver/sender TCP stacks)?
  2. There should be a timeout or event that makes the NIC pass the "glued segments" to the TCP stack. What is it?
  3. In a packet-forwarding setup, does the GRO feature also try to read the receiver's ACKs (see below for why I am asking)?
  4. Any source that explains GRO and the other NIC offloading features (TSO, LSO ...) better than Wikipedia and the Linux man pages would be really appreciated.

More details:

I am troubleshooting a performance problem with an IPSec implementation. The problem is that the available bandwidth is not evenly distributed across all 4 VPN tunnels (it is distributed approximately as 200MBps/200MBps/1MBps/1MBps; each VPN tunnel encapsulates a single TCP connection). In the PCAP I see that once in a while the webserver idles for about 2 seconds (waiting for an ACK). The download resumes when the webserver retransmits the unacknowledged segments.

My gut feeling from the PCAP is that the NIC's GRO feature glues packets together but sometimes does not pass them to the TCP stack in a timely manner, and that this is causing the problems.

This VPN server does not have interfaces that terminate TCP connections; it only forwards packets. I therefore tried disabling GRO, and after that the traffic was evenly distributed across all tunnels. Also, when TCP window scaling is disabled on the webserver, the bandwidth is evenly distributed even with GRO enabled (that is why I asked question #3).

I am using Linux 2.6.32-27 on Ubuntu 10.04 server (64-bit). The NIC is an Intel 82571EB. All machines (HTTP client, VPN client, VPN server, webserver) are connected directly in a chain with 1Gbit Ethernet cables.

user389238

1 Answer


I've found this article amazingly useful: JLS2009: Generic receive offload. It gives a great overview of how GRO works.

  1. Some adapters might do it, but the associated drivers have to be aware of it as well. Drivers can also do this themselves in software. Because this happens before packets enter the kernel TCP/IP stack, the packets have already been resequenced by the time the stack processes them.
  2. The timeout is defined by the GRO spec as one TCP/IP 'tick' (an increment of the Time Stamp field). That is a very short interval, but on fast networks multiple packets may still be received within it.
  3. GRO comes into play on the receiving side of the forwarder. In fact, GRO was created so that the greedier LRO method would stop mangling packets on forwarders.
  4. The article I linked to above really helps.
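To make the "gluing" concrete, here is a toy model of GRO-style merging. It is only a sketch under simplifying assumptions: the `Segment` type, the `gro_merge` function and the flow key are hypothetical names for illustration, and the real kernel code compares many more header fields than the three conditions used here (same flow, contiguous sequence numbers, identical timestamp value).

```python
from dataclasses import dataclass, replace


@dataclass
class Segment:
    flow: tuple      # hypothetical flow key: (src, sport, dst, dport)
    seq: int         # TCP sequence number of the first payload byte
    payload: bytes
    tsval: int       # TCP timestamp value carried by the segment


def gro_merge(segments, max_size=65535):
    """Merge in-order segments of the same flow that share a timestamp."""
    out = []
    held = {}  # flow -> segment currently being accumulated
    for seg in segments:
        cur = held.get(seg.flow)
        can_merge = (
            cur is not None
            and seg.seq == cur.seq + len(cur.payload)   # contiguous data
            and seg.tsval == cur.tsval                   # same timestamp
            and len(cur.payload) + len(seg.payload) <= max_size
        )
        if can_merge:
            cur.payload += seg.payload
        else:
            if cur is not None:
                out.append(cur)          # flush what was held for this flow
            held[seg.flow] = replace(seg)  # copy, so the input is untouched
    out.extend(held.values())            # final flush of everything held
    return out
```

In this model a gap in the sequence numbers or a change in the timestamp ends the current merge and flushes the held segment, which is one way to picture why merged data should not sit in the receive path for long.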

Ethtool may be able to enable or disable GRO on specific interfaces, depending on the version.
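As a rough sketch of how that could be scripted, the helpers below wrap `ethtool -k` (show offload settings) and `ethtool -K <iface> gro on|off` (change them). The function names are hypothetical, the interface name is whatever your system uses, and the parser assumes the `generic-receive-offload: on|off` line format; older ethtool builds may not know the `gro` keyword at all, and changing settings normally requires root.

```python
import subprocess


def parse_offloads(ethtool_output):
    """Turn `ethtool -k` text into a {feature_name: bool} mapping."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skip blank lines and the "... for eth0:" header
        # Values may look like "off [fixed]"; only the first word matters.
        features[name.strip()] = value.split()[0] == "on"
    return features


def gro_enabled(iface):
    """Return True/False for GRO on iface, or None if it is not reported."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_offloads(result.stdout).get("generic-receive-offload")


def set_gro(iface, enabled):
    """Enable or disable GRO on iface (usually needs root)."""
    state = "on" if enabled else "off"
    subprocess.run(["ethtool", "-K", iface, "gro", state], check=True)
```

For example, `set_gro("eth0", False)` followed by `gro_enabled("eth0")` would turn GRO off on a hypothetical `eth0` and then confirm the change, mirroring the test described in the question.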

Jonathon Reinhart
sysadmin1138
    I updated my question. It seems that you answered #1 in the context of all offloading features (IMHO GRO alone does not generate ACKs; it only "glues" together all the packets for one TCP/IP tick and then hands them to the OS). Thank you! – user389238 Feb 07 '11 at 18:55