2

I am analyzing traffic between a client and a Linux webserver running on an HP blade server. The client sometimes gets stuck waiting for more data after the webserver has closed the connection.

The webserver runs apache2, which for some reason chooses to run HTTP/1.1 with Connection: close rather than allowing the client to send multiple requests on the same connection and close it itself, as is standard in HTTP/1.1. (That's another story, but it leaves the server with several thousand TIME_WAIT sockets instead of pushing that state to the client.)

Anyway, sometimes an HTTP request gets broken; I still don't know where it actually breaks. On the server side everything looks fine, except that the client starts sending a lot of RST packets in between the ACKs.

I have tcpdump captures from the webserver and from the NAT that the client goes through. I would suspect the NAT if it weren't for some very strange behaviour on the webserver.

When the webserver serves the HTTP GET request, the first outgoing packet is 2960 bytes at the IP layer (2974 on the wire). That is very strange, since on the client end, at the NAT, the client receives two 1514-byte packets, each with 1460 bytes of TCP payload.

The subsequent packets that leave the interface on the webserver carry a payload of 1460 bytes (1514 on the wire), which is within the MTU.

I believe that some magic is done in the (Cisco) SLB that sits between the webserver and the network, so that the first DF packet of 2960 bytes gets squeezed through the SLB and is split there by some advanced L3 interception.

Q1) Why would the Apache webserver/TCP stack even try to push a 2960-byte packet onto an interface whose MTU is set to 1500?

Q2) How does it get through the network and arrive at the client as two packets?

Q3) How does the webserver know that the MTU should be decreased to 1460 even though no ICMP "Fragmentation needed" message arrives, given that all packets have the DF bit set?

Don't ask me why I ask these questions; I'm just a guy in a large, large organisation trying to understand why things sometimes don't work.

I have some interesting tcpdump logs that I can post if needed; I just have to replace the public IP addresses and such.

ernelli
  • 1
    Are you using 1Gbps Ethernet? Do any of the devices have Jumbo Frame support enabled? http://en.wikipedia.org/wiki/Jumbo_frame – mfarver May 17 '11 at 14:22
  • Yes, the servers (blade servers) use multihomed 1 GbE NICs. Could it be that the TCP stack/NIC driver relies on http://en.wikipedia.org/wiki/TCP_segmentation_offloading? – ernelli May 17 '11 at 14:30

2 Answers

3

If you capture packets on the server, you may see TCP sending out segments larger than the MTU. The packets on the wire, however, will be at most MTU-sized. You can verify this by capturing on a network device (such as a switch). Alternatively, capturing on the remote (client) machine will show that each packet is <= MTU.

This behaviour occurs because, with TSO/GSO enabled, the TCP segment is split into MTU-sized packets by the NIC hardware. Since tcpdump captures at the software layer, it sees segments larger than the MTU being handed to the NIC for transmission.
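The numbers from the question fit this explanation. As a rough sketch (assuming plain IPv4 and TCP headers of 20 bytes each with no options, and a 14-byte Ethernet header with no VLAN tag; `tso_split` is a hypothetical helper, not part of any tool), the arithmetic looks like this:

```python
ETH_HEADER = 14   # Ethernet header bytes (assumption: no VLAN tag)
IP_HEADER = 20    # IPv4 header bytes (assumption: no IP options)
TCP_HEADER = 20   # TCP header bytes (assumption: no TCP options)


def tso_split(ip_total_len, mss):
    """Return the on-wire frame sizes produced when one oversized
    segment is cut into pieces of at most `mss` payload bytes."""
    payload = ip_total_len - IP_HEADER - TCP_HEADER
    frames = []
    while payload > 0:
        chunk = min(payload, mss)
        # Each piece gets its own Ethernet, IP and TCP headers.
        frames.append(ETH_HEADER + IP_HEADER + TCP_HEADER + chunk)
        payload -= chunk
    return frames


# The 2960-byte packet seen by tcpdump on the server, with an MSS of 1460.
frames = tso_split(ip_total_len=2960, mss=1460)
print(frames)  # [1514, 1514]
```

Under those assumptions, the single 2960-byte packet in the server-side capture corresponds to the two 1514-byte frames seen at the NAT.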

If you disable TSO/GSO on the NIC, you will see that all outgoing packets are <= MTU size (more likely path MTU size).
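To see whether offloading is active, the output of `ethtool -k <interface>` can be inspected. The following is a minimal sketch that parses such output; the sample text is illustrative only (not taken from the server in the question), the interface name `eth0` is an assumption, and the exact feature names vary between ethtool versions. `offload_settings` is a hypothetical helper.

```python
def offload_settings(ethtool_output):
    """Parse the text printed by 'ethtool -k <iface>' into a dict
    mapping feature name to True (on) or False (off)."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        words = value.split()
        # Skip header lines such as "Features for eth0:" (empty value).
        if words and words[0] in ("on", "off"):
            settings[name.strip()] = words[0] == "on"
    return settings


# Illustrative sample output (assumed, not captured from a real host).
SAMPLE = """\
Features for eth0:
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: off [fixed]
"""

settings = offload_settings(SAMPLE)
print(settings["tcp-segmentation-offload"])  # True
```

On Linux, offloads are usually switched off with something like `ethtool -K eth0 tso off gso off`, after which a fresh capture on the server should show only MTU-sized packets.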

0

Q1: I don't really think Apache has any knowledge of what happens there. It deals with TCP connections and leaves the rest to the operating system's TCP stack ;)

Q2: Fragmentation. The packet gets turned down on the way, a "send again, smaller" message gets sent back, and the server (not Apache; this is the IP stack) sends it again in smaller pieces.

Q3: It does not. Really, again, I don't think Apache deals with the TCP stack at a lower level at all, and MTU etc. is WAY lower. The server's TCP stack is responsible for this, provided the proper settings are in place (not JUST "fragmentation needed", but also a correct smaller size; the parameter to look at is TCP MSS).

Technically, this looks like broken equipment and/or a broken TCP implementation: either the MSS parameter in the SYN packet contains a larger-than-allowed size, or the sender's computer simply ignores the MSS value.

http://en.wikipedia.org/wiki/Maximum_segment_size is a good starting reference. It seems MTU discovery fails (or the result is ignored) and then a non-standard size is used.

TomTom