I am analyzing traffic between a client and a Linux webserver running on an HP blade server. The client sometimes gets stuck waiting for more data after the webserver has closed the connection.
The webserver runs apache2, which for some reason chooses to run HTTP/1.1 with "Connection: close" rather than allowing the client to send multiple requests on the same connection and then close it, as is standard in HTTP/1.1. (That's another story, but it leaves the server with several thousand TIME_WAIT sockets instead of pushing that state to the client.)
Anyway, sometimes an HTTP request gets broken; I still don't know where it actually breaks. On the server side everything looks fine, except that the client starts sending a lot of RST packets in between the ACKs.
I have tcpdump captures from the webserver and from the NAT that the client goes through. I would suspect the NAT if it weren't for some very strange behaviour on the webserver.
When the webserver serves the HTTP GET request, the first outgoing packet is 2960 bytes at the IP level, 2974 on the wire. That is very strange, since the capture at the NAT, on the client end, shows the client receiving two 1514-byte packets, each with 1460 bytes of TCP payload.
The subsequent packets that leave the interface on the webserver use a payload of 1460 bytes (1514 on the wire), which is within the MTU.
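To make the sizes concrete, here is the arithmetic as I understand it. This is only a sketch: the header sizes are my assumptions (14-byte Ethernet header without VLAN tag, 20-byte IPv4 and TCP headers without options), chosen because they are consistent with 1460 bytes of TCP payload in a 1514-byte frame. The function names are just for illustration.

```python
# Assumed header sizes: Ethernet II without a VLAN tag,
# IPv4 and TCP without any options.
ETH_HEADER = 14
IP_HEADER = 20
TCP_HEADER = 20


def ip_length(tcp_payload: int) -> int:
    """Size of the whole IP packet carrying the given TCP payload."""
    return tcp_payload + TCP_HEADER + IP_HEADER


def on_wire(tcp_payload: int) -> int:
    """Frame size as seen in the capture (Ethernet header included, no FCS)."""
    return ip_length(tcp_payload) + ETH_HEADER


normal_payload = 1460
print(ip_length(normal_payload), on_wire(normal_payload))  # 1500 1514

# Two full 1460-byte payloads behind a single set of headers.
first_payload = 2 * normal_payload
print(ip_length(first_payload), on_wire(first_payload))  # 2960 2974
```

So, under these assumptions, the oversized first packet is exactly the size of two full 1460-byte payloads sharing one set of headers, which matches the two 1514-byte packets seen at the NAT.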
I believe that some magic is done in the (Cisco) SLB that sits between the webserver and the network, so that the first DF packet of 2960 bytes gets squeezed through and is magically split by some advanced L3 interception in the SLB.
Q1) Why would the apache webserver/TCP stack even try to push a 2960-byte packet onto an interface that has its MTU set to 1500?
Q2) How does it get through the network and arrive at the client as two packets?
Q3) How does the webserver know that the MTU should be decreased to 1460, even though no ICMP "Fragmentation needed" message arrives and all packets have the DF bit set?
Don't ask me why I ask these questions; I'm just a guy in a large, large organisation trying to understand why things sometimes don't work.
I have some interesting tcpdump logs which I can post if needed; I just have to replace public IP addresses and such.