Let me state up-front that I know just enough about networking at this level to be dangerous, so if I say something stupid, please be kind.

I am using a Big IP load balancer in front of 3 Apache servers. The 3 Apache servers are all on the same physical machine (running Linux), bound to port 80 on 3 different virtual IP addresses, but I don't think that's playing a part in this problem. We have the LB configured so that the same client gets the same WEB server each time, controlled via a cookie provided by the LB.

If I go through the LB to access our application, I will occasionally (5-15% of the time) get a page where the browser just spins, never returning. If I hit each of the WEB servers directly, I never get that. Using WireShark I looked at what was happening on my PC, the LB and the WEB server I was hitting through the LB, and I saw the following:

1) Most times, the port that the PC used to hit the LB was the port the LB used to hit the WEB server, which I thought was "normal"; all 3 times I've replicated my issue, the ports were different (e.g. 1234 between PC/LB and 2345 between LB/WEB). But not all of the times that they were different did I see an issue. This may be a red herring....

2) The PC <-> LB communications are with packets that are 1260 bytes long; the LB -> WEB packets are 1260 bytes, but the LB <- WEB packets are some multiple of 1260. When the WEB sends 2520 bytes to the LB, the LB receives it in 2x1260 byte packets, and only sends one ACK. It then sends back to the PC 2x1260 byte packets, and receives one ACK from the PC. I don't understand why the 2520 is broken into 2x1260, nor do I understand why the PC knows to only send one ACK. This may also be a red herring....

3) At some point the WEB server sends data to the LB for which it receives no ACK, then resends the packet that started this data sequence. I don't know why the re-transmission happens, though, as only 0.6 seconds have passed, so it can't be that it's timed out (well, I suppose it can be, but it does seem unlikely). Further, the original WEB -> LB packet was 5040 bytes (4x1260), but the re-transmission is only 1260. The LB received all 4 packets from the original 5040 sending, but never sent an ACK for whatever reason. I see other times when the LB has been able to handle the 5040 bytes being sent to it, that it acknowledged just fine, so it doesn't seem as if the length of the sent packet is an issue. None of these packets were sent from the LB back to the PC, though.

4) Only the first 1260 chunk of data is resent from the WEB to the LB, though. There is a small delay (0.4 seconds) and then there is a second re-transmission of data, starting with the first 1260 chunk, but this time all 4x1260 packets are resent. The ACK sent back by the LB, however, is for all 5 of the resent packets, even though one of them was clearly resent twice (I base this on the relative seq number shown by WireShark; the ACK sent back is equal to the seq number of the first resent packet + 5x1260, even though the seq number of the second resent packet is the same as the first resent packet). This seems really bad.

5) Worst of all, I think, is that the ports get all messed up. The original conversation ports are as follows:

PC <-> LB == 2723
LB <-> WEB == 2722

Co-mingled among that conversation is one looking like this:

PC <-> LB == 2722
LB <-> WEB == 2723

It was enough to make me cross-eyed, but I did verify that the conversations were happening on the ports I thought they were, based on content of the packets (ugh). The second conversation completes successfully (in fact, there are two conversations on this set of ports, both of which complete successfully). After the first (aborted?) re-transmission packet is sent, but before the second set of 4x1260 re-transmissions are sent, there is a conversation:

PC <-> LB == 2722
LB <-> WEB == 2722

So, the original LB <-> WEB conversation on port 2722 is interrupted. I didn't think this was allowed under the protocol. There is no FIN or SYN sent in either direction on this port. The sequence number sent from LB -> WEB for this newly initiated conversation is what I expect it to be. The ACK from the WEB to the LB has the right sequence number, but the ACK is off (by exactly 1260).

This second conversation never seems to take place, as after the 4x1260 are re-transmitted, the WEB sends another, new, 1260 chunk of data, and all of it is the response from the original request (determined by examining the data in the packets). The LB then sends an ACK back, and it is what I would expect from the original conversation (last ACK + 4x1260 + 1260).

So, I have clues as to what's happening, but no idea why it would happen, and certainly no clues as to how to fix it. Where do I go from here?

EDIT: I just remembered something; during the middle of the 4x1260 resend from the WEB to the LB, the LB sends back an ACK that is exactly 100 bytes more than I would expect. After the first 1260 packet of new data is sent, though, the LB sends back an ACK that is exactly what it should be (i.e. it corrected itself). I have no clue how that could happen or what that means, or even if it's meaningful.

Joe Casadonte
  • In the end we switched from the hardware LB to Pound (http://www.apsis.ch/pound/) and we've never had a problem since. Pound itself is really good for our use; my only complaint is that you have to restart the service to reload the config file. – Joe Casadonte Mar 13 '10 at 13:05
  • F5 support can help you with this and it should be covered by your support contract. – James Shewey Mar 01 '15 at 18:22

2 Answers


One thing that sticks out at me is that your web servers are on the same machine using different IP addresses. You're not really gaining anything with load balancing if you go and put all the servers on one box.

Some of what you describe with the packets seems normal; have you used the 'Expert Info' and 'Follow TCP Stream' features of Wireshark? Those will show higher-level views of the TCP connections. Also keep in mind that Wireshark might not be capturing all packets (especially if you're on gigabit ethernet).
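To make the "Follow TCP Stream" idea concrete, here is a minimal Python sketch of what that view does conceptually: it groups packets by their connection 4-tuple, so two conversations that swap port numbers (2722 and 2723) still land in separate streams. The `Packet` record, the IP addresses, and the sample packets are all hypothetical and only loosely modeled on the ports mentioned in the question; this is not Wireshark's actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified packet record (not a Wireshark data structure).
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    length: int  # TCP payload length in bytes


def stream_key(p):
    """Direction-independent key: both directions of a connection share it."""
    a = (p.src, p.sport)
    b = (p.dst, p.dport)
    return (a, b) if a <= b else (b, a)


def group_streams(packets):
    """Group packets into streams keyed by the normalized 4-tuple."""
    streams = defaultdict(list)
    for p in packets:
        streams[stream_key(p)].append(p)
    return dict(streams)


def retransmissions(stream_packets):
    """Return data packets whose (sender, seq) was already seen in the stream."""
    seen = set()
    repeats = []
    for p in stream_packets:
        if p.length == 0:  # pure ACKs carry no data, so skip them
            continue
        ident = (p.src, p.sport, p.seq)
        if ident in seen:
            repeats.append(p)
        seen.add(ident)
    return repeats


# Assumed addresses: PC = 10.0.0.5, LB = 10.0.0.1, WEB = 10.0.0.11.
SAMPLE = [
    Packet("10.0.0.5", 2723, "10.0.0.1", 80, 1, 400),    # PC -> LB
    Packet("10.0.0.1", 2722, "10.0.0.11", 80, 1, 400),   # LB -> WEB
    Packet("10.0.0.11", 80, "10.0.0.1", 2722, 1, 1260),  # WEB -> LB
    Packet("10.0.0.11", 80, "10.0.0.1", 2722, 1, 1260),  # same seq again
    Packet("10.0.0.5", 2722, "10.0.0.1", 80, 1, 380),    # PC -> LB, other port
    Packet("10.0.0.1", 2723, "10.0.0.11", 80, 1, 380),   # LB -> WEB, other port
]

if __name__ == "__main__":
    for key, pkts in group_streams(SAMPLE).items():
        print(key, len(pkts), "packets,", len(retransmissions(pkts)), "repeated")
```

Even though ports 2722 and 2723 each appear on both sides of the LB, the addresses differ, so the sketch yields four distinct streams rather than two interleaved ones.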

As for the problem, it's possible you're seeing a bug in the LB, exposed by the fact that all your web services are on the same box, which means that your three IP addresses all have the same MAC address. If the LB tracks connections using the MAC address, it could be confusing your three server IP addresses and directing packets to the wrong place.
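To illustrate that hypothesis only, the Python sketch below shows how a lookup table keyed by MAC address would collapse three virtual IPs on one machine into a single entry, while a table keyed by IP keeps them apart. The addresses, the `SERVERS` list, and the `build_table` helper are made up for the example, and nothing here reflects how the BIG-IP is known to track connections.

```python
# Three hypothetical virtual IPs that share one NIC, and therefore one MAC.
SERVERS = [
    {"ip": "192.0.2.11", "mac": "00:11:22:33:44:55"},
    {"ip": "192.0.2.12", "mac": "00:11:22:33:44:55"},
    {"ip": "192.0.2.13", "mac": "00:11:22:33:44:55"},
]


def build_table(servers, key_field):
    """Map the chosen key field to a server IP; later entries overwrite earlier ones."""
    table = {}
    for server in servers:
        table[server[key_field]] = server["ip"]
    return table


if __name__ == "__main__":
    by_mac = build_table(SERVERS, "mac")
    by_ip = build_table(SERVERS, "ip")
    print(len(by_mac), "entry when keyed by MAC:", by_mac)
    print(len(by_ip), "entries when keyed by IP")
```

If a device did key any state this way, traffic for all three servers would resolve to whichever IP was stored last, which is the kind of misdirection described above.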

  • Also, those packet sizes do seem a bit small. Normally, I expect packet sizes in the 1400's. Double-check the MTU settings on all your boxes (normal MTU for ethernet is 1500) – Heath Oct 02 '09 at 18:08
  • It's a test environment, that's why all 3 WEB servers are on the same box. I'll look into the Follow TCP Stream option -- thanks! – Joe Casadonte Oct 02 '09 at 18:09
  1. The TCP port will change as necessary on the client side and on the server side of the BIG-IP, which acts as the client to the web server. This is expected behavior, as the BIG-IP maintains completely separate TCP connections to the client and to the server.

  2. Because the proxy handles separate connections to the client and to the server, normal TCP behavior can differ between the clientside and serverside of the connection. Which TCP profile you apply, and to which side of the proxy (you can apply different profiles to the client and server sides or the same one to both, depending on your use cases and network environments), alters what data is on the wire at what time, and how many acknowledgements are sent and how often.

  3. Retransmits are common, and how the TCP stack handles network congestion depends on the stack in place on both the client and server sides of the connection. Again, you have two clients and two servers in any client->BIG-IP->server scenario.

  4. I would need to see the entire capture to diagnose the behavior you saw.

  5. I don't think the TCP port is a concern here.
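As background for points 2 and 3, here is a small Python sketch of how TCP cumulative acknowledgements work in general: the ACK number is the next byte the receiver expects, so one ACK can cover several segments and a duplicate retransmission does not advance it. The `cumulative_ack` function and the sequence numbers are illustrative assumptions (relative sequence numbers starting at 1, 1260-byte segments as in the question); it does not model delayed ACKs, the BIG-IP, or the specific capture.

```python
def cumulative_ack(initial_ack, segments):
    """Return the ACK number a receiver would hold after each segment arrives.

    segments is an iterable of (seq, length) pairs in arrival order.
    """
    next_expected = initial_ack
    buffered = {}  # out-of-order data: start seq -> end seq (exclusive)
    acks = []
    for seq, length in segments:
        end = seq + length
        if end > next_expected:  # ignore data that is entirely old
            buffered[seq] = max(end, buffered.get(seq, 0))
        # Advance over every buffered range that is now contiguous.
        advanced = True
        while advanced:
            advanced = False
            for start, stop in list(buffered.items()):
                if start <= next_expected:
                    next_expected = max(next_expected, stop)
                    del buffered[start]
                    advanced = True
        acks.append(next_expected)
    return acks


if __name__ == "__main__":
    size = 1260
    arrivals = [
        (1, size), (1 + size, size), (1 + 2 * size, size), (1 + 3 * size, size),
        (1, size),             # duplicate of the first segment
        (1 + 4 * size, size),  # new data after the four segments
    ]
    print(cumulative_ack(1, arrivals))
```

In this sketch the duplicate leaves the ACK unchanged, and the final value is the starting sequence number plus 5x1260, because five distinct segments' worth of data were received regardless of how many times any one of them was sent.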

Without details, my guess is that you had the stock TCP profile enabled instead of a custom profile, and it behaved badly with your application. I highly recommend using custom profiles and engaging someone who understands how TCP flow and congestion control algorithms work, and which TCP profile settings perform poorly with certain applications. We wrote a 10-part series on the BIG-IP TCP profile several years ago (it could use a refresh given all the changes since, but it's still a good source of information) that I think will help: https://devcentral.f5.com/articles/investigating-the-ltm-tcp-profile-nagles-algorithm

Jason Rahm