I have an application that is distributing data from New York to Tokyo over TCP, running on Solaris 10. Mean throughput is < 1Mbps; peak throughput can reach 20-30Mbps for seconds at a time, though typical spikes are more like 10Mbps. Individual messages are small (~300 bytes) and consistency of latency is key, so we are trying to eliminate batching: Nagle's algorithm is disabled and the application is configured to send immediately rather than queue-then-send.
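For reference, this is what disabling Nagle looks like at the socket level (a minimal Python sketch; the real application is presumably native code, but the option is the same `TCP_NODELAY` in either case):

```python
import socket

# Create a TCP socket and disable Nagle's algorithm so small
# (~300 byte) messages go out immediately instead of being coalesced.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Non-zero means Nagle is off for this socket.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
```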
The RTT between New York and Tokyo is ~180ms, and the TCP window is tuned for a theoretical throughput in the region of ~40Mbps, i.e. tcp_xmit_hiwat/tcp_rcv_hiwat of 1M. tcp_max_buf and tcp_cwnd_max are also 1M.
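As a sanity check on those numbers, the bandwidth-delay product arithmetic works out as follows (my own calculation from the figures above):

```python
rtt = 0.180          # seconds, New York <-> Tokyo round trip
window = 1_000_000   # bytes, tcp_xmit_hiwat / tcp_rcv_hiwat

# A single TCP connection can have at most one window in flight
# per round trip, so this bounds the sustained throughput.
max_bytes_per_sec = window / rtt
max_mbps = max_bytes_per_sec * 8 / 1e6
print(round(max_mbps, 1))  # ~44.4 Mbps, consistent with "~40Mbps"
```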
The problem is that we frequently but intermittently see mysterious "pauses" where the sender gets EWOULDBLOCK, leading to a buildup in an internal queue and then a subsequent discharge of data. There are two problems here:
- there is no obvious reason for the socket to block: we don't appear to be anywhere near peak throughput, and nothing in the packet captures suggests a slowdown
- during the "discharge period" (i.e. when the sender socket is no longer blocking but there is a buffer of data to send), we see a steadily increasing sawtooth pattern in the message rates
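The EWOULDBLOCK condition itself is easy to reproduce in isolation: with a non-blocking socket, send() fails once the kernel send buffer fills. A minimal sketch (Python, using a local socketpair rather than the real WAN path, so the buffer sizes are whatever the local defaults are):

```python
import socket

# A connected pair of sockets; the receiver never reads, so the
# sender's kernel buffer (plus the receiver's) eventually fills.
sender, receiver = socket.socketpair()
sender.setblocking(False)

msg = b"x" * 300  # ~300-byte messages, as in the application
queued = 0
try:
    while True:
        queued += sender.send(msg)
except BlockingIOError:  # EWOULDBLOCK / EAGAIN: send buffer is full
    pass

print(queued > 0)  # True: some data was accepted before blocking
sender.close()
receiver.close()
```

In the real application the analogous condition is send() returning -1 with errno EWOULDBLOCK; the point is simply that the kernel is reporting a full send buffer, which on a path like this normally means the offered window or congestion window is exhausted, not that the application is at fault.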
The former is the key to the problem: if I can work that out, the latter shouldn't occur. But the latter is still odd; I naively expected it to ramp quickly to peak throughput and stay there until it had cleared the backlog.
CPU utilisation is not a problem at either end; the SAs say the boxes look good. Congestion on the WAN link is also not a problem; the network team says the network looks good. In fact everyone says every individual piece looks fine, yet the whole thing still performs badly!
Any thoughts on how to optimise for this situation, or on things to investigate that might hint at what is going on?