Can DTrace help find the cause of TCP connection resets on Solaris 10 x86?

Question

I'm running performance tests on a web application hosted on a Glassfish cluster.

Each cluster instance is hosted in a separate Solaris 10 zone, and HTTP traffic is load balanced between the instances by an F5 BIG-IP load balancer. The problem I'm facing is that SOAP requests are periodically aborted by TCP connection resets.

Now I need to figure out why the connections are closed and whether there is anything I can do to prevent it. I've used tcpdump to monitor the traffic between the load generator and the load balancer. I can see that the TCP connection is established and the SOAP request is sent, then the load balancer sends an ACK, and 4-5 seconds later I receive a TCP segment from the load balancer with the RST and ACK flags set.

However, I cannot monitor the traffic between the load balancer and the cluster instances, so I can't see what happens on the cluster. This is because tcpdump can't listen on the virtual network interfaces in the zones, or at least I haven't found out how to do it.

So I hope there is a way to use DTrace to monitor what's going on in the cluster instances when the connections are reset. My guess is that some resource runs out, such as a TCP connection queue (I'm not sure about the terminology).

Do you have a working example of a DTrace script that shows why the connections are reset?

I've looked at https://blogs.oracle.com/hkchu/entry/diagnose_networking_problems_on_solaris, but the DTrace script provided on that page does not compile on my Solaris server.

You might want to switch to Solaris 11 where the loopback interfaces can be snooped and where the dtrace scripts you mention might work. — jlliagre, Feb 21 '13 at 12:40
Unfortunately I'm stuck with Solaris 10, I need to test on the same OS as we use in production. — Ola Mattsson, Feb 21 '13 at 16:43

0 Answers