
We are having an issue with all of our Solaris VMs: frequent timeouts when connecting via SSH or HTTP. It only seems to affect initial connections. When I connect via SSH, it hangs and times out before I even get the login prompt; however, if I press Ctrl+C and try again, it connects just fine.
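A quick way to reproduce the symptom from a client machine is to time a few fresh TCP connections in a row. This is only a minimal sketch, assuming Python 3 is available on the client; the function name `time_connect` and the 5 second timeout are illustrative choices, not something from our setup.

```python
import socket
import time


def time_connect(host, port, timeout=5.0):
    """Open one TCP connection and return (succeeded, seconds_taken)."""
    start = time.monotonic()
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start
```

Calling `time_connect(vm_address, 22)` several times in a loop should show whether only some attempts stall while the others complete almost instantly.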

I logged into the VMware console and ran the snoop command on the Solaris VM to get a packet capture when this occurred. Here is the capture.

My computer is 10.0.0.3 and I removed the hostname of the Solaris VM I'm connecting to.

Based on the packet capture, it looks like the VM sees my first SYN packet but does not reply, prompting my computer to resend it. The VM then replies with a bare ACK packet, which I believe should have been a SYN-ACK packet, and only after that does it send a SYN-ACK.
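To make that reading of the capture concrete, here is a small sketch that walks a list of (source, flags) pairs and flags the two oddities described above. The input format and the function name `describe_handshake` are assumptions for illustration; it does not parse snoop output, so the pairs have to be typed in by hand from the capture.

```python
def describe_handshake(packets, client):
    """Return notes about anomalies in a captured TCP handshake.

    packets: iterable of (source_ip, flags) in capture order, where flags
    is a set of flag names such as {"SYN"} or {"SYN", "ACK"}.
    client: the IP address that initiated the connection.
    """
    notes = []
    client_syns = 0
    saw_syn_ack = False
    for src, flags in packets:
        flags = set(flags)
        if src == client:
            if flags == {"SYN"}:
                client_syns += 1
                if client_syns > 1:
                    notes.append("client retransmitted SYN")
        else:
            if flags == {"SYN", "ACK"}:
                saw_syn_ack = True
            elif flags == {"ACK"} and client_syns and not saw_syn_ack:
                # A normal handshake answers a SYN with SYN-ACK, not a bare ACK.
                notes.append("server sent bare ACK before SYN-ACK")
    if not saw_syn_ack:
        notes.append("no SYN-ACK seen")
    return notes
```

For the sequence described above (SYN, SYN again, ACK from the VM, then SYN-ACK from the VM) this reports both a retransmitted SYN and a bare ACK before the SYN-ACK, while a normal SYN, SYN-ACK, ACK exchange produces no notes.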

Does anyone know why this is happening? Our Cisco ASA firewall waits 30 seconds and then tears down the connection because of the SYN timeout.

Thanks in advance for any help.

Derek

Derek Ivey
  • When I am able to successfully connect via SSH, the packet capture looks like [this](http://pastebin.com/1F1tuU8Q). Also... we're running Solaris 10 9/10 patch level 142910-17. – Derek Ivey Jul 24 '11 at 07:07
  • This sounds similar to the issue I'm seeing in our packet capture: http://wesunsolve.net/bugid/id/6942436 – Derek Ivey Jul 24 '11 at 07:13
  • Does this VM have multiple vCPUs or a single vCPU? – gm3dmo Jul 24 '11 at 10:02
  • Single vCPU. These VMs use very little CPU. – Derek Ivey Jul 24 '11 at 15:25
  • I believe I'm experiencing [this](http://wesunsolve.net/bugid/id/6942436) bug. I'm going to see if installing the latest patch cluster helps. Unfortunately I have to wait until we get our permissions fixed with My Oracle Support... grr. – Derek Ivey Jul 24 '11 at 15:27

2 Answers


The latest patch cluster seems to have resolved our issue. The issue is documented here, and was fixed in patch 144489-05.

Thanks for your help.

Derek Ivey

Your capture lacks information about what else is going on in the network. My first thought whenever I see inexplicable delays in TCP connections is "DNS lookup". This goes double when you're using RFC1918 addresses, because there usually isn't even an rDNS server anywhere to say "bugger off", so the lookups time out.

My bet is that your VM has misconfigured DNS, and the delay you're seeing is the SSH daemon on the VM going "just who are you?" and waiting for the result. The ^C-retry-success sequence probably takes just long enough for the VM to realise it's not going to get an answer, so it lets you through the second time.

I'll bet a complete capture of traffic to/from/around the VM would show DNS packets going nowhere interesting. Recheck your DNS resolution on the VM, and you'll probably find something useful.
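One way to test the DNS theory without a full capture is to time a reverse lookup of the client address from the VM itself. A minimal sketch, assuming a Python 3 interpreter is available there; `time_reverse_lookup` and its `resolver` parameter are illustrative names, with `socket.gethostbyaddr` doing the real work through the system resolver.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, seconds_taken) for a reverse DNS lookup."""
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        name = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        name = None
    return name, time.monotonic() - start
```

If `time_reverse_lookup("10.0.0.3")` takes many seconds before returning, stalled reverse lookups are a plausible cause; if it returns almost immediately, with or without a name, DNS is less likely to be the problem.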

womble
  • I just removed the grep, and the only other thing I saw the last time this happened was the router sending an NTP request to this VM (this VM is also our primary NTP server). I don't think this is related to our issue though. I did not see any DNS requests in the capture when the issue occurred; however, I did see other DNS requests before and after (this VM acts as a DNS server too). I have LookupClientHostnames set to no in the sshd_config file, so I don't think DNS is the issue. – Derek Ivey Jul 24 '11 at 06:58
  • No TCP wrappers or anything else getting in the way? – womble Jul 24 '11 at 07:24
  • TCP wrappers are enabled, but I don't see anything that would indicate that they're getting in the way. I'll try disabling them and see if it helps. – Derek Ivey Jul 24 '11 at 15:24