I've run into a problem with SSL on my Debian VPS (a Xen domU): almost all SSL connections hang at the client hello. For example:

# curl -vI https://graph.facebook.com

* About to connect() to graph.facebook.com port 443 (#0)
* Trying 66.220.146.48... connected
* Connected to graph.facebook.com (66.220.146.48) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):

It's the same when using the openssl client. However, some of the SSL traffic works (for example https://www.nordea.se).

Server

# uname -a

Linux server.com 2.6.26-1-xen-amd64 #1 SMP Fri Mar 13 21:39:38 UTC 2009 x86_64 GNU/Linux

It does however work on my Dom 0 (the main xen host).

Apt-get

I can't even run apt-get update with the Debian security sources (it hangs on reading headers).

Open SSL

At the beginning I thought I had an old openssl client (0.9.8o-4), since I appeared to have a newer one on the Dom 0 (0.9.8g-15+lenny8), but doing a manual update of the openssl deb didn't help.

Open SSL Client

This is the full output of when the openssl client hangs: http://pastebin.com/PAjwMap9

Closing thoughts

I've Googled the crap out of this, and I'm not getting any further. I've seen problems reported with curl, apt-get etc., but they are all specific to one application, not general to the whole system. Any thoughts?

Niklas B
  • As a follow-up, I'm starting to think it's a Xen issue (all domUs have the same issue). The domUs are img based (network-script network-bridge). They are also on another IP chain than the Dom0, via net.ipv4.ip_forward – Niklas B Feb 04 '11 at 08:03

5 Answers

After some discussion back and forth with my hosting provider, it turned out that they had an MTU problem with the IP chains that my DomU was using (but not the Dom0). I wanted to thank everyone who helped me out in the process; your help was invaluable :)
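
One way to check for this kind of path MTU problem from inside the domU is to send pings with the Don't Fragment bit set and see which sizes get through. The sketch below assumes IPv4 and the Linux iputils `ping` (for `-M do`); the function names `icmp_payload_for_mtu` and `probe_mtu` are made up for illustration. The 28-byte overhead is the 20-byte IPv4 header (no options) plus the 8-byte ICMP header.

```shell
# Hypothetical helpers; the names are illustrative, not from any tool.

# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU
# Sends 3 pings with the Don't Fragment bit set (-M do, Linux iputils ping),
# sized so that each packet is exactly MTU bytes on the wire.
probe_mtu() {
    ping -M do -c 3 -W 2 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, run `probe_mtu graph.facebook.com 1500` and then `probe_mtu graph.facebook.com 1400`. If the smaller size gets replies while the larger one times out or reports that fragmentation is needed, something on the path probably has a lower MTU. Hosts that do not answer ICMP echo make this probe inconclusive.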

Niklas B
  • We just resolved the same type of problem today with the same solution: lowering the MTU from 1500 to 1495 fixed it. We are also running on Xen, and curl over SSL was failing at the same point (after the client hello). – BrianC Dec 10 '12 at 18:12
  • This just reminded me to include MTU in my default checklist next time – Florenz Kley Nov 28 '14 at 00:24
  • Had a similar problem. Some SSL sites would hang after the client hello. For me, I had to reduce the mss value on an old Cisco switch. – FamiliarPie Apr 15 '15 at 08:41

This is old and already answered, but we suffered the same exact issue and the cause was related, but different.

The key was to sniff traffic on our edge router, where we saw ICMP messages sent to the server (GitHub.com) asking for fragmentation. This was disrupting the connection, causing retransmissions, duplicate ACKs and so on.

The ICMP packet's "MTU of next hop" field had an unusual value, 1450. The usual value is 1500.

We checked our router and one of the interfaces (an Ethernet tunnel) had this value as its MTU, so the router was taking the minimum MTU of all interfaces as the next-hop MTU. As soon as we removed this interface (it was unused), the SSH handshake started to work again.
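
For anyone wanting to look for the same ICMP messages, here is a rough sketch of a capture filter. It assumes `tcpdump` is available on the router or host, that the traffic is IPv4, and that it is run as root. The function names are hypothetical and the interface is whatever faces the traffic in your setup.

```shell
# Hypothetical helpers; the names are illustrative.

# pcap filter for ICMP "destination unreachable / fragmentation needed"
# (type 3, code 4), the message used by path MTU discovery.
frag_needed_filter() {
    echo 'icmp[icmptype] = icmp-unreach and icmp[icmpcode] = 4'
}

# watch_frag_needed IFACE  (needs root)
# Prints matching packets verbosely, without resolving names.
watch_frag_needed() {
    tcpdump -n -v -i "$1" "$(frag_needed_filter)"
}
```

Calling `watch_frag_needed eth0` (with `eth0` as a placeholder interface name) while reproducing the hang should show whether any hop is reporting a smaller next-hop MTU.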

charli

Try:

$ sudo apt-get --reinstall install openssl libssl0.9.8
alvosu
  • I had to run apt-get -f install first (it was complaining: "libssl-dev: Depends: libssl0.9.8 (= 0.9.8o-3) but 0.9.8o-4 is to be installed"). Still the same problem though :( Now running openssl 0.9.8o-4 and libssl0.9.8 0.9.8o-4 – Niklas B Feb 04 '11 at 09:01
  • I also did an apt-get upgrade, but no luck. – Niklas B Feb 04 '11 at 09:10

Sounds like a problem with the guest's /dev/urandom or /dev/random, or maybe another device. Run your hanging process under strace and see whether it is hanging while trying to read.
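
A quick way to test the random-device theory separately from OpenSSL is to read a few bytes from each device with a time limit. This is only a sketch: it assumes GNU coreutils `timeout` and `dd`, and `reads_without_blocking` is a made-up helper name.

```shell
# Hypothetical helper; the name is illustrative.

# reads_without_blocking DEVICE [SECONDS]
# Tries to read 64 bytes from DEVICE. Exits 0 only if the read completes
# within SECONDS (default 5); a read that blocks is killed by timeout.
reads_without_blocking() {
    timeout "${2:-5}" dd if="$1" of=/dev/null bs=1 count=64 2>/dev/null
}
```

Usage would look like `reads_without_blocking /dev/urandom && echo ok`. Note that /dev/random can legitimately block when the entropy pool is low, so a failure there does not by itself explain an SSL hang.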

beans
  • I tried a random file, and also strace. It does hang trying to read, but still does so using a random file. Here is the full strace output: http://pastebin.com/AKnHJjGC – Niklas B Feb 06 '11 at 14:21
  • Not sure if this is vital, but I did a "dd if=/dev/random of=/tmp/x bs=1 count=1024" and it stalls on both DomU and Dom0. However, /dev/urandom works fine. "dd if=/dev/urandom of=/tmp/x bs=1 count=1024 1024+0 records in 1024+0 records out 1024 bytes (1.0 kB) copied, 0.00653607 s, 157 kB/s" – Niklas B Feb 06 '11 at 14:30
  • That said, using urandom doesn't help: "openssl s_client -state -connect graph.facebook.com:443 -rand /dev/urandom" still hangs – Niklas B Feb 06 '11 at 14:32
  • The strace suggests that the client hangs waiting for the server to respond to the client hello. Would you be willing to send along `strace -s0 openssl s_client -state -connect graph.facebook.com:443 -rand /dev/urandom`, the contents of your fb_ca_chain_bundle.crt, and the contents of /usr/lib/ssl/openssl.cnf? – beans Feb 07 '11 at 00:41
  • Of course I can! I just grabbed the fb_ca_chain_bundle.crt at random. Anyway, strace: http://pastebin.com/AHK3ik99 fb_ca_chain_bundle.crt: http://pastebin.com/wYz7iG5W and openssl.cnf: http://pastebin.com/YUQckxcc I've also sent a mail to my hosting provider to ask if he has seen similar results before (firewall, routing issues, etc.). It feels unlikely since dom0 is fine, but it never hurts to check. – Niklas B Feb 07 '11 at 06:48
  • Do you get the same results if you force ssl3 with `openssl s_client -ssl3 -state -connect graph.facebook.com:443`? This server doesn't do sslv2. This problem is not related to your /dev/u?random... something else is going on here. The software should be waiting for 5 bytes on the socket before continuing. Send along the debug with `openssl s_client -debug -state -connect graph.facebook.com:443`. – beans Feb 07 '11 at 19:46

I would try using openssl s_client and giving it a random file ("any" file) just to check if the problem is related to /dev/random|urandom as Ben said:

openssl s_client -state -connect graph.facebook.com:443 -rand anyfile

Be aware that using a file this way is very dangerous cryptography-wise, so be sure to find another solution before pushing that to production.

Shadok