9

My application needs to do quite frequent connects to a mysql database on another machine. However I get frequent errors on connection; I can usually eventually connect but after many retries. The error message I get is:

Mysql.mysql(): Couldn't connect to SQL-server: Lost connection to MySQL server at 'reading authorization packet', system error: 2

I have read the information here: http://dev.mysql.com/doc/refman/5.0/en/error-lost-connection.html and also the comments to the bug http://bugs.mysql.com/bug.php?id=28359.

There are quite a number of aborted connects as you can see:

mysql> SHOW GLOBAL STATUS LIKE 'Aborted_connects';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Aborted_connects | 2540  | 
+------------------+-------+

I increased the timeout from 5 to 15 sec:

mysql> show variables like 'connect_timeout';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| connect_timeout | 15    | 
+-----------------+-------+

but this has not helped. Any suggestions how to debug this ? It makes the database interaction horribly slow when it on average has to try to connect 5 times before it works.

I have not yet tried to keep the connection open for the lifetime of the application, would that be better ? How do I prevent the connection from closing ?

If it helps the computer with the problems is Windows 7 32-bit, while the mysql server is on Debian Linux.

Zitrax
  • 784
  • 2
  • 11
  • 21
  • I know this is only a bandage but the bug report URL does state: "I have worked-around the problem by increasing the connect_timeout global variable to 30 seconds." Do you want to try a higher value? – JoseK Oct 19 '10 at 13:01
  • 1
    Cold comfort, but this is a well-written question. Excellent following of the Eric S. Raymond article... +1 to you, and good luck. – gWaldo Oct 19 '10 at 13:31
  • @JoseK of course I can try but I doubt it will make much difference as I saw no improvement by increasing from 5 to 15. – Zitrax Oct 19 '10 at 13:33
  • Is this one of the variables that sets no limit if the value is set at 0? – pjmorse Oct 19 '10 at 14:32
  • 1
    Are you positive there's not some sort of connectivity issue between the machines, perhaps a network issue? – Warner Oct 19 '10 at 15:30
  • @Warner 100% no, but I have seen no other connectivity problems than this. – Zitrax Oct 20 '10 at 10:25
  • Obvious suggestion: try "telnet machine 3306" and see how fast that is? If THAT takes a long time, you might want to look at your system connect timeouts (which would override MySQL's). You could also create a permanent stunnel to 3306 on the other machine? –  Oct 20 '10 at 17:55
  • @barrycarter telnet connects so fast I can't even notice any delay at all – Zitrax Oct 22 '10 at 09:46
  • Have you tried sniffing the network traffic? It may show you where the delay is? –  Oct 22 '10 at 11:41
  • @barrycarter See comment to answer by Khushil – Zitrax Oct 28 '10 at 09:56
  • The plugins you have installed might have errors. I have faced a similar situation while installing an audit plugin but when I fixed the bug and reinstalled the plugin it worked again –  May 30 '13 at 06:56

2 Answers2

2

Looking over all the bug details an reports I suspect there is a network or client side issue here. Something is either corrupting or adding too much latency to the mysql data.

On a clear network, do a wireshark trace and check what the conversation is like. Ensure no traffic shaping is going on anywhere. Seems like a 64bit issue only so check a 32bit build just to ensure it's the same issue. Check on a Windows XP box - M$ did some funky stuff to the network stack on Vista which made it suck - bit time! We have no end of networking issues with Vista and Win7 machines!

If you do manage a wireshark trace please pastebin and put up the link as we'd all like a look I think.

Khushil
  • 553
  • 3
  • 11
  • What do you mean with 'seems like a 64bit issue'? This is a 32-bit install. And sorry I mentioned Vista, this is Win7, description corrected. – Zitrax Oct 27 '10 at 13:25
  • I am not very familiar in debugging with wireshark. But a few things look odd to me, there are a few redmarked RST packets and TCP Retransimission, and the time of some of some of the packets are negative, like -257.3 seconds. – Zitrax Oct 27 '10 at 13:36
  • can you use another switch at all and re-test please. different make if you have one? if you could pastebin or zip and make the wireshark dump available we could take a look... – Khushil Nov 05 '10 at 09:12
2

In the end this turned out to be due to a general networking issue on the machine. The solution was found in: Network performance issues on windows 7

Zitrax
  • 784
  • 2
  • 11
  • 21