I have a webapp (Tomcat/Hibernate/DBCP 1.4) that runs queries against MySQL, and this works fine for a certain load, say 50 queries a second. When I route the same moderate load through HAProxy (still just using a single database), I get a failure, maybe one for every 500 queries. My app reports:
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 196,898 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor210.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
...
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3017)
...
Meanwhile the HAProxy log is showing a lot of entries like:
27] mysql mysql/db03 0/0/34605 2364382 cD 3/3/3/3/0 0/0
Oct 15 15:43:12 localhost haproxy[3141]: 127.0.0.1:35500 [15/Oct/2012:15:42:50.0
The "cD" apparently indicates a state of client timeout. So whereas my webapp is saying that HAProxy is refusing to accepting new connections, HAProxy is saying that my webapp is not accepting data back.
I am not including my HAProxy configuration, because I've tried many different parameter values, with essentially the same result. In particular, I've set maxconn to both high and low values, in both global and server sections, and what always happens in the stats is that the max sessions rises to no more than about 7. My JDBC pool size is also high.
Is it generally ok to use a JDBC pool and a HAProxy pool together? Have people run into this kind of problem before?
I have an idea on how to solve this, which is to send a "validation query" before every query. But there's a certain overhead there, and I'd still like to know why my webapp succeeds when it goes straight to MySQL, but gets dropped connections on going through HAProxy.
How can I debug further and get more information than just "cD"? I tried running HAProxy in debug mode, but it doesn't seem to reveal anything more.
UPDATE - Fri Jan 4 11:49:28 ICT 2013 (Reply to JimB)
The only way to get more info from haproxy than what you have, would be to use the
show sess
orshow sess <id>
command periodically to watch the state of each tcp connection
Here are some information about the sessions:
0x31f4310: proto=tcpv4 src=192.168.3.40:60401 fe=FE_mysql be=BE_mysql srv=mysql3 ts=08 age=1m2s calls=2 rq[f=909202h,l=0,an=00h,rx=13s,wx=,ax=] rp[f=109202h,l=0,an=00h,rx=13s,wx=,ax=] s0=[7,18h,fd=0,ex=] s1=[7,18h,fd=1,ex=] exp=13s
0x31fca50: proto=tcpv4 src=192.168.3.40:60423 fe=FE_mysql be=BE_mysql srv=mysql1 ts=08 age=2s calls=2 rq[f=909202h,l=0,an=00h,rx=1m13s,wx=,ax=] rp[f=109202h,l=0,an=00h,rx=1m13s,wx=,ax=] s0=[7,18h,fd=9,ex=] s1=[7,18h,fd=12,ex=] exp=1m13s
Haproxy has a default timeout of 10 seconds (and the example configs have 50 seconds I think). I'm not too familiar with JDBC, but going from Tomcat's docs, there is a setting
minEvictableIdleTimeMillis
, which will evict idle connection from the pool, and defaults to 60 seconds, and may be up to 65 seconds because thetimeBetweenEvictionRunsMillis
is 5 seconds by default. Basically, you need to make sure your haproxy timeout is high enough to account for these idle connections in the pool.
I have increased the timeout client
to 75 seconds and now the above error seems occur less than before:
2013-01-04 11:59:59 Debug: Communications link failure
The last packet successfully received from the server was 145,255 milliseconds ago. The last packet sent successfully to the serv er was 10 milliseconds ago.
I also want to note that: besides the above, there're some errors like this:
Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
On the server side, sometime I see the sD
flag at disconnection:
haproxy[15770]: 192.168.3.40:56944 [04/Jan/2013:11:06:55.895] FE_mysql BE_mysql/mysql1 0/0/77153 1954480 sD 1/1/1/1/0 0/0
The timeout server
also is set to 75 seconds.
Another approach would be to use
testWhileIdle
andvalildationQuery
to keep the connections active, since a few packets of traffic every few seconds would probably alleviate the issue as well.
I would suggest the developer trying these options if there is no other way.