
I have a webapp (Tomcat/Hibernate/DBCP 1.4) that runs queries against MySQL, and this works fine for a certain load, say 50 queries a second. When I route the same moderate load through HAProxy (still just using a single database), I get a failure, maybe one for every 500 queries. My app reports:

Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 196,898 milliseconds ago. The last packet sent successfully to the server was 0 milliseconds ago.
at sun.reflect.GeneratedConstructorAccessor210.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1117)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3567)
...
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3017)
...

Meanwhile the HAProxy log is showing a lot of entries like:

Oct 15 15:43:12 localhost haproxy[3141]: 127.0.0.1:35500 [15/Oct/2012:15:42:50.027] mysql mysql/db03 0/0/34605 2364382 cD 3/3/3/3/0 0/0

The "cD" apparently indicates a state of client timeout. So whereas my webapp is saying that HAProxy is refusing to accepting new connections, HAProxy is saying that my webapp is not accepting data back.

I am not including my HAProxy configuration, because I've tried many different parameter values with essentially the same result. In particular, I've set maxconn to both high and low values, in both the global and server sections, and in the stats the max sessions never rises above about 7. My JDBC pool size is also high.
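For the overall shape, the proxy section looks roughly like this (a sketch with placeholder values, not my actual tuning):

    listen mysql
        bind 127.0.0.1:3306
        mode tcp
        balance roundrobin
        maxconn 500                          # tried both high and low values here
        server db03 192.168.3.13:3306 check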

Is it generally ok to use a JDBC pool and a HAProxy pool together? Have people run into this kind of problem before?

I have an idea on how to solve this, which is to send a "validation query" before every query. But there's a certain overhead there, and I'd still like to know why my webapp succeeds when it goes straight to MySQL, but gets dropped connections on going through HAProxy.
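To illustrate the validation idea, here is a sketch with DBCP 1.4 (the URL, credentials, and the `SELECT 1` query are placeholders):

    import org.apache.commons.dbcp.BasicDataSource;

    public class PooledDataSource {
        public static BasicDataSource create() {
            BasicDataSource ds = new BasicDataSource();
            ds.setDriverClassName("com.mysql.jdbc.Driver");
            // Placeholder URL: the app connects to HAProxy, which forwards to MySQL.
            ds.setUrl("jdbc:mysql://127.0.0.1:3306/mydb");
            ds.setUsername("app");
            ds.setPassword("secret");
            // Validate every connection before handing it out; this catches
            // connections that HAProxy has already closed, at the cost of
            // one extra round trip per borrow.
            ds.setTestOnBorrow(true);
            ds.setValidationQuery("SELECT 1");
            return ds;
        }
    }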

How can I debug further and get more information than just "cD"? I tried running HAProxy in debug mode, but it doesn't seem to reveal anything more.


UPDATE - Fri Jan 4 11:49:28 ICT 2013 (Reply to JimB)

> The only way to get more info from haproxy than what you have would be to use the `show sess` or `show sess <id>` command periodically to watch the state of each TCP connection
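For the record, `show sess` can be issued through haproxy's stats socket, e.g. (this assumes a `stats socket /var/run/haproxy.sock` line in the global section):

    echo "show sess" | socat stdio unix-connect:/var/run/haproxy.sock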

Here is some information about the sessions:

0x31f4310: proto=tcpv4 src=192.168.3.40:60401 fe=FE_mysql be=BE_mysql srv=mysql3 ts=08 age=1m2s calls=2 rq[f=909202h,l=0,an=00h,rx=13s,wx=,ax=] rp[f=109202h,l=0,an=00h,rx=13s,wx=,ax=] s0=[7,18h,fd=0,ex=] s1=[7,18h,fd=1,ex=] exp=13s

0x31fca50: proto=tcpv4 src=192.168.3.40:60423 fe=FE_mysql be=BE_mysql srv=mysql1 ts=08 age=2s calls=2 rq[f=909202h,l=0,an=00h,rx=1m13s,wx=,ax=] rp[f=109202h,l=0,an=00h,rx=1m13s,wx=,ax=] s0=[7,18h,fd=9,ex=] s1=[7,18h,fd=12,ex=] exp=1m13s

> Haproxy has a default timeout of 10 seconds (and the example configs have 50 seconds I think). I'm not too familiar with JDBC, but going from Tomcat's docs, there is a setting minEvictableIdleTimeMillis, which will evict idle connections from the pool; it defaults to 60 seconds, and the effective idle time may be up to 65 seconds because timeBetweenEvictionRunsMillis is 5 seconds by default. Basically, you need to make sure your haproxy timeout is high enough to account for these idle connections in the pool.

I have increased the `timeout client` to 75 seconds and now the above error seems to occur less often than before:

2013-01-04 11:59:59 Debug: Communications link failure

The last packet successfully received from the server was 145,255 milliseconds ago. The last packet sent successfully to the server was 10 milliseconds ago.

I also want to note that, besides the above, there are some errors like this:

Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

On the server side, I sometimes see the sD flag at disconnection:

haproxy[15770]: 192.168.3.40:56944 [04/Jan/2013:11:06:55.895] FE_mysql BE_mysql/mysql1 0/0/77153 1954480 sD 1/1/1/1/0 0/0

The `timeout server` is also set to 75 seconds.
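The relevant lines now look like this (a sketch; the section layout is illustrative):

    defaults
        mode tcp
        timeout connect 5s
        timeout client  75s   # above the ~65s worst-case idle age of a pooled connection
        timeout server  75s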

> Another approach would be to use testWhileIdle and validationQuery to keep the connections active, since a few packets of traffic every few seconds would probably alleviate the issue as well.

I will suggest that the developer try these options if there is no other way.

quanta, guckmal
  • At a more mundane level, it might be worth running ethtool -S eth0 (or whatever) to check it is not dropping packets. – ramruma Jan 04 '13 at 07:03
  • No dropped packets. `ethtool -S eth1 | grep drop` returns zero. I'm going to sniff some packets to see what happens. – quanta Jan 04 '13 at 07:34

1 Answer


The only way to get more info from haproxy than what you have would be to use the `show sess` or `show sess <id>` command periodically to watch the state of each TCP connection, though I'm not sure if you would get any more useful information.

The cD termination state is the most helpful piece of info you have. What it means exactly is that an established connection with the client was timed out. This is controlled in haproxy via the `timeout client` parameter in the config, set globally or in a frontend or listen section.

You said that you don't see concurrent connections go over 7, and this log entry shows that the failure happened when there were only 3 connections, so I doubt you have a connection limit problem (even outside of haproxy's control).

So what looks to be happening is that occasionally the pool adds a new connection, which handles some queries and then sits idle. When that connection sits idle longer than the `timeout client` setting in haproxy, haproxy will terminate the connection itself. The next time that connection is used from the pool, you get the above error.

Haproxy has a default timeout of 10 seconds (and the example configs have 50 seconds I think). I'm not too familiar with JDBC, but going from Tomcat's docs, there is a setting minEvictableIdleTimeMillis, which will evict idle connections from the pool; it defaults to 60 seconds, and the effective idle time may be up to 65 seconds because timeBetweenEvictionRunsMillis is 5 seconds by default. Basically, you need to make sure your haproxy timeout is high enough to account for these idle connections in the pool.

Another approach would be to use testWhileIdle and validationQuery to keep the connections active, since a few packets of traffic every few seconds would probably alleviate the issue as well.
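With DBCP that would look roughly like this (a sketch, not tested against your setup; 5s and 60s are the defaults mentioned above, and the URL and credentials are placeholders):

    import org.apache.commons.dbcp.BasicDataSource;

    public class KeepAlivePool {
        public static BasicDataSource create() {
            BasicDataSource ds = new BasicDataSource();
            ds.setDriverClassName("com.mysql.jdbc.Driver");
            ds.setUrl("jdbc:mysql://127.0.0.1:3306/mydb");
            ds.setUsername("app");
            ds.setPassword("secret");
            ds.setValidationQuery("SELECT 1");           // query the evictor runs against idle connections
            ds.setTestWhileIdle(true);                   // validate idle connections in the background
            ds.setTimeBetweenEvictionRunsMillis(5000L);  // evictor wakes up every 5 seconds
            ds.setMinEvictableIdleTimeMillis(60000L);    // evict connections idle for more than 60 seconds
            return ds;
        }
    }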

[edit] In response to @quanta's additional information:

Even though the haproxy timeout is now 75s, you are definitely still getting session timeouts. There may be some additional play in the total lifetime of a JDBC connection that I'm not aware of. Since very few connections are needed for this type of service, there is also nothing wrong with increasing the timeouts to something extremely high, on the order of an hour or more. If the JDBC pool really is having problems releasing old connections, this would only be masking the issue, but it could be an easy fix as well.
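That could be as simple as (a sketch; haproxy timeouts accept unit suffixes such as s, m, and h):

    defaults
        mode tcp
        timeout connect 5s
        timeout client  1h   # generous enough to outlive any idle pooled connection
        timeout server  1h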

JimB