
I have a machine running an XMPP server that connects to its MySQL database. When I load-test it, all goes fine for about 500 s, and then I notice:

Caused by: com.mysql.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request

starting to appear. As they do, the command:

mysql> show processlist;

reveals that MySQL receives the next queries and answers them (I caught state = Sending data), then just sits idle in state = Sleep for 20 s, 40 s, or even longer. During all this time the PreparedStatement.execute() method call does not return. The top command shows no more than 800% CPU usage (16 cores, so at most half of them are in use at peak).

I have checked /var/log/messages and a few other places in /var/log for errors on the XMPP server machine without finding any clue. I also tried another JDBC connector, another MySQL server, and an updated JRE, without improvement. Where should I look next?

Thanks!

EDIT: I checked some more, and this is related neither to the number of threads pumping and pushing queries from their queues, nor to the queries per second, nor to a specific machine, nor to the kind of SQL queries being run, nor to the number of connections to MySQL. I have also inspected a tcpdump capture, and the server answers queries within a few milliseconds. For the rest of the time, it looks like the client is somehow not reading that result from the TCP socket in user space.

But across all the tests one metric stays the same: the trouble starts when the server gets a little past 30k simultaneously connected users (XMPP users).
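
One way to narrow down where the client is stuck while execute() does not return is to list the threads that currently have a Connector/J frame on their stack. The following is a minimal sketch, not part of Tigase or Connector/J: the class name JdbcThreadProbe and the method threadsInside are hypothetical, and the prefix "com.mysql.jdbc." is simply the package seen in the exception above. It only inspects the JVM it runs in, so it would have to be called from inside the server process (for example from a periodic task).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; the names are illustrative only.
class JdbcThreadProbe {

    // Describes every live thread whose stack contains at least one frame
    // from a class whose fully qualified name starts with classPrefix.
    static List<String> threadsInside(String classPrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> hits = new ArrayList<>();
        // false, false: skip lock details, only stack traces are needed here
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            for (StackTraceElement frame : stack) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    StackTraceElement top = stack[0]; // innermost frame of that thread
                    hits.add(info.getThreadName() + " [" + info.getThreadState() + "] at "
                            + top.getClassName() + "." + top.getMethodName());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Run standalone, this only sees its own JVM and will normally print nothing.
        for (String line : threadsInside("com.mysql.jdbc.")) {
            System.out.println(line);
        }
    }
}
```

If the listed threads are blocked on a lock rather than sitting in a socket read, that would point at contention inside the server rather than at MySQL. The JDK's jstack tool gives the same information from outside the process.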

Eric Platon
kellogs
  • What is your testing methodology? What tools are you using for the load testing? Operating system? – drcelus Mar 22 '13 at 10:10
  • You say "30k simultaneously connected users (users of XMPP servers)". To how many actual simultaneous MySQL-connections does that lead? Also, what is the CPU- and IO-load on the MySQL-server? How many queries? Are these slow queries? – Alexander Janssen Mar 22 '13 at 10:15
  • @AlexanderJanssen - the number of simultaneous MySQL connections is independent of the number of simultaneous XMPP users connected. I have varied it from 1 to 5 to 10, with CPU usage of < 100% (as low as a few percentage points) out of 1600%. As detailed in the post, they are not slow queries. IO load - don't know, 0.x % – kellogs Mar 22 '13 at 10:21
  • @drcelus - methodology ...? What should I describe here? I am load-testing an XMPP server using the Tsung framework. The OS is Linux, as in the tags – kellogs Mar 22 '13 at 10:26
  • @kellogs Is it your own or an open source XMPP server? I ask because ~30k is near the max short value (32767), and it's worth checking whether someone uses a short to count/index XMPP clients or something like this. It's possible that it is not a MySQL problem at all. – dsznajder Mar 22 '13 at 21:22
  • @dsznajder - yes, the XMPP server is OSS, and we are talking about my changes from the past few months that have created this mess. I remember testing it a year back and 180k simultaneous connections were no problem. – kellogs Mar 23 '13 at 06:34
  • What changed since your 180k successful tests? Also, have you tried to connect to MySQL with another client while it seems blocked for the XMPP server? Overall, it is hard to give advice without more concrete detail (which XMPP OSS, which version of DB, etc). Last, is the 500s you mention also a recurring cutoff time? – Eric Platon Mar 23 '13 at 10:53
  • @EricPlaton - A lot has changed, the server is not the same any longer. It was an http://www.tigase.org/ server at base. MySQL was benchmarked and had no issues. Another MySQL server which used to work just fine also shows exactly the same behaviour. They are both 5.1.x. 500s is a recurring timeframe too. – kellogs Mar 25 '13 at 14:28
  • Pretty hard to give meaningful advice with many moving parts. @dsznajder may have pointed out something important: If a library in your stack has been updated to a version that introduces a hard-coded limit, that's it. It seems the client code is the culprit. Any update on the client library? Any chance when rolling back to its previous version? As for MySQL, I had issues with 5.5, so not applicable here (we had to move back to 5.1). – Eric Platon Mar 26 '13 at 01:59
  • @EricPlaton - client library - if you are thinking about Connector/J then I have tried multiple versions of it without any change in results. As for any other client side code that might be in some libraries, there is none that I upgraded. There are no meaningful libraries to this server, all the relevant stuff is contained in its own jars. – kellogs Mar 26 '13 at 07:04
  • Hmmm, this is getting to be a long list of comments. The lack of even a tentative answer suggests that the question is hard to answer as it is phrased and detailed right now. Details are missing. Just look at the questions and suggestions in the comments that are still unanswered. I do not know if I can help, but please get in touch by email with one or more of us to try to work out a solution. I hope you can make progress soon and share the acquired knowledge. – Eric Platon Mar 27 '13 at 01:59
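
The short-overflow hypothesis raised in the comments (a threshold just past 30k being close to 32767) can be illustrated with a minimal sketch. The class ShortCounterDemo and its method next are hypothetical and are not taken from the Tigase code base; whether any counter or index in the server is actually stored in a short is an assumption that would have to be checked in the code.

```java
// Hypothetical counter, for illustration only.
class ShortCounterDemo {

    // Increments a value stored in a 16-bit signed short.
    static short next(short current) {
        // The addition is done in int, and the cast back to short wraps silently
        return (short) (current + 1);
    }

    public static void main(String[] args) {
        short users = Short.MAX_VALUE;   // 32767
        System.out.println(next(users)); // prints -32768
    }
}
```

No exception is thrown at the wrap, so a bug of this kind would only show up indirectly, for instance when the negative value is later used as an array index.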

0 Answers