Interesting, strange problem.

We have a production system running an application distributed across multiple servers, all of which read from and write to a separate MySQL server.

  • MySQL Server version 5.6.31
  • CentOS 7

After running for a few hours, the application crashes, citing:

<JDBCExceptionReporter.java:233>SQL Error: 0, SQLState: 08S01
<JDBCExceptionReporter.java:234>Communications link failure

The last packet successfully received from the server was 60,059 milliseconds ago.  The last packet sent successfully to the server was 60,059 milliseconds ago.

Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

How long the application runs before this happens seems fairly unpredictable: sometimes it's a day, sometimes just a couple of hours. The issue occurs on different servers at different times, crashing the application on each, until eventually all of them are down. The interesting part, however, is:

The failures, when they occur, always follow a predictable schedule with a period of about 61 minutes (varying between 60 minutes 55 seconds and 61 minutes 5 seconds). So, for instance, we might see failures at:

  • 16:01:30
  • 17:02:32
  • 18:03:31
  • 19:04:35
  • 21:06:33
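
To make the period explicit, here is a rough sketch that computes the gaps between the example timestamps above. It assumes all of the timestamps fall on the same day, and that a long gap (such as 19:04 to 21:06) means one or more cycles passed without a failure. The names `failures`, `NOMINAL_PERIOD_S` and `per_cycle_periods` are made up for this illustration.

```python
from datetime import datetime

# Example failure times from the list above (assumed to be on the same day).
failures = ["16:01:30", "17:02:32", "18:03:31", "19:04:35", "21:06:33"]

# Nominal period being tested: 61 minutes, in seconds.
NOMINAL_PERIOD_S = 61 * 60


def per_cycle_periods(stamps, nominal=NOMINAL_PERIOD_S):
    """Return (cycles, seconds_per_cycle) for each pair of consecutive stamps.

    A gap spanning several nominal periods is treated as that many cycles,
    on the assumption that the intermediate cycles produced no failure.
    """
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    result = []
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier).total_seconds()
        cycles = max(1, round(gap / nominal))
        result.append((cycles, gap / cycles))
    return result


periods = per_cycle_periods(failures)
for cycles, period in periods:
    minutes, seconds = divmod(int(period), 60)
    print(f"{cycles} cycle(s), {minutes} min {seconds} s per cycle")
```

For these example times, every per-cycle period lands inside the 60 min 55 s to 61 min 5 s window, including the two-hour gap once it is split into two cycles.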

My question is: what are all the possible causes of a failure such as this occurring every 61 minutes?

We've already looked at a huge number of things and are still stumped. For me, the key seems to be this bizarre periodic pattern: because it drifts by 1 minute per hour, it appears to rule out ordinary scheduled tasks, which fire at fixed clock times and would not drift like this.

aola1433
  • Have you checked mysql logs to see if there is any clue? – Khaled Feb 02 '17 at 15:10
  • Yes, there is no clue in the mysqld logs, mysql-server itself does not seem to report any problem when this is occurring. There is also nothing unusual in the slow query logs. – aola1433 Feb 02 '17 at 15:15
  • You can try to ping the server periodically (specifically at expected failure time) and capture traffic related to MySQL. This may give you an idea about what's going on. – Khaled Feb 02 '17 at 15:37
  • Have had a ping script running continuously from all application servers to the mysql server, with no interruptions seen. Will try running a tcpdump capture on the mysql port for a while to try and capture something useful. – aola1433 Feb 02 '17 at 16:13
  • Someone posted a question about a problem happening every 61 minutes in SuperUser today: http://superuser.com/questions/1174516/linux-why-would-something-happen-every-61-minutes. He's also on CentOS 7. – Barmar Feb 02 '17 at 22:57
