Interesting, strange problem.
We have a production system running an application distributed across multiple servers, all of which read from and write to a separate MySQL server.
- MySQL Server version 5.6.31
- CentOS 7
After running for a few hours, the application crashes, citing:
<JDBCExceptionReporter.java:233>SQL Error: 0, SQLState: 08S01
<JDBCExceptionReporter.java:234>Communications link failure
The last packet successfully received from the server was 60,059 milliseconds ago. The last packet sent successfully to the server was 60,059 milliseconds ago.
Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
How long the application runs before this happens seems fairly unpredictable: sometimes it's a day, sometimes just a couple of hours. The issue occurs on different servers at different times, crashing the application on each, until eventually all of them are down. The interesting part, however, is:
The failures, when they occur, always follow a predictable schedule with a period of about 61 minutes (it varies between 60 minutes 55 seconds and 61 minutes 5 seconds). So for instance we might see failures at:
- 16:01:30
- 17:02:32
- 18:03:31
- 19:04:35
- 21:06:33
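To make the spacing concrete, here is a minimal sketch that computes the gaps between the example times above. It assumes all five timestamps fall on the same day; the helper names (`intervals_seconds`, `per_period_seconds`) are made up for illustration. The last gap is roughly twice the others, which fits a slot around 20:05 where no failure happened.

```python
from datetime import datetime

# Failure times copied from the list above (assumed to be on the same day).
FAILURE_TIMES = ["16:01:30", "17:02:32", "18:03:31", "19:04:35", "21:06:33"]


def intervals_seconds(times):
    """Return the gap in seconds between each pair of consecutive times."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


def per_period_seconds(gaps, nominal=61 * 60):
    """Divide each gap by the nearest whole number of nominal periods.

    A gap that spans a skipped slot (no failure in that period) is
    treated as several periods rather than one long one.
    """
    result = []
    for gap in gaps:
        periods = max(1, round(gap / nominal))
        result.append(gap / periods)
    return result


gaps = intervals_seconds(FAILURE_TIMES)
periods = per_period_seconds(gaps)

for gap, period in zip(gaps, periods):
    print(f"gap {gap:5d} s -> {period / 60:.2f} min per period")
```

Every per-period value lands inside the 60 min 55 s to 61 min 5 s window described above.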
My question is: what are all the possible causes of a failure such as this occurring every 61 minutes?
We've looked at a huge amount of stuff already and are still stumped. For me the key seems to be this bizarre periodic pattern, which rules out scheduled tasks: it drifts by about 1 minute per hour, whereas a scheduled task would run at the same minute every hour.
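The drift argument can be sketched as follows, assuming an exact 61-minute period (the real one wobbles by a few seconds) and comparing it with a hypothetical task scheduled once per hour. The starting time is the first failure from the list; `projected` is an illustrative helper, not part of any tool we run.

```python
from datetime import datetime, timedelta

# First observed failure from the list above; the date part is irrelevant here.
first = datetime.strptime("16:01:30", "%H:%M:%S")


def projected(start, period, count):
    """Return `count` timestamps spaced `period` apart, beginning at `start`."""
    return [start + i * period for i in range(count)]


# Assumption: an idealised 61-minute period versus a fixed hourly schedule.
drifting = projected(first, timedelta(minutes=61), 6)
hourly = projected(first, timedelta(hours=1), 6)

drifting_minutes = [t.minute for t in drifting]
fixed_minutes = [t.minute for t in hourly]

print("61-minute period, minute of hour:", drifting_minutes)
print("hourly schedule, minute of hour: ", fixed_minutes)
```

The 61-minute series walks forward one minute each hour, matching the observed times, while the hourly schedule stays pinned to the same minute.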