0

Setup:

Galera Cluster with 3 nodes running in Docker Containers. In front of every node is a maxscale proxy. Remotely is a replication slave running with bin-log replication where the cluster is acting as the master (behind the maxscale proxy). Every node in the cluster has bin-log enabled and log_slave_updates=ON. GTID is not configured on the slave.

Problem

Somehow DB2 (see below) has another bin-log file number. .000016 instead of .000015 as the other two servers. Interestingly they have the same position, so I assume the have the same content, as they have the same creation time and exactly same file size (though md5sum shows different checksums).

At some point the slave got out of sync with the following logs:

2022-06-08  0:27:20 9 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'data-master-bin.000015' at position 3374796
2022-06-08  0:28:20 9 [ERROR] Slave I/O: error reconnecting to master 'user@url.to.server:some_port' - retry-time: 60  maximum-retries: 86400  message: Can't connect to MySQL server on 'url.to.server' (110 "Connection timed out"), Internal MariaDB error code: 2003
2022-06-08  0:29:20 9 [Note] Slave: connected to master 'user@url.to.server:some_port',replication resumed in log 'data-master-bin.000015' at position 3374796
2022-06-08  0:29:20 9 [ERROR] Error reading packet from server: binlog truncated in the middle of event; consider out of disk space on master; the first event 'data-master-bin.000015' at 3374796, the last event read from 'data-master-bin.000015' at 3374796, the last byte read from 'data-master-bin.000015' at 3374815. (server_errno=1236)
2022-06-08  0:29:20 9 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'data-master-bin.000015' at 3374796, the last event read from 'data-master-bin.000015' at 3374796, the last byte read from 'data-master-bin.000015' at 3374815.', Internal MariaDB error code: 1236
2022-06-08  0:29:20 9 [Note] Slave I/O thread exiting, read up to log 'data-master-bin.000015', position 3374796

I can't really say if there was no connection at some point. My assumption is, the maxscale proxy usually redirects all incoming requests to the first server as there is not much load at all on the dbs. However at this specific time the request got redirected to db-02 which caused the error as there data-master-bin.000015 is different from the other two servers. I was able to run START SLAVE on the slave and the replication is now up and running again, but I fear it will break as soon as maxscale redirects again to db-02.

So my questions are:

  • What triggers an increase of the bin-log file index?
  • How can I bring the bin-log files back in sync on the 3 cluster nodes?

Configs:

db-server .cnf on every node:

[mysqld]
log_bin
server_id=1
log_slave_updates=ON
log-basename=data-master
#Configure GTID
wsrep_gtid_mode=ON
wsrep_gtid_domain_id=1

proxy .cnf on every server

[Read-Only-Service]
type=service
router=readconnroute
servers=db-01, db-02, db-03
user=maxscale
password=******
router_options=synced

[Replication-Listener]
type=listener
service=Read-Only-Service
protocol=MariaDBClient
port=/*some-port*/
ssl=required
ssl_cert=/path/to/server-cert.pem
ssl_key=/path/to/server-key.pem
ssl_ca_cert=/path/to/ca-cert.pem

DB1

MariaDB [(none)]> SHOW MASTER STATUS;
+------------------------+----------+--------------+------------------+
| File                   | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------------+----------+--------------+------------------+
| data-master-bin.000015 |  3534812 |              |                  |
+------------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

ls -alc --fu /var/lib/mysql    
...
       413 2021-12-02 10:06:08.187023308 +0000 data-master-bin.000009
       387 2021-12-02 10:09:23.020291420 +0000 data-master-bin.000010
   8883991 2022-03-02 16:53:11.689571461 +0000 data-master-bin.000011
   1260112 2022-03-10 13:10:08.730056029 +0000 data-master-bin.000012
       466 2022-03-10 13:10:29.106313267 +0000 data-master-bin.000013
   5967037 2022-04-27 16:41:46.021414396 +0000 data-master-bin.000014
   3672324 2022-06-13 22:03:54.919102767 +0000 data-master-bin.000015
       175 2022-04-27 16:42:46.150317967 +0000 data-master-bin.index
...

DB2

MariaDB [(none)]> show master status;
+------------------------+----------+--------------+------------------+
| File                   | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------------+----------+--------------+------------------+
| data-master-bin.000016 |  3534812 |              |                  |
+------------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

ls -alc --fu /var/lib/mysql    
...
       413 2022-03-10 13:10:34.036661850 +0000 data-master-bin.000014
   5966988 2022-04-27 16:44:37.871084681 +0000 data-master-bin.000015
   3672324 2022-06-13 22:03:54.935182267 +0000 data-master-bin.000016
        75 2022-04-27 16:45:21.228903025 +0000 data-master-bin.index
...

DB3

MariaDB [(none)]> SHOW Master status;
+------------------------+----------+--------------+------------------+
| File                   | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------------+----------+--------------+------------------+
| data-master-bin.000015 |  3534812 |              |                  |
+------------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

ls -alc --fu /var/lib/mysql
...
       413 2022-03-10 13:10:12.637403075 +0000 data-master-bin.000013
   5966988 2022-04-27 16:46:33.269169792 +0000 data-master-bin.000014
   3672324 2022-06-13 22:03:54.929455002 +0000 data-master-bin.000015
        75 2022-04-28 00:37:55.597345308 +0000 data-master-bin.index
...

I'm using Maria DB 10.4

1 Answers1

0

I mitigated the issue by setting the server to only db-01. With this, the bin log is only the same. I'm sure there should be better way to handle this issue.

[Read-Only-Service]
type=service
router=readconnroute
servers=db-01
user=maxscale
password=******
router_options=synced

[Replication-Listener]
type=listener
service=Read-Only-Service
protocol=MariaDBClient
port=/*some-port*/
ssl=required
ssl_cert=/path/to/server-cert.pem
ssl_key=/path/to/server-key.pem
ssl_ca_cert=/path/to/ca-cert.pem