0

I have a Master-Master Replication with the following Configuration. Due to some patches, we have to Reboot the both the masters. But when i check the replication i s broken.

MASTER 1:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
symbolic-links=0

bind-address=Master 1
port=3306
query_cache_size=16M
query_cache_limit=4M
tmp_table_size=64M
max_heap_table_size=64M
key_buffer_size=32M
table_open_cache=32
innodb_file_per_table=1
max_connections=1000
server_id=2
binlog_format='MIXED'
log-bin=/var/log/mariadb/mariadb-bin.log
max_allowed_packet=1000M
general_log=1
auto_increment_increment=2
auto_increment_offset=2



[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

!includedir /etc/my.cnf.d

MASTER2:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
bind-address=Master2
port=3306
query_cache_size=16M
query_cache_limit=4M
tmp_table_size=64M
max_heap_table_size=64M
key_buffer_size=32M
table_open_cache=32
innodb_file_per_table=1
#innodb_force_recovery=5
max_connections=1000
server_id=1
binlog_format='MIXED'
log-bin=/var/log/mariadb/mariadb-bin.log
max_allowed_packet=1000M
general_log=1
auto_increment_increment=2
auto_increment_offset=1

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

!includedir /etc/my.cnf.d

While configuring, i researched a lot and found if we set increment values, the replication resumes after the replication crashes, but in my case i see the replication break with the following error.

ERROR:

190428 3:21:02 [ERROR] Slave SQL: Could not execute Update_rows event on table nagios.nagios_servicestatus; Can't find record in 'nagios_servicestatus', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.000002, end_log_pos 259533668, Error_code: 1032

190428 3:21:02 [Warning] Slave: Can't find record in 'nagios_servicestatus' Error_code: 1032

190428 3:21:02 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000002' position 259532745

190429 9:31:32 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)

190429 9:31:32 [Note] Slave I/O thread killed while reading event

190429 9:31:32 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000002', position 369070380

190429 9:31:33 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000002' at position 259532745, relay log './mariadb-relay-bin.000002' position: 143476361

190429 9:31:33 [Note] 'SQL_SLAVE_SKIP_COUNTER=1' executed at relay_log_file='./mariadb-relay-bin.000002', relay_log_pos='143476361', master_log_name='mariadb-bin.000002', master_log_pos='259532745' and new position at relay_log_file='./mariadb-relay-bin.000002', relay_log_pos='143477368', master_log_name='mariadb-bin.000002', master_log_pos='259533752'

190429 9:31:33 [ERROR] Slave SQL: Could not execute Update_rows event on table nagios.nagios_customvariablestatus; Can't find record in 'nagios_customvariablestatus', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.000002, end_log_pos 259534029, Error_code: 1032

190429 9:31:33 [Warning] Slave: Can't find record in 'nagios_customvariablestatus' Error_code: 1032

190429 9:31:33 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.000002' position 259533752

Do i need to add any more configuration to my.cnf to auto-recover replication after reboot?

Thanks,

1 Answers1

0

I think secondary master was not synced and apps hitting to secondary master to alter the record, please check the primary master if that is healthy and point apps connections to that. If you are using haproxy that may be more easy for you. I think best option is setup secondary master again.

asktyagi
  • 2,401
  • 1
  • 5
  • 19