Mysql slave stoped and fail to initialize relay log

Question

A couple of days ago I setup a slave MySQL server (Mysql Community Server 8.0.21). So far we only use it for backup using mysqldump. I did notice that he was using too much memory, but I did not mind because I used a few conf params that auto-regulate the amount of memory he would use, and the VPS was dedicated to the MySQL server (if I pay for the whole memory, why not use it, right?).

Today, first thing in the morning I lookup my Zabbix and notice that the slave was not working, so I ssh into it, restart the MySQL, and try to start the slave using START SLAVE. And this is what I got:

ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository

Not sure what caused the problem in the first place (this is my my.cnf)

[client]
#    Usado apenas para casos especificos
port                            = 3306      # Porta parada
socket                          = /var/run/mysqld/mysqld.sock

[mysql]
#    Configurações do cliente
auto-rehash                                 # Auto completar

[mysqld]
#    Configuração do servidor
pid_file                        = /var/run/mysqld/mysqld.pid
socket                          = /var/run/mysqld/mysqld.sock
datadir                         = /var/lib/mysql
log_error                       = /var/log/mysql/error.log
user                            = mysql
bind_address                    = 0.0.0.0   # Ouve todos os endereços

#    Configurações genericas do servidor
max_allowed_packet              = 32M       # Tamanho maximo do pacote.
max_connections                 = 2000      # Maximo de coneções
open_files_limit                = 10000     # Maximo de arquivos abertos
tmp_table_size                  = 64M       # Limite tamanho tabela em mem
max_heap_table_size             = 64M       # Limite tamanho tabela em mem
tmpdir                          = /tmp      # Diretorio /tmp/
default_storage_engine          = InnoDB    # Engine default
skip_name_resolve                           # Desabilita resolução DNS


#     Configurações de log binario
log_bin                         = mysql-bin # Arquivo de log binario
relay-log                       = mysql-relay-bin
log-slave-updates               = 1         # Log de update no slave
read-only                       = 1         # Apenas leitura
binlog-format                   = mixed     # Formato
server_id                       = 2         # Identifica servidor para log
max_binlog_size                 = 256M      # Tamanho maximo log binario
binlog_expire_logs_seconds      = 604800    # Max tempo log binario
innodb_flush_log_at_trx_commit  = 1         # Proteção de dados
sync_binlog                     = 1         # Somente pra replicação

#    Configurações especificas do InnoDB
innodb_dedicated_server         = ON        # Autoconf InnoDB
innodb_io_capacity              = 2000      # Quantas escritas por segundo
innodb_read_io_threads          = 64        # Threads de leitura
innodb_write_io_threads         = 64        # Threads de escrita
innodb_thread_concurrency       = 0         # Auto dectecta threads

#    Slow query log 
slow_query_log                  = 1         # Guarda queries lentas
long_query_time                 = 1.0       # Tempo query

#    Funções dentro do Mysql
log_bin_trust_function_creators = 1;        # Permite funções criadas

I have no idea how to fix it. Should I do the whole dump all over again? How can I prevent this from happen again?

UPDATE:

This is the result of the SHOW SLAVE STATUS:

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: IP_ADDR
                  Master_User: USER
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000011
          Read_Master_Log_Pos: 178887867
               Relay_Log_File: pergamum-relay-bin.000029
                Relay_Log_Pos: 178888082
        Relay_Master_Log_File: mysql-bin.000011
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 13124
                   Last_Error: Slave failed to initialize relay log info structure from the repository
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 178887867
              Relay_Log_Space: 0
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 13124
               Last_SQL_Error: Slave failed to initialize relay log info structure from the repository
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 0
                  Master_UUID: bef45e1b-99d6-11ea-a355-3e2547e4f083
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 200819 09:58:32
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set (0.00 sec)

UPDATE

As asked, here are the variables from both servers.

MASTER

GLOBAL STATUS:
https://gist.github.com/IamRichter/ef4993bbf65883baa366ff30d73b9644
GLOBAL VARIABLES:
https://gist.github.com/IamRichter/dbf08facd364548529b07bc0cbd6b2e6

SLAVE

GLOBAL STATUS:
https://gist.github.com/IamRichter/fcdff1111ed52fc859ebac46a2dbfc27
GLOBAL VARIABLES:
https://gist.github.com/IamRichter/53c5638fb3c6064fd51eab332660f07f

@RickJames I am using ```innodb_dedicated_server```, so the pool should be adjusted by the server. (https://dev.mysql.com/doc/refman/8.0/en/innodb-dedicated-server.html) — , Aug 24 '20 at 12:13
Well, Zabbix tells me that MySQL is already using 90% of all available ram. For the main server, I have 16GB and for the slave, it's 6GB. — , Aug 24 '20 at 19:03
Maybe the `STATUS` will give some clues. Please provide `SHOW GLOBAL STATUS` and `SHOW VARIABLES` for both servers, preferably after they have been up for a day. — Rick James, Aug 24 '20 at 22:39
@RickJames I just added the dumps from the commands. I had to add to a gist because my question run out of characters... — , Aug 25 '20 at 12:21

score 0 · Answer 1 · answered Aug 26 '20 at 20:00

Alas, the VARIABLES/STATUS don't show any obvious reasons for the problem you are having. Maybe the high buffer_pool_size is squeezing RAM too much, but I doubt it. Here are my analyses (Primary, then Replica):

Observations:

Version: 8.0.20
16 GB of RAM
Uptime = 2d 02:30:41
You are not running on Windows.
Running 64-bit version
You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

Increase table_open_cache if the OS will let you. Why so many tables??

I think (without any proof) that is is unwise to use anything but "2" for innodb_log_files_in_group MariaDB is already ignoring it; I have not heard about Oracle.

There are some indications of poorly fomulated queries. See http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog for analyzing the slowlog.

Did you have some reason for having binlog_format=MIXED?

Details and other observations:

( Opened_tables ) = 452,365 / 181841 = 2.5 /sec -- Frequency of opening Tables -- increase table_open_cache (now 3995)

( Table_open_cache_overflows ) = 448,306 / 181841 = 2.5 /sec -- May need to increase table_open_cache (now 3995)

( Table_open_cache_misses ) = 452,365 / 181841 = 2.5 /sec -- May need to increase table_open_cache (now 3995)

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 4). Also check for swapping.

( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5 -- innodb_page_cleaners -- Recommend setting innodb_page_cleaners (now 4) to innodb_buffer_pool_instances (now 8)

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_pages_written / Innodb_buffer_pool_write_requests ) = 2,102,149 / 7377368 = 28.5% -- Write requests that had to hit disk -- Check innodb_buffer_pool_size (now 12884901888)

( innodb_log_files_in_group ) = 9 -- Number of InnoDB log files -- 2 is probably the only reasonable value. A large number may cause performance problems.

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 1,773,668,352 / (181841 / 3600) / 9 / 1024M = 0.00363 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 181,841 / 60 * 1024M / 1773668352 = 1,834 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 1073741824). (Cannot change in AWS.)

( innodb_flush_method ) = innodb_flush_method = O_DIRECT_NO_FSYNC -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( Innodb_row_lock_time_avg ) = 1,886 -- Avg time to lock a row (millisec) -- Possibly conflicting queries; possibly table scans.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 2,000 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 2000) is too high and various memory settings are high, you could run out of RAM.

( Handler_read_rnd_next / Com_select ) = 46,219,107,293 / 7120289 = 6,491 -- Avg rows scanned per SELECT. (approx) -- Consider raising read_buffer_size (now 131072)

( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (358564 + 419704 + 462 + 0) / 2006877 = 0.388 -- Statements per Commit (assuming all InnoDB) -- Low: Might help to group queries together in transactions; High: long transactions strain various things.

( Select_scan ) = 4,386,291 / 181841 = 24 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 4,386,291 / 7120289 = 61.6% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( Aborted_clients ) = 175,398 / 181841 = 0.96 /sec -- Threads bumped due to wait_timeout -- Increase wait_timeout (now 28800); be nice, use disconnect

( Connections ) = 3,544,963 / 181841 = 19 /sec -- Connections -- Increase wait_timeout (now 28800); use pooling?

Abnormally small: Open_files = 3

Abnormally large: Com_do = 0.02 /HR Com_release_savepoint = 2.3 /HR Com_rollback_to_savepoint = 0.11 /sec Com_savepoint = 2.3 /HR Com_show_charsets = 0.44 /HR Com_show_create_db = 2.3 /HR Com_show_create_func = 2.5 /HR Com_show_create_proc = 1.5 /HR Com_show_create_trigger = 0.12 /HR Com_show_function_status = 2.5 /HR Com_show_procedure_status = 2.5 /HR Com_show_table_status = 0.11 /sec Com_show_triggers = 0.11 /sec Created_tmp_files = 20 /sec Handler_read_key = 139335 /sec Handler_read_next = 316713 /sec Handler_read_rnd = 46465 /sec Handler_savepoint = 2.3 /HR Handler_savepoint_rollback = 0.11 /sec Innodb_buffer_pool_read_requests = 1095747 /sec Innodb_num_open_files = 3,996 Innodb_system_rows_inserted = 0.97 /HR Innodb_system_rows_read = 1.99e+7 Innodb_system_rows_updated = 0.095 /sec Open_table_definitions = 2,000 Select_full_range_join = 2.1 /sec Select_full_range_join / Com_select = 5.5% back_log / max_connections = 100.0% innodb_max_dirty_pages_pct_lwm = 10 innodb_read_io_threads = 64 innodb_undo_tablespaces = 2 innodb_write_io_threads = 64 max_error_count = 1,024 max_length_for_sort_data = 4,096

Abnormal strings: default_authentication_plugin = caching_sha2_password event_scheduler = ON innodb_dedicated_server = ON innodb_fast_shutdown = 1 optimizer_trace = enabled=off,one_line=off optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN

sf1030756-M

Observations:

Version: 8.0.21
6 GB of RAM
Uptime = 5d 23:59:45
Are you sure this was a SHOW GLOBAL STATUS ? -- Or is this replica simply not busy??
You are not running on Windows.
Running 64-bit version
You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

As with the Primary -- table_open_cache, innodb_log_files_in_group, binlog_format

Check for swapping. innodb_buffer_pool_size is rather large for the 6GB of RAM.

Lower max_connections to 100 -- You have not used more than 5 so far; the current 2000 is a burden on RAM.

Details and other observations:

( table_open_cache ) = 3,995 -- Number of table descriptors to cache -- Several hundred is usually good.

( Table_open_cache_misses / (Table_open_cache_hits + Table_open_cache_misses) ) = 56,436 / (1107115 + 56436) = 4.9% -- Effectiveness of table_open_cache. -- Increase table_open_cache (now 3995) and check table_open_cache_instances (now 16).

( innodb_lru_scan_depth * innodb_page_cleaners ) = 1,024 * 4 = 4,096 -- Amount of work for page cleaners every second. -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixable by lowering lru_scan_depth: Consider 1000 / innodb_page_cleaners (now 4). Also check for swapping.

( innodb_page_cleaners / innodb_buffer_pool_instances ) = 4 / 8 = 0.5 -- innodb_page_cleaners -- Recommend setting innodb_page_cleaners (now 4) to innodb_buffer_pool_instances (now 8)

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_pages_written / Innodb_buffer_pool_write_requests ) = 25,132 / 57003 = 44.1% -- Write requests that had to hit disk -- Check innodb_buffer_pool_size (now 5368709120)

( innodb_log_files_in_group ) = 5 -- Number of InnoDB log files -- 2 is probably the only reasonable value. A large number may cause performance problems.

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 22,224,896 / (518385 / 3600) / 5 / 512M = 5.7e-5 -- Ratio -- (see minutes)

( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 518,385 / 60 * 512M / 22224896 = 208,704 -- Minutes between InnoDB log rotations Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf. -- (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size (now 536870912). (Cannot change in AWS.)

( innodb_flush_method ) = innodb_flush_method = O_DIRECT_NO_FSYNC -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( max_connections ) = 2,000 -- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 2000) is too high and various memory settings are high, you could run out of RAM.

( (Com_show_create_table + Com_show_fields) / Questions ) = (8605 + 17405) / 313218 = 8.3% -- Naughty framework -- spending a lot of effort rediscovering the schema. -- Complain to the 3rd party vendor.

( tmp_table_size ) = 64M -- Limit on size of MEMORY temp tables used to support a SELECT -- Decrease tmp_table_size (now 67108864) to avoid running out of RAM. Perhaps no more than 64M.

( Select_full_join / Com_select ) = 8,884 / 123363 = 7.2% -- % of selects that are indexless join -- Add suitable index(es) to tables used in JOINs.

( Select_scan / Com_select ) = 107,352 / 123363 = 87.0% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries

( Com_admin_commands / Queries ) = 6,755 / 320422 = 2.1% -- Percent of queries that are "admin" commands. -- What's going on?

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( log_slow_slave_statements ) = log_slow_slave_statements = OFF -- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Replica reads.

( thread_cache_size / Max_used_connections ) = 28 / 5 = 560.0% -- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.

( thread_stack * max_connections ) = (286720 * 2000) / 6144M = 8.9% -- Bare minimum memory allocation for max_connections. -- Lower max_connections (now 2000)

Abnormally small: Bytes_received = 58 /sec Com_insert = 0 Com_update = 0 Created_tmp_files = 0.035 /HR Handler_update = 2.2 /HR Innodb_buffer_pool_write_requests = 0.11 /sec Innodb_buffer_pool_write_requests / Innodb_buffer_pool_pages_flushed = 2.27 Innodb_dblwr_pages_written / Innodb_dblwr_writes = 1 Innodb_pages_created = 1 /HR Innodb_rows_deleted + Innodb_rows_inserted = 0 Innodb_rows_inserted = 0 Innodb_rows_updated = 0 Select_range = 0 Select_range / Com_select = 0

Abnormally large: Com_do = 0.014 /HR Com_show_create_func = 0.15 /HR Com_slave_start = 0.028 /HR Innodb_num_open_files = 3,996 Innodb_system_rows_read = 1.91e+7 Innodb_system_rows_updated = 2.2 /HR Open_table_definitions = 2,000 back_log / max_connections = 100.0% innodb_max_dirty_pages_pct_lwm = 10 innodb_read_io_threads = 64 innodb_undo_tablespaces = 2 innodb_write_io_threads = 64 max_error_count = 1,024 max_length_for_sort_data = 4,096 optimizer_trace_offset = --1 performance_schema_error_size = 4,772 performance_schema_max_cond_classes = 100 performance_schema_max_mutex_classes = 300 performance_schema_max_rwlock_classes = 60 performance_schema_max_stage_classes = 175 performance_schema_max_statement_classes = 218 performance_schema_max_thread_classes = 100

Abnormal strings: default_authentication_plugin = caching_sha2_password event_scheduler = ON innodb_dedicated_server = ON innodb_fast_shutdown = 1 optimizer_trace = enabled=off,one_line=off optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on read_only = ON slave_rows_search_algorithms = INDEX_SCAN,HASH_SCAN

sf1030756-S

Hey, sorry for the late reply (i was testing to see if it would work). I did much of the things you suggested, and drop all databases from the slave an imported them again. But it failed again. Same problem. Also, did you use some tools for this analysis? *UPDATE* Well, this problem was diferent, Mysql just could not log on the server, and after a manual loggin, it updated their keys and it worked. — , Sep 04 '20 at 14:00

Mysql slave stoped and fail to initialize relay log

1 Answers1