Recover a crashed MySQL master server from the slave

Question

we are building a simple master/slave MySQL configuration using asynchronous replication, with MySQL enterprise 5.5.17 on both servers and innoDB based tables. In case of a crash of the master server we would like to offer to our users the possibility of recovering the master using the most-up-to date database contents of the slave. On the master server, the database and the binary logs are stored in different disk devices, for better reliability.

What would be the best way to do this? I was trying to outline a procedure for this, but I'm not sure this is correct:

Ensure that all the statements contained in the relay log of the slave are executed. Ideally, I could execute a STOP SLAVE IO_THREAD, even if it shouldn't be necessary as the master has supposingly crashed and no other statements are coming to the slave, and wait for the remaining relay events to be completed.
Shut down the database on the slave and copy over the files to the database files to the master.
From the relay-log.info and the master.info on the slave I should be able to find out what is the latest binary log on the master from which the slave was reading and at which position.
I could replay the binary logs on the master from the last statement executed by the slave before the master crashed up to the last statement available in the logs.
I should reset the SLAVE and restart the replication from the last statement executed by the slave before the master crashed.

Is this OK?

RolandoMySQLDBA · Answer 1 · 2012-01-17T21:10:08.253

In ordet to simplify the promotion of the Slave to a Master, I have a suggestion to keep the Slave closer sync'd to the Master.

You should use the tool mk-slave-prefetch.

Here is the beauty of this tool: On a slave, it will read the relay logs, look for all queries that have a WHERE clause, convert it to a SELECT and execute it. That way, the caches for InnoDB and MyISAM are essentially the same on the Slave as it is on the Master. The differences should be minor.

Here is something else you will need: Have the Master and the Slave setup using Circular Replication. That way, both Master and Slave have binary logging enabled.

How does this help? When the Master crashes and you failover to the Slave, the file master.info on the Master will contain the last place the Slave executed its SQL. Here is how you find out:

For this example we will have Circular Replication between M1 and M2

Run this commmand on M2 (your current Master) : SHOW SLAVE STATUS\G

You should see something like this:

mysql> show slave status\G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.64.100.253
                Master_User: replicant
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000834
        Read_Master_Log_Pos: 823413571
             Relay_Log_File: relay-bin.002505
              Relay_Log_Pos: 823391419
      Relay_Master_Log_File: mysql-bin.000834
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 823391282
            Relay_Log_Space: 823413708
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 1
1 row in set (0.00 sec)

Please take note of two fields:

Relay_Master_Log_File
Exec_Master_Log_Pos

These two fields represent the last log file and log position from a Master that was successfully execute on the Slave.

Suppose M1 crashed, you failover to M2, and you bring up M1. Your goal is to reestablish Circular Replication. With a crash, there is the possibility of replication losing its place. Here is what to do:

Step01) Run SHOW SLAVE STATUS\G on M1

Step02) Get Relay_Master_Log_File and Exec_Master_Log_Pos from Step01. For this exmaple, let Relay_Master_Log_File be mysql-bin.000834 and Exec_Master_Log_Pos be 823391282.

Step03) Run these commands

STOP SLAVE;
CHANGE MASTER TO master_log_file='mysql-bin.000834',master_log_pos=823391282;
START SLAVE;
SELECT SLEEP(5);
SHOW SLAVE STATUS\G

Here is what to look for:

If Seconds_Behind_Master is Numeric, Replication is Running. Just wait until it reaches zero.
If Seconds_Behind_Master is NULL, Replication is Broken
- If Slave_IO_Running=Yes and Slave_SQL_Running=No, then an SQL error broke replication. Proceed to fix the error
- If Slave_IO_Running=No and Slave_SQL_Running=Yes, then the log file or log position does not exist.

You can fix that with

STOP SLAVE;
CHANGE MASTER TO master_log_file='mysql-bin.000835',master_log_pos=<new pos>;
START SLAVE;
SELECT SLEEP(5);
SHOW SLAVE STATUS\G

What is newpos?

For MySQL 5.5, newpos is 107
For MySQL 5.1, newpos is 106
For MySQL 5.0, newpos is 98

If you implement Circular Replication and properly script these things using these principles, you will achieve the recovery of Master and Slave.

Recover a crashed MySQL master server from the slave

1 Answers1