Diagnosing Mysql Replication Issues

Question

We have a mysql replication client running on our backup server. Since a power failure last week it's stopped replicating. Before this it was running uninterrupted for several months.

I've tried restarting both the master and the slave but this hasn't helped. I can access the master server from the slave, so the network isn't the issue.

Is there anything else I can do to try diagnose what the problem is?

mysql> show slave status\G;
*************************** 1. row ***************************
             Slave_IO_State:
                Master_Host: master
                Master_User: username
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000060
        Read_Master_Log_Pos: 46277494
             Relay_Log_File: mysqld-relay-bin.000348
              Relay_Log_Pos: 98
      Relay_Master_Log_File: mysql-bin.000060
           Slave_IO_Running: No
          Slave_SQL_Running: Yes
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 46277494
            Relay_Log_Space: 98
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

ERROR:
No query specified


mysql> show master status\G;
*************************** 1. row ***************************
            File: mysql-bin.000069
        Position: 851796
    Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

ERROR:
No query specified

Update: The errors were going into daemon.log, not mysql.err, which would explain why I couldn't find them. The problem seems to be that the master is saying the log is unavailable, which doesn't make much sense, because that log (and the previous one) are still available on the master.

090710  9:17:35 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000060' at position 46277494, relay log './mysqld-relay-bin.000350' position: 98
090710  9:17:35 [Note] Slave I/O thread: connected to master 'username@master:3306',  replication started in log 'mysql-bin.000060' at position 46277494
090710  9:17:35 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
090710  9:17:35 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
090710  9:17:35 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000060', position 46277494

Just FYI- you don't need to use a semi-colon when terminating with \G. That's why you have the extraneous "No query specified" errors in your output. — Dan Carley, Jul 10 '09 at 09:35

score 7 · Accepted Answer · answered Jul 10 '09 at 08:30

7

Welcome to the wonderful world of MySQL replication. I haven't hit your particular issue myself, but I've hit a lot of other weird problems and the proximate solution is to just resync from the master as though it's a brand new slave and be done with it.

answered Jul 10 '09 at 08:30

womble

95,029
29
173
228

Yep, that's what I've always done in the past... thought I'd see if I could actually figure out the problem this time though ;) – theotherreceive Jul 10 '09 at 16:34
6

Life's too short to worry about exactly *why* MySQL has decided to ruin your weekend this particular time. Just kick it and get back to the movie. – womble Jul 11 '09 at 02:06
Yep, in the end I had to give in and do this. – theotherreceive Jul 12 '09 at 22:05
I did hit this issue in the past. After way too much pondering and almost illegal amount of coffee I also just decided to kick it and get back to the movie. If one ends up to Hell and becames a DBA there, he's probably gonna be a MySQL Replication Solution Specialist. That's how much MySQL replication can suck WHEN it decides to kick to your nuts. – Janne Pikkarainen Aug 02 '10 at 09:39

score 2 · Answer 2 · answered Aug 02 '10 at 09:25

As womble has mentioned, forget about troubleshooting replication errors. The thing that worries me most about this approach is that you might succeed in getting replication to restart again and think everything is fine, but what if some parts of your db are still out of sync?

Best is to nuke the slave database and restart replication from a snapshot of the master. It should not be as disruptive as you might think:

http://www.neotitans.com/resources/mysql/quick-replication-error-recovery-via-snapshots.html

score 2 · Answer 3 · answered Jul 10 '09 at 03:27

You should examine the slave's error log - it's usually quite explicit about what the problem is.

You should have the mysql error logs tied into your monitoring system, otherwise your slaves are potentially worthless.

Moreover, you should have a monitor which checks the slave status.

And in order to be any use at all, you'll also want to check the sync of the slaves from time to time, perhaps by using something like mk-table-checksum; ideally tie the results of that into your monitoring system as well.

There's nothing logged, unless I'm missing a setting to turn on greater logging. — theotherreceive, Jul 10 '09 at 03:30

score 2 · Answer 4 · answered Jul 10 '09 at 03:56

2

Many people set skip-slave-start so that they can make sure everything is okay if a slave stops replicating before starting it up. Try running 'start slave' and see if anything changes or if something gets logged. Additionally it's strange that the SlaveSQL process is running and the SlaveIO is not. It's possible the local relay logs on the slave has been corrupted though that should be reported in the logs. You might try bring Mysql down and then deleting the relay logs.

answered Jul 10 '09 at 03:56

kashani

3,922
18
18

I tried 'stop slave' and 'start slave', should probably have made that clearer in my question. I tried your suggestion about cleaing the relay logs, but it refused to start slave the slave again after that. It still doesn't seem to be logging any errors though. – theotherreceive Jul 10 '09 at 04:12
hmmm I'm starting to run out of ideas. Couple of other simple things to look at. Make sure your disk isn't full. Make sure ./mysql/ is owned by mysql:mysql (or whatever it should be on your system). Check mysqld.err and not just mysql.log if you haven't yet. Most of those are simple, but should eliminate any general weirdness. – kashani Jul 10 '09 at 04:50
Take real care with wiping the relay logs. If it's been out of sync with the master for a long time then there is a chance that the master may no longer hold the binary logs that relate to the deleted relay info. Double check first. – Dan Carley Jul 10 '09 at 09:36

score 1 · Answer 5 · edited Sep 17 '13 at 06:44

From the above report i found the issue, this fieled must be set to (Slave_IO_Running): yes, but in above report it's showing Slave_IO_Running: No.

Thats causing the issue, If this variable reads ‘No’, then the IO thread was caused to stop. so there is no replication any more. You will have to check the Last_SQL_Errno and Last_SQL_Err for more information on the cause. An error number of 0 and message of the empty string mean “no error.” The Last_SQL_Error appears in the slave’s error log.

To fix this issue, stop the slave

Then set:

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;

This tells the slave to skip one query (which is the invalid one that caused the replication to stop). If you'd like to skip two queries, you'd use SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 2; instead and so on.

Then restart the slave and check the logs, Hoping this will fix the issue...

Diagnosing Mysql Replication Issues

5 Answers5