Why does MySQL replication break on missing row, then continue after "START SLAVE"?

Question

The title is a bit confusing, but I can't think of a better one.

What I have is a simple vanilla MySQL replication, with the slave occasionally failing, with this error: Error 'Can't find record in 'my_tbl'' on query. Default database: 'my_db'. Query: 'UPDATE my_tbl SET ... WHERE ...' (columns omitted for clarity).

What I'm assuming this error means, is that the slave sql thread executed this update, and received 0 rows affected. This was not what it expected when comparing the result of 1 rows affected from the relay log, thus generating an error.

When running this same update transaction manually, it works. Same thing when running START SLAVE - it just starts working, and goes back to normal.

This doesn't make sense to me at all - if all it takes is a "retry" to fix this, how could this happen in the first place? Everything is executed in a serialized fashion, and nothing else is writing to the slave mysql server.

Can someone provide an explanation?

Some technicalities - this is a mixed replication setup from 5.5.7-rc to 5.5.12.

score 2 · Answer 1 · answered May 31 '11 at 10:35

2

There is a filed MySQL bug #60091 regarding the replication of InnoDB tables that may meet your conditions - take a look at it, check if your version is affected and update it eventually to check if it helps matters.

Another explanation for this would be out-of-order execution - when the UPDATE my_tbl SET ... WHERE ... is run, the WHERE condition can not yet be met by any row since it has still to happen. I can't think of a reason for that though - this would be something to ask about on MySQL mailing lists.

answered May 31 '11 at 10:35

the-wabbit

40,319
13
105
169

Out of order execution would fit, but it doesn't make sense since it's a slave - and the only changes are made by the slave sql thread. What else could I check? – shlomoid May 31 '11 at 10:43
Also, the bug in question is fixed in 5.5.12, which is the version I'm running. Also, the table in question has a PK. – shlomoid Jun 20 '11 at 12:59
Hm, are you using some non-deterministic functions in your WHERE clause by any chance? And are you mixing MyISAM and InnoDB tables in your database? – the-wabbit Jun 22 '11 at 21:22
Neither... Regular key lookup, and only InnoDB tables (single table is accessed in the query) – shlomoid Jun 26 '11 at 07:15

score 2 · Answer 2 · answered Jun 29 '11 at 13:33

2

I've discovered the reason behind this problem - an event which was running on the master and on the slave as well. The solution is simple - alter event event_name disable on slave; Something to keep in mind when creating a slave with mysqldump.

answered Jun 29 '11 at 13:33

shlomoid

289
3
14

Why does MySQL replication break on missing row, then continue after "START SLAVE"?

2 Answers2