9

My MySQL slave is spending a lot of time in Slave_SQL_Running_State: System lock. I can see that the system is currently I/O write bound, and that it is processing the log, although slowly. Show processlist doesn't show anything other than "Waiting for master to send event" and "System lock" when it is in this state.

All my tables (other than the system tables) are InnoDB, and external locking is disabled. What is the slave doing in this state?

Here's some info that has been requested:

First, this is MySQL 5.6 community on an Amazon EC2 instance, with all storage on EBS.

mysql> show processlist;
+----+-------------+-----------+---------------+---------+--------+----------------------------------+------------------+
| Id | User        | Host      | db            | Command | Time   | State                            | Info             |
+----+-------------+-----------+---------------+---------+--------+----------------------------------+------------------+
|  1 | system user |           | NULL          | Connect |  26115 | Waiting for master to send event | NULL             |
|  2 | system user |           | NULL          | Connect | 402264 | System lock                      | NULL             |
| 14 | readonly    | localhost | theshadestore | Query   |      0 | init                             | show processlist |
+----+-------------+-----------+---------------+---------+--------+----------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 184.106.16.14
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin-log.000764
          Read_Master_Log_Pos: 505452667
               Relay_Log_File: relay-log.000197
                Relay_Log_Pos: 345413863
        Relay_Master_Log_File: bin-log.000746
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 345413702
              Relay_Log_Space: 19834085375
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 402263
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 307009
                  Master_UUID: b1bf9a19-dac0-11e2-8ffa-b8ca3a5bce90
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)
Greg
  • 1,353
  • 1
  • 12
  • 13
  • 1
    Anything going on with your storage? If it's local disk, are you getting any SMART warnings, or is it in a degraded RAID array possibly? – nedm Nov 04 '13 at 21:28
  • It's an EC2 EBS volume. There are no errors in dmesg. – Greg Nov 04 '13 at 22:01
  • 1
    note that this simply might be a bug of 5.6, consider checking against another version (e.g. 5.5): http://forums.mysql.com/read.php?22,598354,598354 – the-wabbit Nov 04 '13 at 22:38
  • Please provide few relevant entries from `mysqld.log` when replication broke at the first time _AND_ post output from following: mysql> SHOW SLAVE STATUS\G; mysql> SHOW FULL PROCESSLIST; – alexus Nov 04 '13 at 21:34
  • As far as I can tell, replication isn't broken, it's just slow. Seconds_Behind_Master is decreasing over time. I'm just curious about what is happening when Slave_SQL_Running_State = System lock – Greg Nov 04 '13 at 22:02
  • perhaps you should double check replication user privileges (just to make sure), you can also look into logs and find out what happened before as well. – alexus Nov 05 '13 at 15:38
  • The privileges look fine. Again though: it is processing the log, it's just slow, and I'm wondering what it's doing when it says that it is in "System lock" – Greg Nov 05 '13 at 21:08
  • 1
    Here's the definition of System Lock State. Looks like it could be related to your system being I/O write bound. System lock - The thread is going to request or is waiting for an internal or external system lock for the table. For SHOW PROFILE, this state means the thread is requesting the lock (not waiting for it). from: http://dev.mysql.com/doc/refman/5.6/en/general-thread-states.html – jbrahy Mar 23 '15 at 23:40
  • Possible these:
    ... https://bugs.mysql.com/bug.php?id=72131
    ... http://dba.stackexchange.com/questions/80970/mysql-slave-stuck-at-system-lock-most-of-the-time-high-cpu-usage
    – Dr. Tyrell Apr 08 '15 at 05:44

2 Answers2

2

Databases running on distributed storage facepalm. I would benchmark the filesystem running on top of the EC2 EBS storage system. Probably the simplest method is to use something like s=$(date +%s); dd if=/dev/zero of=<database-dir> bs=1M count=512; e=$(date +%s); echo "scale=4; 512 / ( $e - $s )" | bc. That assumes you have 512 MB to spare. Now, the problem with this benchmarking is that (1) it doesn't take into account caching effects, and (2) the resolution is not very fine. But if this test is slow, then the problem is definitely with EC2 EBS. If the test is fast or nominal, we have to dig further and use more sophisticated techniques.

The bonnie++ program is somewhat adequate, but it doesn't (AFAIK) flush the OS buffers between the write and read. Still, you should get an idea with something like bonnie++ -u mysql -r 8 -s 16:512 -n 1 -b -d <mysql-data-directory>. When I do this on a VM running on local storage, I get:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
test        16M:512  1173  99 +++++ +++ +++++ +++  3187  99 +++++ +++ +++++ +++
Latency              1553us      23us     330us     750us     173us    6372us
Version  1.96       ------Sequential Create------ --------Random Create--------
test                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1  1850  20 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency             27428us      24us    1188us   30258us      36us    1107us

Here is what I get when running on a VM over NFS:

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
test        16M:512  1273  98 +++++ +++ +++++ +++  3053  99 +++++ +++ +++++ +++
Latency              1372us      28us     380us     832us      19us    9473us
Version  1.96       ------Sequential Create------ --------Random Create--------
test                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                  1   754  11 +++++ +++   728   8   751  12 +++++ +++   791   8
Latency             12695us      47us    5306us    3710us      30us    3278us
Otheus
  • 432
  • 3
  • 12
0

Is your slave EC2 instance a similar size to the master in this case?

If you are running on a smaller instance to save money you may be running into a bottle neck there. The seconds behind is several days. Was replication offline for a long while or did this grow over time during some kind of data input spike?

zaznet
  • 41
  • 1
  • The slave is definitely slower. I'm wondering if there's a way to know what query the slave is working on, just like how 'show processlist' on the master shows which queries are running. – Greg Mar 24 '15 at 21:00
  • 1
    It's a log replay. You can see where the slave and master are at in the output previously provided. Read_Master_Log_Pos: 505452667 Relay_Log_Pos: 345413863 – zaznet May 05 '15 at 15:15