11

I'm trying to fix MySQL configuration on our server. Our app's specifics is that a lot of data is stored in single table (currently over 300 millions of rows). This table is used often for inserts (they come all the time).

When i run a select query on that table that takes longer than few seconds then all inserts (precisely commits) are waiting for table access and makes our app unresponsive.

As far as i know InnoDB does'nt make any locks on table when select is running. Why is the select blocking table then?

I tried to find a reason with innotop but i'm not sure how to interpret it's output and where to search. Tell me what you need and i'll post it here.

+-----+---------+-----------+--------+---------+------+----------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Id  | User    | Host      | db     | Command | Time | State          | Info                                                                                                                              |
+-----+---------+-----------+--------+---------+------+----------------+-----------------------------------------------------------------------------------------------------------------------------------+
|   1 | root    | localhost | dbname | Query   |   29 | NULL           | COMMIT                                                                                                                            | 
|   2 | root    | localhost | dbname | Query   |   30 | NULL           | COMMIT                                                                                                                            | 
|   4 | root    | localhost | dbname | Query   |   29 | NULL           | COMMIT                                                                                                                            | 
|   5 | root    | localhost | dbname | Query   |   29 | NULL           | COMMIT                                                                                                                            | 
|   6 | root    | localhost | dbname | Query   |   25 | NULL           | COMMIT                                                                                                                            | 
|   7 | root    | localhost | dbname | Query   |    0 | NULL           | show full processlist                                                                                                             | 
|  13 | user    | localhost | dbname | Query   |   25 | NULL           | COMMIT                                                                                                                            | 
|  38 | user    | localhost | dbname | Sleep   |    0 |                | NULL                                                                                                                              | 
|  39 | user    | localhost | dbname | Sleep   | 9017 |                | NULL                                                                                                                              | 
|  40 | user    | localhost | dbname | Query   |   33 | Sorting result | SELECT * FROM `large_table` WHERE (`large_table`.`hotspot_id` = 3000064)  ORDER BY discovered_at LIMIT 799000, 1000 | 
|  60 | user    | localhost | dbname | Sleep   | 1033 |                | NULL                                                                                                                              | 
|  83 | root    | localhost | dbname | Sleep   | 3728 |                | NULL                                                                                                                              | 
| 112 | root    | localhost | NULL   | Sleep   |    6 |                | NULL                                                                                                                              | 
+-----+---------+-----------+--------+---------+------+----------------+-----------------------------------------------------------------------------------------------------------------------------------+


=====================================
110824 12:24:24 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 19 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1521117, signal count 1471216
Mutex spin waits 0, rounds 20647617, OS waits 239914
RW-shared spins 2119697, OS waits 1037149; RW-excl spins 505734, OS waits 218177
------------
TRANSACTIONS
------------
Trx id counter 0 412917332
Purge done for trx's n:o < 0 412917135 undo n:o < 0 0
History list length 48
Total number of lock structs in row lock hash table 5
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 28363, OS thread id 1092766032
MySQL thread id 83, query id 3249941 localhost root
---TRANSACTION 0 412901582, not started, process no 28363, OS thread id 1144449360
MySQL thread id 60, query id 3677008 localhost user
---TRANSACTION 0 412917189, not started, process no 28363, OS thread id 1144314192
MySQL thread id 43, query id 3905773 localhost root
---TRANSACTION 0 412534255, not started, process no 28363, OS thread id 1092630864
MySQL thread id 39, query id 14279 localhost user
---TRANSACTION 0 412917331, not started, process no 28363, OS thread id 1144179024
MySQL thread id 38, query id 3908045 localhost user
---TRANSACTION 0 412917201, not started, process no 28363, OS thread id 1092495696
MySQL thread id 13, query id 3908257 localhost user
---TRANSACTION 0 412538821, not started, process no 28363, OS thread id 1092360528
MySQL thread id 7, query id 3908258 localhost root
show engine innodb status
---TRANSACTION 0 412917330, ACTIVE 6 sec, process no 28363, OS thread id 1144043856
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 2, query id 3907373 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917331, sees < 0 412917131
---TRANSACTION 0 412917328, ACTIVE 6 sec, process no 28363, OS thread id 1092225360
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 6, query id 3907345 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917329, sees < 0 412917131
---TRANSACTION 0 412917326, ACTIVE 6 sec, process no 28363, OS thread id 1091955024
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 4, query id 3907335 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917327, sees < 0 412917131
---TRANSACTION 0 412917324, ACTIVE 6 sec, process no 28363, OS thread id 1092090192
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 5, query id 3907328 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917325, sees < 0 412917131
---TRANSACTION 0 412917321, ACTIVE (PREPARED) 7 sec, process no 28363, OS thread id 1143908688 preparing
2 lock struct(s), heap size 368, undo log entries 1
MySQL thread id 1, query id 3907125 localhost root
COMMIT
Trx read view will not see trx with id >= 0 412917322, sees < 0 412917131
---TRANSACTION 0 412917131, ACTIVE 20 sec, process no 28363, OS thread id 1074075984, thread declared inside InnoDB 111
mysql tables in use 1, locked 0
MySQL thread id 40, query id 3904958 localhost user Sorting result
SELECT * FROM `large_table` WHERE (`large_table`.`hotspot_id` = 3000064)  ORDER BY discovered_at LIMIT 848000, 1000
Trx read view will not see trx with id >= 0 412917132, sees < 0 412917132
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
3510225 OS file reads, 284998 OS file writes, 202897 OS fsyncs
1.05 reads/s, 21299 avg bytes/read, 8.10 writes/s, 7.58 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 275, free list len 13392, seg size 13668,
489950 inserts, 491830 merged recs, 10986 merges
Hash table size 8850487, used cells 8127172, node heap has 32697 buffer(s)
71914.53 hash searches/s, 8701.91 non-hash searches/s
---
LOG
---
Log sequence number 157 3331524445
Log flushed up to   157 3331521939
Last checkpoint at  157 3326072846
1 pending log writes, 0 pending chkp writes
199025 log i/o's done, 7.53 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 4788954432; in additional pool allocated 1048576
Buffer pool size   262144
Free buffers       0
Database pages     229447
Modified db pages  1439
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 7453325, created 14887, written 118658
1.37 reads/s, 0.11 creates/s, 0.53 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue
7 read views open inside InnoDB
Main thread process no. 28363, id 1091684688, state: flushing log
Number of rows inserted 1093064, updated 249134, deleted 1405, read 1115880534
7.89 inserts/s, 2.47 updates/s, 0.05 deletes/s, 80953.21 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

EDIT:

Thanks for clarifying this out.

So I have to split my question into two cases now.

  1. Is it normal that lock on this single table causes my whole app to 'hang'. Shouldn't DB be responsive for queries to other tables? Maybe some buffer is set too low?

  2. Will switching this table to MyISAM help? I don't need transactions on this table at all. Won't there be any other locks in situation like that (long select + many quick inserts)?

EDIT2:

That's how insert queries look like:

INSERT INTO `large_table` (`device_address`, `hotspot_id`, `minute`, `created_at`, `updated_at`, `discovered_with_hci`, `hour`, `rssi`, `day`, `device_class`, `discovered_at`) VALUES('10:40:03:90:10:40', 3000008, 1, '2011-08-22 05:01:08', '2011-08-22 05:01:08', -1, 5, -79, '2011-08-22 05:01:01', '0', '2011-08-22 05:01:01')

That's what indexes are defined on it:

+-------------+------------+----------------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table       | Non_unique | Key_name                                     | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------------+------------+----------------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| large_table |          0 | PRIMARY                                      |            1 | id                  | A         |    92396334 |     NULL | NULL   |      | BTREE      |         | 
| large_table |          1 | index_large_table_on_discovered_with_hci     |            1 | discovered_with_hci | A         |          18 |     NULL | NULL   | YES  | BTREE      |         | 
| large_table |          1 | index_large_table_on_hotspot_id              |            1 | hotspot_id          | A         |          18 |     NULL | NULL   | YES  | BTREE      |         | 
| large_table |          1 | index_large_table_on_day_and_hour_and_minute |            1 | day                 | A         |          18 |     NULL | NULL   | YES  | BTREE      |         | 
| large_table |          1 | index_large_table_on_day_and_hour_and_minute |            2 | hour                | A         |          18 |     NULL | NULL   | YES  | BTREE      |         | 
| large_table |          1 | index_large_table_on_day_and_hour_and_minute |            3 | minute              | A         |      537187 |     NULL | NULL   | YES  | BTREE      |         | 
| large_table |          1 | index_large_table_on_created_at              |            1 | created_at          | A         |     8399666 |     NULL | NULL   | YES  | BTREE      |         | 
| large_table |          1 | index_large_table_on_rssi                    |            1 | rssi                | A         |          18 |     NULL | NULL   | YES  | BTREE      |         | 
+-------------+------------+----------------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+

EDIT 3:

Why during such queries whole my application is not responding? Shouldn't it affect only that 'large_table'?

Maybe something is wrong with my mysql config? Server is a 4 core Xeon 2GHz with 16GB RAM. It runs MySQL + Rails App

My config params:

skip-external-locking
key_buffer              = 64M
max_allowed_packet      = 16M
thread_stack            = 128K
thread_cache_size       = 8
query_cache_size        = 32M
tmp_table_size          = 64M
max_heap_table_size     = 64M
table_cache             = 256
read_rnd_buffer_size    = 512K
sort_buffer_size        = 2M

myisam-recover          = BACKUP
max_connections         = 200

query_cache_limit       = 1M

long_query_time = 200

max_binlog_size         = 100M

innodb_buffer_pool_size = 4G
safe-updates
max_join_size=100000000

Mysqltuner script suggests only:

long_query_time (<= 10)
innodb_buffer_pool_size (>= 62G)
kaczor1984
  • 241
  • 1
  • 3
  • 11
  • Please append the output of `show engine innodb status;`. – quanta Aug 24 '11 at 09:46
  • In innotop you can hit L to get a view into the locks. How does your app establish a connection? JDBC? What defaultTransactionIsolation level are you using? – HTTP500 Aug 24 '11 at 15:41
  • It's a RoR application so I guess it's default for MySQL (can I check it somehow by SQL query?). innotop doesn't show any locks while this select runs. – kaczor1984 Aug 24 '11 at 16:26
  • @kaczor1984 You can check the default isolation level by doing a: show variables like 'tx_isolation'; query. Default is REPEATABLE READ. Note that MVCC works only with REPEATABLE READ and READ COMMITTED. Not sure what the solution to your problem is but RolandoMySQLDBAs answer was informative. – HTTP500 Aug 24 '11 at 20:02
  • Updated my answer with an analysis of the query at Process ID 40 – RolandoMySQLDBA Aug 25 '11 at 12:11

2 Answers2

15

Please look carefully at the processlist and the 'show engine innodb status'. What do you see ???

Process IDs 1,2,4,5,6,13 are all trying to run COMMIT.

Who is holding up everything ??? Process ID 40 is running a query against large_table.

Process ID 40 has been running for 33 seconds. Process IDs 1,2,4,5,6,13 having been running less than 33 seconds. Process ID 40 is processing something. What's the hold up ???

First of all, the query is pounding on large_table's clustered index via MVCC.

Within Process IDs 1,2,4,5,6,13 are rows that have MVCC Data protecting its transaction isolation. Process ID 40 has a query that is marching through rows of data. If there is an index on the field hotspot_id, that key + the key to the actual row from the clustered index must perform an internal lock. (Note: By design, all non-unique indexes in InnoDB carry both your key (the column you meant to index) + a clustered index key). This unique scenario is essentially Unstoppable Force meets Immovable Object.

In essence, the COMMITs must wait until it is safe to apply changes against large_table. Your situation is not unique, not a one-off, not a rare phenomenon.

I actually answered three questions like this in the DBA StackExchange. The questions were submitted by the same person related to the same one problem. My answers were not the solution but helped the question submitter come to his own conclusion on how to handle his situation.

In addition to those answers, I answered another person's question about deadlocks in InnoDB with regard to SELECTs.

I hope my past posts on this subject help clarify what was happening to you.

UPDATE 2011-08-25 08:10 EDT

Here is the query from Process ID 40

SELECT * FROM `large_table`
WHERE (`large_table`.`hotspot_id` = 3000064)
ORDER BY discovered_at LIMIT 799000, 1000;

Two observations:

  • You are doing 'SELECT *' do you need to fetch every column ? If you need only specific columns, you should label them because the temp table of 1000 rows could be larger than you really need.

  • The WHERE and ORDER BY clauses usually give away performance issues or make table design shine. You need to create a mechanism that will speed up the gather of keys before gathering data.

In light of these two observations, there are two major changes you must make:

MAJOR CHANGE #1 : Refactor the query

Redesign the query so that

  1. keys are gathered from the index
  2. only 1000 or them are collect
  3. joined back to the main table

Here is the new query which does these three things

SELECT large_table.* FROM
large_table INNER JOIN
(
    SELECT hotspot_id,discovered_at
    FROM large_table
    WHERE hotspot_id = 3000064
    ORDER BY discovered_at
    LIMIT 799000,1000
) large_table_keys
USING (hotspot_id,discovered_at);

The subquery large_table_keys gathers the 1000 keys you need. The result from the subquery is then INNER JOINed to large_table. So far, the keys are retrieved instead of whole rows. That's still 799,000 rows to read through. There is a better way to get those keys, which leads us to...

MAJOR CHANGE #2 : Create Indexes that Support the Refactored Query

Since the refactored query only features one subquery, you only need to make one index. Here is that index:

ALTER TABLE large_table ADD INDEX hotspot_discovered_ndx (hotspot_id,discovered_at);

Why this particular index ? Look at the WHERE clause. The hotspot_id is a static value. This makes all hotspot_ids form a sequential list in the index. Now, look at the ORDER BY clause. The discovered_at column is probably a DATETIME or TIMESTAMP field.

The natural order this presents in the index is as follows:

  • Index features a list of hostpot_ids
  • Each hotspot_id has an ordered list of discovered_at fields

Making this index also eliminates doing internal sorting of temp tables.

Please put these two major changes in place and you will see a difference in running time.

Give it a Try !!!

UPDATE 2011-08-25 08:15 EDT

I looked at your indexes. You still need to create the index I suggested.

RolandoMySQLDBA
  • 16,364
  • 3
  • 47
  • 80
  • Thanks for a huge explanation how it works. I'm afraid I can't figure out how to avoid situations like that. Inserts have to modify indexes and select has to use index on hotspot_id and discovered_at. I'd be glad if you could also anwser to my 'idea' of switching to MyISAM. – kaczor1984 Aug 25 '11 at 11:03
  • Actually, using MyISAM could make things worse because each INSERT, UPDATE, and DELETE in MyISAM triggers a full table lock. Even if you use LOW_PRIORITY INSERTs or INSERT DELAYED, full table locks will still be encountered. You should explore the query itself because regardless of storage engine, queries can be tuned around such obstacles. At the very least, a new algorithm altogether may be required. I'll look at the query in a couple of minutes... – RolandoMySQLDBA Aug 25 '11 at 11:34
  • I've updated my first post so you could see insert queries and indexes on this table. – kaczor1984 Aug 25 '11 at 11:55
  • Updated my answer with an analysis of the query at Process ID 40 – RolandoMySQLDBA Aug 25 '11 at 12:11
  • 4
    you sir are a hero among men for your long mysql answers – Mike Aug 26 '11 at 02:16
3

Solved!

The main problem was query_cache. http://bugs.mysql.com/bug.php?id=21074

After disabling it the 'freezes' dissapeared.

kaczor1984
  • 241
  • 1
  • 3
  • 11