
I'm curious to hear strategies and methods people use for performing ALTERs on very large tables in MySQL. "Large" could be any row count or size at which an ALTER becomes disruptive. For the sake of conversation, let's say 2 million+ rows, where any ALTER is going to impact normal application performance.

The two primary strategies I see are either to perform the ALTER on a slave and then promote it to master once it has finished, or to create a copy of the table with the intended alterations already applied, copy the data across and catch it up, then do a rename to swap the tables before dropping the old one.

I am ideally seeking a means to do the latter. A large concern of mine is triggers on the table being altered, and of course ensuring the data in the two tables is kept in sync before they are swapped. I am thinking the potential for errors or missing data can be mitigated to some extent by using the read_only variable at key points in the process, so that data is not changing while you are fiddling with triggers and after you have caught up all the data. I understand that would have an impact on the application using the database, but it's better than risking corrupted data.
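For example, a minimal sketch of the "freeze writes" moment described above; note that read_only does not block clients holding the SUPER privilege or the replication SQL thread, so those would need to be handled separately:

    -- Block writes from ordinary clients while triggers are added/removed
    -- and the final catch-up copy and rename run.
    SET GLOBAL read_only = ON;

    -- ... create/drop triggers, final catch-up copy, RENAME TABLE ...

    SET GLOBAL read_only = OFF;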

I have been looking at utilities and strategies for doing this and there are several out there. A notable one is the openark kit, which Facebook used as the basis for their online alters: openark kit documentation. The process is elaborated upon here: Thoughts and ideas for Online Schema Change

What are your experiences with either method? What pitfalls and gotchas did you come across? Which do you prefer/suggest, and why?

Percona/Maatkit also have their own tool: pt-online-schema-change

sinping

2 Answers


I have done online schema changes of tables in excess of 100 million rows using the trigger method and it works really well.

  1. Create a new table with the desired structure.
  2. Add a trigger to the old table to copy inserted data to the new table. It would look something like:

    DELIMITER |
    CREATE TRIGGER original_to_new AFTER INSERT ON original_table
    FOR EACH ROW
    BEGIN
      INSERT INTO new_table SET col1 = NEW.col1, col2 = NEW.col2, ...;
    END|
    DELIMITER ;
    
  3. Take note of the first auto-increment id moved to the new table with the trigger.

  4. Copy data from the old table to the new table, up to the ID captured in step 3 (see the sketch after this list).
  5. Swap table names: RENAME TABLE original_table TO original_table_backup, new_table TO original_table;
  6. Drop the old (backup) table.
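
A rough sketch of steps 3-5, assuming an AUTO_INCREMENT primary key named id and placeholder columns col1 and col2 (copying in bounded chunks keeps each transaction small):

    -- Step 3: the lowest id the INSERT trigger has already copied
    SELECT MIN(id) INTO @boundary_id FROM new_table;

    -- Step 4: copy the historical rows in chunks; start @from_id at the
    -- smallest id in original_table and advance it by the chunk size each
    -- pass until it reaches @boundary_id
    INSERT INTO new_table (id, col1, col2)
    SELECT id, col1, col2
      FROM original_table
     WHERE id >= @from_id
       AND id < LEAST(@from_id + 10000, @boundary_id);

    -- Step 5: atomic swap, keeping the old table around as a backup
    RENAME TABLE original_table TO original_table_backup,
                 new_table      TO original_table;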
sreimer
  • So you just did the whole process manually? Any particular reason? All of the utilities out there pretty much do the same thing. I'd be inclined to do it manually as well just to have more control over the process. – sinping Oct 03 '11 at 17:16
  • The only thing I occasionally will script is the copy of data in chunks to keep the load down. The rest is pretty straightforward. – sreimer Oct 03 '11 at 17:18
  • Your solution is quite wrong, I'm afraid. Disclaimer: I am author of the openark-kit discussed above. The online alter table operation is MUCH more complicated than presented in your answer. What happens for UPDATEs? for DELETEs? It seems you assume you never delete or purge rows from your table. How do you copy 100 million rows? In one single transaction? You also assume an AUTO_INCREMENT PRIMARY KEY. The general solutions assume nothing of the above, and handle the general case where the PK is of any type and number of columns, and where rows are deleted and updated. – Shlomi Noach Sep 09 '12 at 17:14
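
To illustrate the point in the comment above, a trigger-based copy that also has to track updates and deletes would need additional triggers along these lines (a rough sketch: column names are placeholders, and a single-column primary key named id is still assumed):

    DELIMITER |
    CREATE TRIGGER original_to_new_upd AFTER UPDATE ON original_table
    FOR EACH ROW
    BEGIN
      -- Keep an already-copied row in step with the original
      UPDATE new_table SET col1 = NEW.col1, col2 = NEW.col2 WHERE id = NEW.id;
    END|

    CREATE TRIGGER original_to_new_del AFTER DELETE ON original_table
    FOR EACH ROW
    BEGIN
      -- Remove rows purged from the original
      DELETE FROM new_table WHERE id = OLD.id;
    END|
    DELIMITER ;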

The Continuent folks include this ability in their Tungsten Enterprise product. For example see: https://s3.amazonaws.com/releases.continuent.com/doc/tungsten-1.3.3/html/Tungsten-Concepts-And-Administration-Guide/content/ch03s19.html

HTTP500