
I periodically need to make changes to tables in MySQL 5.1, mostly adding columns. That's very simple with the ALTER TABLE command, but my tables now have up to 40 million rows and they are growing fast, so those ALTER TABLE commands take several hours. In a couple of months I'm guessing they'll take days.

Since I'm using Amazon RDS, I can't have slave servers to play with and then promote to master. So my question is: is there a way to do this with minimal downtime? I don't mind an operation taking hours or even days, as long as users can still use the db, of course. Can they at least read while columns are being added? What happens if my app tries to write (insert or update)? If it fails immediately, that's actually not so bad; if it just hangs and causes problems for the db server, that's a big problem.

This must be a fairly common scaling issue; everyone needs to add columns. What's typically done with a production db? Slave -> master migration?

Update: I forgot to mention that I'm using the InnoDB storage engine.

apptree
  • In case someone is still looking for an answer: http://blog.staginginstance.com/using-percona-with-aws-rds-editing-big-db-tables/ – Coder anonymous Aug 14 '17 at 19:35

5 Answers


I just had to do this recently. What Amazon recommended was using the Percona Toolkit. I downloaded it and was able to run something like:

./pt-online-schema-change h=hostname,D=databasename,t=tablename --recursion-method=none --execute --user username --password password --alter "MODIFY someColumn newDataType"

and it works great. It reports how much time remains as the process runs.

It actually creates a new table with the new column and copies the existing data over. Further, it creates triggers so that new writes are also pushed over to the new table while the copy runs. It then renames the tables automagically, drops the old table, and you're up and running with the new column, with no downtime while you waited for the updates.
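For reference, what the tool automates looks roughly like this (a simplified sketch with placeholder table and column names; the real tool also chunks the copy, throttles itself, and handles foreign keys):

    -- 1. Create a shadow table with the new definition
    CREATE TABLE _mytable_new LIKE mytable;
    ALTER TABLE _mytable_new MODIFY someColumn BIGINT;

    -- 2. Keep the shadow table in sync while the copy runs
    --    (similar triggers are created for UPDATE and DELETE)
    CREATE TRIGGER mytable_ins AFTER INSERT ON mytable FOR EACH ROW
      REPLACE INTO _mytable_new (id, someColumn) VALUES (NEW.id, NEW.someColumn);

    -- 3. Copy the existing rows across in chunks
    INSERT LOW_PRIORITY IGNORE INTO _mytable_new (id, someColumn)
      SELECT id, someColumn FROM mytable WHERE id BETWEEN 1 AND 10000;

    -- 4. Swap the tables atomically, then drop the old one
    RENAME TABLE mytable TO _mytable_old, _mytable_new TO mytable;
    DROP TABLE _mytable_old;

Running the tool with --dry-run before --execute is a sensible way to see what it would do without touching any data.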

efreedom
  • The Percona team has a short write-up on enabling the log_bin_trust_function_creators feature, through RDS parameter groups (as SET GLOBAL log_bin_trust_function_creators = 1 doesn't work on RDS), required by the pt-online-schema-change tool. More details: https://www.percona.com/blog/2016/07/01/pt-online-schema-change-amazon-rds/ – user1652110 Oct 25 '16 at 03:34
  • It worked for me – Adiii Dec 14 '18 at 14:48

I periodically need to make changes to tables in MySQL 5.1, mostly adding columns.

Don't. No really. Just don't. It should be a very rare occasion when this is ever necessary.

Assuming your data really is normalized to start with, the right way to solve the problem is to add a new table with a 1:1 relationship to the base table (participation non-obligatory on the new table's side).

Having to add columns regularly is usually an indicator of a database which is not normalized - if your schema is not normalized then that's the problem you need to fix.
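To illustrate the idea with hypothetical names: instead of ALTERing a 40-million-row `users` table, you add a sibling table keyed on the same primary key, so the 1:1 row is optional and the big table is never rewritten:

    -- New attributes live in a side table; the base table is untouched
    CREATE TABLE users_ext (
      user_id INT UNSIGNED NOT NULL PRIMARY KEY,   -- same key as users
      new_attribute VARCHAR(255) NULL,
      FOREIGN KEY (user_id) REFERENCES users (user_id)
    ) ENGINE=InnoDB;

    -- Join only when the new attribute is actually needed
    SELECT u.*, e.new_attribute
    FROM users u
    LEFT JOIN users_ext e ON e.user_id = u.user_id;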

Finally, if your schema really, really is normalized and you really, really must keep adding columns, then do the following (sketched in SQL after the list):

  1. Ensure you've got a timestamp column on the table, or that the server is generating replication logs
  2. Create a copy (B) of the table (A)
  3. Add the new columns to B (this will still block with MyISAM)
  4. Disable transactions
  5. Rename the original table (A) to something else (as a backup)
  6. Rename the new table (B) to the name of the original table (A)
  7. Replay the transactions from the start of the operation, from the replication log or from the backup table
  8. Re-enable transactions.
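A rough SQL sketch of those steps, assuming a hypothetical `orders` table with an auto-increment `id` and an `updated_at` timestamp. This glosses over deletes and over writes that land between the snapshot and the swap, which is exactly why writes need to be disabled around the rename:

    -- Steps 2-3: build the new table offline
    CREATE TABLE orders_new LIKE orders;
    ALTER TABLE orders_new ADD COLUMN new_col INT NULL;

    -- Note the high-water mark before the long copy (say it returns 40000000)
    SELECT MAX(id) FROM orders;
    INSERT INTO orders_new SELECT o.*, NULL FROM orders o WHERE o.id <= 40000000;

    -- Steps 5-6: one atomic statement swaps both names
    RENAME TABLE orders TO orders_old, orders_new TO orders;

    -- Step 7: replay what arrived during the copy, via the timestamp column
    REPLACE INTO orders
      SELECT o.*, NULL FROM orders_old o WHERE o.updated_at >= '2010-08-27 00:00:00';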
symcbean
  • Thank you for your step-by-step approach. Is it really uncommon to modify tables? I understand that I can instead add another table with the new column (in the case of needing to add a column) and have it reference the original large table in a 1:1 relationship. But it doesn't seem right to have 15 very large 1:1 tables when they should all be in one table... The querying performance then suffers as well, not to mention the indexing issues. I'm not an expert, but my database is fairly well normalized and it seems natural that I need to modify it periodically. – apptree Aug 27 '10 at 02:21
  • "Is it really uncommon to modify tables?" - Yes. – symcbean Aug 27 '10 at 09:37
  • So if I have a `products` table and want to add a `description` column, this should be done in another table?? – Lasse Bunk Aug 03 '13 at 14:17
  • No, but one can argue that if that happens REGULARLY - not as part of a major software upgrade - then someone needs to be fired for not realizing that all the tables should have been there in the first place. The problem / trick here is the "regularly", not "once every couple of months". – TomTom Jun 23 '14 at 16:19
  • As a dev, especially one that works in start-ups and young companies, I could not agree less with symcbean and @TomTom. Things change, products change, business goals change, and the database structure needs to change with them. Providing good DBA service means saying "yes" to those changes, then figuring out how to implement them efficiently. Heavily-normalized databases are a concept that died a long time ago. They result in bad performance and slow dev cycles. – pents90 Apr 15 '15 at 21:54
  • Uncommon to change tables??? Maybe in large companies, but in an agile team that happens quite often; requirements change... – tibo Aug 12 '15 at 00:42
  • Adding a 1:1 relationship to another table means you have One More Index to maintain and Every row fetch turns into Two Row Fetches. If the database is already normalized, it means it very likely already has multiple row fetches in place, and EVERY row fetch COSTS TIME. Maybe on a pure mathematical basis you can do that and get away with it. On a real live system constrained by the Laws of Physics, the idea ... lacks merit. – UncaAlby Feb 29 '16 at 17:38
  • Some 8 years later =) I just felt I wanted to say I'm with pents90 and tibo. What symcbean said 8 years ago is just plain wrong, and probably was then too. – Ted Aug 04 '18 at 13:44
  • Because Google still sends people here ... I agree with those who pointed out that @symcbean's advice is pretty terrible. In the Actual World of Hard Data Problems, new attributes are surfaced regularly and can't be anticipated at design time, because they don't exist then. On a per-event basis, you have to decide if each new property is an entity that needs its own table or only a property of the current table. This "answer" sounds like someone trying to justify a bad aspect of MySQL by saying it's unnecessary, just like when MySQL used to stupidly claim that transactions didn't matter. – chris Nov 28 '18 at 23:01

symcbean provides some solid recommendations.

To answer your question, the easiest and best way to mitigate impact is to have multiple replicating databases. A dual-master setup with an appropriate failover procedure works well: stop replication on the active side, which allows an alteration on the inactive side without impacting the active one, then fail over.

You could potentially do this on a single live database and minimize impact by using a procedure similar to the one I detailed in this answer. Admittedly, this is similar to what symcbean described, but it includes the technical details. You could use an auto_increment field as well, not just a timestamp.

Ultimately, if your data set is growing so large, you also need to consider archival between OLTP and OLAP databases. Your transactional dataset should not need to be so large if you design appropriately.
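As a small illustration of the auto_increment variant (hypothetical table and column names): record the highest id before the long copy starts, and everything above it afterwards is the catch-up work:

    -- Before the copy: snapshot the watermark (say it returns 41230000)
    SELECT MAX(id) FROM events;

    -- After the copy: replay only the rows that arrived in the meantime;
    -- the new column simply takes its default for these rows
    INSERT INTO events_new (id, payload, created_at)
      SELECT id, payload, created_at FROM events WHERE id > 41230000;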

Warner

From the manual: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL incorporates the alteration into the copy, then deletes the original table and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table are stalled until the new table is ready, and then are automatically redirected to the new table without any failed updates.

So, reading will work fine. Writes will be stalled, but executed afterwards. If you want to prevent this, you'll have to modify your software.
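If you want to see this behavior for yourself before trusting it in production, a simple two-session test on a copy of the table shows it (table and column names hypothetical):

    -- Session 1: kick off the long-running table copy
    ALTER TABLE big_table ADD COLUMN extra INT NULL;

    -- Session 2, while session 1 is still running:
    SELECT COUNT(*) FROM big_table;          -- returns normally: reads are fine
    INSERT INTO big_table (val) VALUES (1);  -- blocks until the ALTER finishes
                                             -- (or errors out if it waits past
                                             --  the configured lock timeout)
    SHOW FULL PROCESSLIST;                   -- the blocked write shows up in a
                                             -- waiting/locked state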

  • So I've done this and disabled the parts of my site that write to the table I'm modifying. So far I have received several "Lock wait timeout exceeded; try restarting transaction" exceptions, which is not too bad. However, they were on PURELY read operations... – apptree Aug 27 '10 at 02:23

I am in a similar situation: I have to alter one of my transaction tables, which is almost 65 GB. I have heard of two solutions:

  1. Use a straightforward ALTER TABLE and let it run (for however many hours or days it takes)
  2. Follow the copy-and-rename procedure symcbean outlines above: create a copy (B) of the table (A), add the new columns to B (which will still block with MyISAM), disable transactions, rename the original table (A) to something else, and rename the new table (B) to the original name
Marko