0

Our company (small business) had recently taken on new business in an APAC region which requires us to support 4 bit characters across our MySQL 5.7 DB.

This requirement means that we need to convert our database from one charset to another.

Database tables range from a few rows to ~100GB in size. Some have many foreign keys, indexes. There is the odd one or two stored procedures but those can be counted on one hand.

We can schedule downtime for this.

We run in AWS RDS and to get the best conversion time we decided to take a snapshot, move to the largest capacity instance with the highest provisioned IO we could use. That was like an r5.24xlarge or something with 50,000 IOPS.

Unfortunately, when we test this, it takes over 8-12 hours to convert every table using ALTER TABLE SET CHARSET...

Whilst we can schedule downtime, we can't be down for that amount of time. 4 hours is probably pushing the limit of what the business deems acceptable.

We've started to trim old data from a few tables but it's nowhere near enough.

The main bottleneck here is that the RDS Disk Throughput peaks at 350MB/s no matter how many table conversions we run at once, the size of the DB, storage allocation or provisioned IOPS. Indeed, IOPS never reaches 50K and CPU/Memory is minimal. Disk queue reaches around 10 which I believe is because of the throughput constraint.

We also tried Aurora which was actually slower for our case.

Does anyone have any idea how we can either increase throughput further or reduce the time it takes to make this conversion?

Alex Bailey
  • 101
  • 1
  • Do you have possibility for downtime during the night? Have you consider to convert tables one by one (or in package but within the downtime period)? And continue conversion week or more... – Romeo Ninov Jul 27 '22 at 08:09
  • Another idea is to create second database with appropriate locales and start sync between them. Here is one possible way how to sync: https://dev.mysql.com/doc/workbench/en/wb-database-synchronization.html – Romeo Ninov Jul 27 '22 at 08:31
  • @RomeoNinov we're a 24 hour business. There is no 'good' time for down time. – Alex Bailey Jul 27 '22 at 10:00
  • @RomeoNinov do you think replicating to another DB with a different charset would cause a problem? – Alex Bailey Jul 27 '22 at 10:00
  • Alex, IMHO no, but you can test it with small database filled with typical data – Romeo Ninov Jul 27 '22 at 10:34
  • Even paper tape (back in the '50s) had 5-bit characters. Perhaps you mean 4 BYTE? So, are you doing `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4`? Have you tried pt-online-schema-change? – Rick James Jul 28 '22 at 04:01

1 Answers1

0

If you are using a new enough version of MySQL and do not need utf8mb4 in any PRIMARY KEY, the following might work. (Please thoroughly test on large tables first)

For each table:

  1. Add VARCHAR/TEXT column(s) that are similar to your existing utf8 columns, but with utf8mb4 and a suitable collation.
  2. Add a trigger to do the following step for any VARCHAR/TEXT columns as new data comes in.
  3. Run an UPDATE on the table to set the new column(s) to the old columns.

Later, after all the tables have been upgraded:

  1. Change any indexes involving the columns
  2. Drop the old column(s).

There are some potentially serious problems with this mechanism:

  • JOIN...ON with different charset may not use indexes.
  • I am not sure that this mechanism will work without some significant times where a table is inaccessible.
  • And probably other things.

pt-online-schema-change -- probably changes one table at a time, leading to the above problems.

STATEMENT-based replication (which may not be available in RDS?) may provide a way to convert a Replica, continue replicating, and (when convenient) fail-over.

Rick James
  • 2,058
  • 5
  • 11