Our company (small business) had recently taken on new business in an APAC region which requires us to support 4 bit characters across our MySQL 5.7 DB.
This requirement means that we need to convert our database from one charset to another.
Database tables range from a few rows to ~100GB in size. Some have many foreign keys, indexes. There is the odd one or two stored procedures but those can be counted on one hand.
We can schedule downtime for this.
We run in AWS RDS and to get the best conversion time we decided to take a snapshot, move to the largest capacity instance with the highest provisioned IO we could use. That was like an r5.24xlarge or something with 50,000 IOPS.
Unfortunately, when we test this, it takes over 8-12 hours to convert every table using ALTER TABLE SET CHARSET...
Whilst we can schedule downtime, we can't be down for that amount of time. 4 hours is probably pushing the limit of what the business deems acceptable.
We've started to trim old data from a few tables but it's nowhere near enough.
The main bottleneck here is that the RDS Disk Throughput peaks at 350MB/s no matter how many table conversions we run at once, the size of the DB, storage allocation or provisioned IOPS. Indeed, IOPS never reaches 50K and CPU/Memory is minimal. Disk queue reaches around 10 which I believe is because of the throughput constraint.
We also tried Aurora which was actually slower for our case.
Does anyone have any idea how we can either increase throughput further or reduce the time it takes to make this conversion?