I've read many posts about this topic, but none of them talk about an AWS RDS MySQL database. For the past three days I've been running a Python script on an AWS EC2 instance that writes rows to my AWS RDS MySQL database. I have to write 35 million rows, so I know this will take some time. I check the performance of the database periodically, and three days later (today) I realized that the database is slowing down. When it started, the first 100,000 rows were written in only 7 minutes (this is an example of the rows I'm working with):
0000002178-14-000056 AccountsPayableCurrent us-gaap/2014 20131231 0 USD 266099000.0000
After three days, 5,385,662 rows have been written to the database, but now it takes almost 3 hours to write 100,000 rows. What is happening?
The EC2 instance I'm running is a t2.small; you can check the specs here if you need them: EC2 SPECS. The RDS database I'm running is a db.t2.small; check the specs here: RDS SPECS.
I will attach here some charts showing the performance of the database and the EC2 instance: DB CPU / DB Memory / DB Write IOPS / DB Write Throughput / EC2 Network In (bytes) / EC2 Network Out (bytes)
It would be great if you could help me. Thanks a lot.
EDIT 1: How am I inserting the rows? As I said before, I have a Python script running on an EC2 instance that reads text files, makes some calculations with the values, and then writes every "new" row to the database. Here is a small piece of my code showing how I read the text files:
for i in path_list:
    notify("Uploading: " + i)
    num_path = "path/" + i + "/file.txt"
    sub_path = "path/" + i + "/file.txt"
    try:
        sub_dict = {}
        with open(sub_path) as sub_file:
            for line in sub_file:
                line = line.strip().split("\t")
                sub_dict[line[0]] = line[1]                  # Save cik for every accession number
                sub_dict[line[1] + "-report"] = line[25]     # Save report type for every CIK
                sub_dict[line[1] + "-frecuency"] = line[28]  # Save frequency for every CIK
        with open(num_path) as num_file:
            for line in num_file:
                num_row = line.strip().split("\t")
                # Reminder: sometimes in the very old reports the cik and accession number
                # do not match. For this reason I have to look up the real cik here.
                try:
                    cik = sub_dict[num_row[0]]
                except KeyError:
                    cik = num_row[0][0:10]
                try:  # If there is no value, pass
                    value = num_row[7]
                    values_dict = {
                        'cik': cik,
                        'accession': num_row[0][10::].replace("-", ""),
                        'tag': num_row[1],
                        'value': value,
                        'valueid': num_row[6],
                        'date': num_row[4]
                    }
                    sql = ("INSERT INTO table_name (id, tag, value_num, value_id, endtime, cik, report, period) "
                           "VALUES ('{}', '{}', '{}', '{}', '{}', '{}', '{}', '{}')".format(
                               values_dict['cik'] + values_dict['accession'] + values_dict['date'] + values_dict['value'].split(".")[0] + "-" + values_dict['tag'],
                               values_dict['tag'],
                               float(values_dict['value']),
                               values_dict['valueid'],
                               values_dict['date'],
                               int(values_dict['cik']),
                               sub_dict[values_dict['cik'] + "-report"],
                               sub_dict[values_dict['cik'] + "-frecuency"]
                           ))
                    cursor.execute(sql)
                    connection.commit()
I know there is no `except:` to catch the `try` statements, but this is only a piece of the script. I think the important part is how I insert every row. In cases where I don't need to make calculations with the values, I'll use `LOAD DATA INFILE` to write the text files to the database.
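For reference, this is roughly what I have in mind for that path; a minimal sketch only, assuming a tab-separated file, a stand-in `table_name`, and a client connection with `local_infile` enabled (on RDS only the LOCAL variant works, since you can't put files on the server's filesystem):

```python
# Sketch: bulk-load a tab-separated file in one statement instead of
# row-by-row INSERTs. table_name and the file path are stand-ins.
sql = (
    "LOAD DATA LOCAL INFILE 'path/file.txt' "
    "INTO TABLE table_name "
    "FIELDS TERMINATED BY '\\t' "
    "LINES TERMINATED BY '\\n' "
    "(id, tag, value_num, value_id, endtime, cik, report, period)"
)
cursor.execute(sql)
connection.commit()
```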
I just realized now that it's maybe not a good idea to `commit` every time I insert a row. I will try committing after every 10,000 rows or so.
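This is roughly how I imagine the batched version; a sketch only, assuming a DB-API driver such as PyMySQL or mysql-connector (which use `%s` placeholders), with `table_name` as a stand-in and `BATCH_SIZE` a value I'd still have to tune:

```python
# Sketch: collect rows in memory, then flush them with executemany()
# and a single commit per batch instead of one commit per row.
BATCH_SIZE = 10_000

INSERT_SQL = (
    "INSERT INTO table_name "
    "(id, tag, value_num, value_id, endtime, cik, report, period) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)

def insert_rows(connection, rows):
    """rows is an iterable of 8-tuples matching the column list above."""
    batch = []
    with connection.cursor() as cursor:
        for row in rows:
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                cursor.executemany(INSERT_SQL, batch)
                connection.commit()  # one commit per 10,000 rows
                batch.clear()
        if batch:  # flush whatever is left after the loop
            cursor.executemany(INSERT_SQL, batch)
            connection.commit()
```

Passing the values as `%s` parameters would also sidestep the quoting problems of building the SQL string with `.format()`.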