I'm trying to understand whether I have spec'd my database appropriately. Below is a chart showing WriteIOPS, CPUCreditBalance and BurstBalance for a t3.xlarge SQL Server instance. Given a fairly constant WriteIOPS rate, it looks like I will exhaust my BurstBalance in another 15 hours or so. However, the CPUCreditBalance is steadily increasing.

AWS CloudWatch metrics

What will happen in roughly 15 hours: will the database be throttled or not? I've tried to understand the metrics defined here and described here, but I'm not sure exactly what the difference between the two balances is. Can someone clarify what the two balance metrics mean?

Sean

2 Answers

CPUCreditBalance and BurstBalance are two unrelated metrics.

On T-type instances, you have a CPUCreditBalance. If you have sustained CPU usage, you will deplete your credit balance and the machine will be throttled. T-type instances are only good for intermittent workloads. Any process (even an errant one) that continuously consumes even small amounts of CPU can cripple the system if it is not sized properly. The table here shows that a t3.xlarge can run at a baseline of 40% per vCPU and neither gain nor lose credits. Anything that keeps the server running above that rate will consume credits until the system runs out and is throttled to the baseline speed. Essentially, your system will be throttled to 40% CPU usage.
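To make the break-even point concrete, here is a rough sketch of the credit arithmetic. It assumes the published t3.xlarge figures (4 vCPUs, 96 credits earned per hour) and the usual definition that one CPU credit equals one vCPU running at 100% for one minute. The function names are made up for illustration.

```python
# Assumed figures for a t3.xlarge, taken from the AWS burstable-instance credit table.
VCPUS = 4
CREDITS_EARNED_PER_HOUR = 96  # one credit = one vCPU at 100% for one minute


def credits_spent_per_hour(avg_cpu_utilization: float, vcpus: int = VCPUS) -> float:
    """Credits consumed in one hour at the given average utilization (0.0 to 1.0)."""
    return vcpus * avg_cpu_utilization * 60


def net_credits_per_hour(avg_cpu_utilization: float) -> float:
    """Positive means the balance grows; negative means it drains."""
    return CREDITS_EARNED_PER_HOUR - credits_spent_per_hour(avg_cpu_utilization)


if __name__ == "__main__":
    for utilization in (0.10, 0.40, 0.70):
        net = net_credits_per_hour(utilization)
        print(f"{utilization:.0%} average CPU: {net:+.0f} credits/hour")
```

At 40% average utilization the net change is zero, which is why that figure is the baseline. A steadily rising CPUCreditBalance, as in the question, suggests average CPU usage is below it.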

On the other hand, BurstBalance is a function of the EBS storage volume backing an EC2 or RDS instance. A standard gp2 storage volume provides a baseline level of performance that scales with its size: the larger the volume, the higher the baseline. While the volume runs below that baseline it earns credits, which let it burst above the baseline later. If you have a process consuming disk I/O (read or write), it will run much faster than the baseline until the balance is exhausted. It will then be throttled to baseline performance. More info on that here.
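As a rough illustration of how quickly the balance drains, the sketch below assumes the published gp2 figures: a baseline of 3 IOPS per GiB (minimum 100, maximum 16,000), a burst ceiling of 3,000 IOPS, and a bucket of 5.4 million I/O credits, with BurstBalance reported as the percentage of that bucket remaining. The function names are hypothetical, and actual behavior on RDS may differ in detail.

```python
from typing import Optional

# Assumed gp2 figures from the EBS documentation.
BUCKET_CREDITS = 5_400_000  # I/O credits in a full bucket
BURST_IOPS = 3_000          # ceiling while bursting


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return min(max(100, 3 * size_gib), 16_000)


def hours_until_empty(size_gib: int, sustained_iops: float,
                      burst_balance_pct: float = 100.0) -> Optional[float]:
    """Hours until BurstBalance reaches 0 under a constant load.

    Returns None when the load is at or below baseline, so the balance never drains.
    """
    baseline = gp2_baseline_iops(size_gib)
    # The volume cannot serve more than the burst ceiling (or its baseline, if higher).
    served_iops = min(sustained_iops, max(BURST_IOPS, baseline))
    drain_per_second = served_iops - baseline
    if drain_per_second <= 0:
        return None
    remaining_credits = BUCKET_CREDITS * burst_balance_pct / 100
    return remaining_credits / drain_per_second / 3600


if __name__ == "__main__":
    # Hypothetical 200 GiB volume (baseline 600 IOPS) under 900 combined read+write IOPS.
    print(hours_until_empty(200, 900))
```

Note that the load figure should be the sum of read and write IOPS, since both draw from the same bucket.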

Your graph is missing two key metrics: CPUUtilization and ReadIOPS. With those plotted, you would see that whenever you have sustained read or write IOPS to disk, your burst balance decreases. When it runs out, you will be limited to the baseline performance of the disk. Likewise, whenever you have sustained CPU usage, your credit balance decreases. When it runs out, your CPU will be throttled to baseline performance.

Depending on your workload, you may have to adjust the size of your instance or volume to account for your needs. You may have to change to a non-burstable instance type for reliable and consistent CPU performance, or to a Provisioned IOPS storage volume for reliable and consistent disk performance.

Appleoddity

If your load is constant 24/7, you will run out of BurstBalance (EBS disk). There's a good blog article about it here. However, if your load drops at some point, say outside business hours, the burst balance will likely recover.
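A small hour-by-hour simulation shows the difference between the two cases. This is only a sketch: it assumes a 5.4 million credit gp2 bucket that refills at the volume's baseline IOPS rate, and the load figures are invented for illustration.

```python
from typing import List

BUCKET_CREDITS = 5_400_000  # assumed size of a full gp2 burst bucket


def simulate_burst_balance(baseline_iops: float, hourly_iops: List[float],
                           start_pct: float = 100.0) -> List[float]:
    """Return the BurstBalance percentage at the end of each simulated hour."""
    credits = BUCKET_CREDITS * start_pct / 100
    history = []
    for iops in hourly_iops:
        # Credits refill at the baseline rate and are spent at the actual I/O rate.
        credits += (baseline_iops - iops) * 3600
        # The bucket can neither overflow nor go negative.
        credits = max(0.0, min(float(BUCKET_CREDITS), credits))
        history.append(100 * credits / BUCKET_CREDITS)
    return history


if __name__ == "__main__":
    baseline = 300  # hypothetical 100 GiB gp2 volume
    office_day = [400] * 10 + [100] * 14  # busy for 10 hours, quiet for 14
    constant = [400] * 24                 # same peak load around the clock
    print("office day ends at", simulate_burst_balance(baseline, office_day)[-1], "%")
    print("constant load ends at", simulate_burst_balance(baseline, constant)[-1], "%")
```

With these made-up numbers the office-hours pattern refills the bucket overnight, while the constant load empties it within the day.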

If you have a gp2 disk, I suggest increasing the disk size, because a larger volume has a higher baseline and so its burst balance refills more quickly. gp3 volumes do not use burst credits, so for gp3, as for io1 / io2, increase the provisioned IOPS instead.

Tim