I need help setting up autoscaling based on network throughput.

Our front instances are placed inside an autoscaling group. We want the number of instances to increase as a function of the network throughput of the whole autoscaling group. That is, once a threshold is reached, another instance should be added.

Right now, our setup is based on this tutorial:

  • A CloudWatch metric math expression called NetworkTotal that equals the sum of NetworkIn (AutoScalingGroup) and NetworkOut (AutoScalingGroup).
  • This metric is supposed to be in Gbit/s, but the y axis of the metric plot shows 'No Unit'.
  • Our thresholds - taken from this analysis - are set in Bytes/s, given that CloudWatch metrics are measured in Bytes. So, for a t3.small, 0.13 Gbit/s is 16,250,000 Bytes/s (Google Calculator).
  • The "Instance increase" scaling policy activates once NetworkTotal has reached 80% of the instance's total throughput. In the t3.small case, that is 80% of 0.13 Gbit/s (0.104 Gbit/s) sustained for 1 minute.
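The conversion in the list above can be sanity-checked with a minimal Python sketch. It assumes decimal units (1 Gbit = 10^9 bits) and that the alarm uses the Sum statistic, in which case CloudWatch reports NetworkIn and NetworkOut as bytes per period rather than bytes per second. The function names are illustrative and not part of any AWS API.

```python
def gbit_per_s_to_bytes_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to bytes/s, assuming decimal units (1 Gbit = 1e9 bits)."""
    return gbit_per_s * 1_000_000_000 / 8


def threshold_bytes_per_period(baseline_gbit_per_s: float,
                               utilization: float,
                               period_seconds: int) -> float:
    """Bytes expected in one period at the given fraction of the baseline.

    Assumption: the metric uses the Sum statistic, so each datapoint is
    the number of bytes transferred during the whole period.
    """
    bytes_per_s = gbit_per_s_to_bytes_per_s(baseline_gbit_per_s)
    return bytes_per_s * utilization * period_seconds


# Figures from the question: t3.small baseline 0.13 Gbit/s, 80% threshold.
baseline_bytes_per_s = gbit_per_s_to_bytes_per_s(0.13)        # about 16,250,000
threshold_per_second = threshold_bytes_per_period(0.13, 0.80, 1)   # about 13,000,000
threshold_per_minute = threshold_bytes_per_period(0.13, 0.80, 60)  # about 780,000,000

print(f"baseline:          {baseline_bytes_per_s:,.0f} bytes/s")
print(f"80% per second:    {threshold_per_second:,.0f} bytes")
print(f"80% per 60 s Sum:  {threshold_per_minute:,.0f} bytes")
```

If the datapoints are 60-second sums, comparing them against a per-second threshold would make the traffic look roughly 60 times higher than expected, which may be worth checking before anything else.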

I suspect these calculations are wrong, given that our current traffic is far higher than the threshold. The issue might be either in the conversion from Gbit/s to Bytes/s or in the way we set up the alarm.
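One way to take the period out of the equation might be to let the metric math expression produce bytes per second itself, by dividing the summed bytes by the period. The sketch below only builds the parameter dictionary one could pass to boto3's `put_metric_alarm`; it does not call AWS. The group name, alarm name, and policy ARN are placeholders, the threshold is the single-instance t3.small figure from above, and none of this comes from the tutorial.

```python
def build_network_alarm_params(asg_name: str,
                               threshold_bytes_per_s: float,
                               policy_arn: str,
                               period_seconds: int = 60) -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm (sketch)."""

    def asg_metric(metric_id: str, metric_name: str) -> dict:
        # Sum of bytes over the period, aggregated across the whole group.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "AutoScalingGroupName", "Value": asg_name}
                    ],
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": False,  # only the expression feeds the alarm
        }

    return {
        "AlarmName": f"{asg_name}-network-total-high",  # placeholder name
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "Threshold": threshold_bytes_per_s,
        "Metrics": [
            asg_metric("m_in", "NetworkIn"),
            asg_metric("m_out", "NetworkOut"),
            {
                "Id": "network_total",
                # PERIOD() returns the period in seconds, so the result
                # is bytes per second regardless of the period chosen.
                "Expression": "(m_in + m_out) / PERIOD(m_in)",
                "Label": "NetworkTotal (bytes/s)",
                "ReturnData": True,
            },
        ],
        "AlarmActions": [policy_arn],
    }


params = build_network_alarm_params(
    asg_name="my-front-asg",                 # hypothetical group name
    threshold_bytes_per_s=13_000_000,        # 80% of 16,250,000 bytes/s
    policy_arn="arn:aws:autoscaling:...",    # placeholder, left elided
)
```

With boto3 installed and credentials configured, this would be used as `boto3.client("cloudwatch").put_metric_alarm(**params)`. Note that the expression sums traffic over the whole group, so a threshold derived from a single instance's bandwidth may need to be scaled by the number of instances.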

Of course any other approach is welcome :)

Thanks in advance.

1 Answer

The alarm should show you a graph with the value of the math expression alongside the threshold, which you can use to check whether the two are on a comparable scale.

It may also be that the metric is going over the threshold, but not for long enough to trigger the alarm based on its period length and number of periods.
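To illustrate the point about period length and number of periods, here is a simplified Python model of how an "M out of N datapoints" alarm decides whether to fire. It is only a sketch: it ignores missing-data handling and other details of the real CloudWatch evaluation, and the function name and sample values are made up.

```python
from typing import Sequence


def would_alarm(datapoints: Sequence[float],
                threshold: float,
                evaluation_periods: int,
                datapoints_to_alarm: int) -> bool:
    """Return True if the most recent window breaches often enough.

    Simplified model: look at the last `evaluation_periods` datapoints
    and count how many are strictly above the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return False  # not enough data yet in this simplified model
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm


threshold = 13_000_000  # bytes/s, the 80% figure from the question

# A single one-minute spike surrounded by quiet minutes.
spiky = [9_000_000, 20_000_000, 8_000_000]

print(would_alarm(spiky, threshold, evaluation_periods=3, datapoints_to_alarm=3))  # False
print(would_alarm(spiky, threshold, evaluation_periods=3, datapoints_to_alarm=1))  # True
```

The same spike trips the alarm with 1 out of 3 datapoints but not with 3 out of 3, so short bursts can go unnoticed when the alarm requires several consecutive breaching periods.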

Scaling T3s on network usage is a bit tricky though, since they have burstable network performance. When you're doing load tests, are you seeing them run out of bandwidth before CPU or memory? If something else is running out first, you may want to scale on the bottlenecked resource instead.

Shahad