I'm operating a web application on AWS whose CPU utilization already varies a lot with the time of day across timezones.

I'm also concerned about going viral, which tends to happen to us. If I'm asleep at the time, our service could become slow or unreachable for hours.

Currently we use a single EC2 instance. Yesterday we even had some downtime when the server was being replaced automatically and didn't boot up correctly - I still don't know the full reasons and probably never will.

I'm now considering replacing our single-instance frontend with a load balancer and an autoscaling group. This would let us save money, improve performance and reliability, and mitigate the effects of faulty EC2 instances.

What I don't know is at what CPU utilization EC2 performance starts to degrade when running PHP.

We operate an application where we want to prioritize performance over cost, while still not throwing money away.

What CPU percentage thresholds should I set for:

  • When to add a new instance
  • When to remove an excess instance

If anyone has any graphs of performance vs CPU load that would be amazing to see.

Or should I use a different metric than CPU altogether?

Pierre.Vriens

2 Answers

There isn't a one-size-fits-all answer to this. You'll want to load test a single node and see at what CPU usage the response time changes significantly. This might be 90% or 10%, depending on your application and how it handles concurrency. JMeter is a handy tool for this sort of test.
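As a rough illustration of reading the result of such a load test, here is a minimal Python sketch that finds the CPU level where latency starts to degrade. The function name, the 1.5x tolerance, and the sample measurements are all made up for the example; your load testing tool will give you the real pairs of CPU usage and response time.

```python
def find_latency_knee(samples, tolerance=1.5):
    """Return the lowest CPU % at which latency exceeds tolerance x baseline.

    samples: list of (cpu_percent, latency_ms) pairs from a load test.
    The baseline is the latency at the lowest measured CPU level.
    Returns None if latency never degrades that far.
    """
    ordered = sorted(samples)
    if not ordered:
        return None
    baseline = ordered[0][1]
    for cpu, latency in ordered:
        if latency > baseline * tolerance:
            return cpu
    return None


# Hypothetical measurements, for illustration only.
measurements = [(10, 120), (30, 125), (50, 135), (70, 170), (85, 320), (95, 900)]
knee = find_latency_knee(measurements)
print(knee)
```

With these invented numbers the response time first exceeds 1.5x the baseline at 85% CPU, so that would be the level to stay comfortably below.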

You'll then set your upscale level somewhere comfortably lower than that level. Keep in mind that scaleups take a certain amount of time so you'll want to leave yourself some runway there.
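One way to put a number on that runway, sketched in Python: subtract the CPU growth you expect while a new instance boots, plus a safety margin, from the level where performance degrades. The function and its parameters are hypothetical, and the growth rate and launch time are things you would have to measure yourself.

```python
def scale_out_threshold(knee_cpu, cpu_growth_per_min, launch_minutes, safety_margin=5.0):
    """CPU % at which to trigger a scale-up so capacity arrives before the knee.

    knee_cpu:           CPU % where response time degrades (from load testing).
    cpu_growth_per_min: assumed worst-case rise in CPU % per minute during a spike.
    launch_minutes:     assumed time for a new instance to boot and take traffic.
    safety_margin:      extra headroom in CPU percentage points.
    """
    runway = cpu_growth_per_min * launch_minutes
    threshold = knee_cpu - runway - safety_margin
    # A threshold below zero means scaling on CPU alone cannot react in time.
    return max(threshold, 0.0)


# Illustrative inputs: knee at 85%, CPU rising 3 points/minute, 5 minute launch.
threshold = scale_out_threshold(knee_cpu=85, cpu_growth_per_min=3, launch_minutes=5)
print(threshold)
```

If the result comes out very low, that is a hint that launch time is the real problem and a faster-booting image or a larger minimum group size may help more than a tighter threshold.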

Downscaling is a little easier. Look at your average node's usage and set the target a little below that, so if you've overscaled or traffic has dropped off it'll scale down. It is usually better for performance to scale up in larger increments than you scale down.
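A quick sanity check for a candidate scale-down threshold is to project what per-node CPU would be after one node is removed, so the group does not immediately scale back up. This Python sketch assumes load redistributes evenly across the remaining nodes, which is a simplification; the function names and the 10-point headroom are hypothetical.

```python
def projected_cpu_after_removal(avg_cpu, instance_count, remove=1):
    """Estimate average CPU % per node after removing `remove` nodes."""
    remaining = instance_count - remove
    if remaining < 1:
        raise ValueError("at least one instance must remain")
    return avg_cpu * instance_count / remaining


def safe_to_scale_in(avg_cpu, instance_count, scale_up_threshold, headroom=10.0):
    """True if removing one node keeps projected CPU well below the scale-up level."""
    if instance_count < 2:
        return False
    projected = projected_cpu_after_removal(avg_cpu, instance_count)
    return projected <= scale_up_threshold - headroom


# Illustrative: 4 nodes averaging 30% CPU, scale-up threshold at 65%.
print(projected_cpu_after_removal(30, 4))
print(safe_to_scale_in(30, 4, 65))
print(safe_to_scale_in(45, 2, 65))
```

The check matters most for small groups: going from two nodes to one doubles the load on the survivor, while going from ten to nine adds only about 11%.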

You can set CloudWatch alarms to watch for your cluster size to reach a given value, the 'max' value being a prime candidate. That will wake you up if it has scaled to its limit and may need some intervention.
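As a sketch of what such an alarm could look like with boto3, the Python below builds the arguments for `put_metric_alarm` on the `GroupInServiceInstances` metric. It assumes group metrics collection is enabled on the Auto Scaling group; the group name, maximum size, and SNS topic ARN are placeholders, not real resources, and the period and evaluation count are arbitrary starting points.

```python
def max_size_alarm_params(group_name, max_size, sns_topic_arn):
    """Build put_metric_alarm arguments that fire when the group hits max size."""
    return {
        "AlarmName": f"{group_name}-at-max-size",
        "Namespace": "AWS/AutoScaling",
        "MetricName": "GroupInServiceInstances",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Maximum",
        "Period": 60,               # seconds per datapoint
        "EvaluationPeriods": 5,     # must hold for 5 datapoints in a row
        "Threshold": float(max_size),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder names, for illustration only.
params = max_size_alarm_params(
    "web-asg", 10, "arn:aws:sns:us-east-1:123456789012:ops-pager"
)

# To create the alarm (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Pointing the alarm action at an SNS topic that sends SMS or pages you is what turns this into the wake-up call described above.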

Jason Martin

I have a few thoughts for you. Some directly answer your question, some are other things to consider.

First up, PHP is typically CPU intensive. Scaling based on CPU use is probably sensible. You'll have to work out the thresholds based on your experience, load testing, or trial and error. You should probably be conservative to start with, watch it for a while, then adjust to get a good balance of utilization vs cost.

There are general guides on scaling. This guide suggests scale up at 80% and down at 20%, whereas this Amazon guide suggests scaling up at 80% and down at 40%.
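To see how much the choice between those two pairs matters, here is a small Python sketch of the decision each implies. The function is hypothetical and ignores cooldowns and evaluation periods, which a real autoscaling policy would also apply.

```python
def scaling_decision(avg_cpu, scale_up_at, scale_down_at):
    """Return 'scale_up', 'scale_down' or 'hold' for an average CPU %."""
    if scale_down_at >= scale_up_at:
        raise ValueError("scale_down_at must be below scale_up_at")
    if avg_cpu >= scale_up_at:
        return "scale_up"
    if avg_cpu <= scale_down_at:
        return "scale_down"
    return "hold"


# The same 30% average CPU under the two suggested threshold pairs.
print(scaling_decision(30, scale_up_at=80, scale_down_at=20))
print(scaling_decision(30, scale_up_at=80, scale_down_at=40))
```

At 30% CPU the 80/20 pair holds capacity while the 80/40 pair removes an instance, so the wider gap favors performance and the narrower one favors cost.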

Caching anonymous content can reduce your CPU usage by a huge amount, depending on the application. If 99% of your users are anonymous you can serve all of them with a page generated once. To reduce load and cost further you can use a CDN such as CloudFront or CloudFlare to serve this static content. If you use a CDN you need to properly set your caching headers.
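As an illustration of the caching header point, this Python sketch picks a `Cache-Control` value depending on whether the visitor is anonymous. The function name and the 300 second lifetime are assumptions for the example; in a PHP application the same string would be sent with `header()`.

```python
def cache_control_for(is_anonymous, max_age_seconds=300):
    """Choose a Cache-Control header value for a page response.

    Anonymous pages may be stored by shared caches such as a CDN.
    Pages for logged-in users must not be stored or shared.
    """
    if is_anonymous:
        # s-maxage applies to shared caches (the CDN), max-age to browsers.
        return f"public, max-age={max_age_seconds}, s-maxage={max_age_seconds}"
    return "private, no-store"


print(cache_control_for(True))
print(cache_control_for(False))
```

Getting the logged-in case wrong is the dangerous direction, since a CDN could then serve one user's page to another, so it is worth testing that path explicitly.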

Choose your EC2 instance types carefully. T2 instances have burstable CPU: once an instance runs out of CPU credits it is throttled and will slow down sharply. The load balancer algorithm "least connections" should cater for this, but you might consider general purpose M instances if T2 instances give you problems.
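To get a feel for how quickly a T2 instance can run dry, here is a back-of-the-envelope Python sketch. It relies on one CPU credit being one vCPU at 100% for one minute; the balance and earn rate in the example are illustrative inputs, so check the current figures for your instance type in the AWS documentation.

```python
def minutes_until_credits_exhausted(credit_balance, credits_earned_per_hour,
                                    cpu_utilization, vcpus=1):
    """Estimate minutes of sustained load before CPU credits run out.

    cpu_utilization is a fraction (0.0 to 1.0) averaged across vCPUs.
    Returns None if credits are earned at least as fast as they are spent.
    """
    burn_per_hour = cpu_utilization * vcpus * 60  # credits spent per hour
    net_per_hour = burn_per_hour - credits_earned_per_hour
    if net_per_hour <= 0:
        return None
    return credit_balance / net_per_hour * 60


# Illustrative: 30 credits banked, earning 6 per hour, one vCPU pinned at 100%.
print(minutes_until_credits_exhausted(30, 6, 1.0))
```

With those inputs the instance lasts roughly half an hour at full load, which is short enough that a viral spike could exhaust credits before anyone notices.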

You can have more than one autoscaling group associated with a load balancer. You could, for example, have one group add spot instances at a low CPU threshold, earlier than you would scale up on-demand instances, and have another group add on-demand instances at a higher threshold. This is covered in this question.

I also wonder if you could have a T2 instance in your ELB, but then scale up with M instances if load increases. I think you probably could, using multiple autoscaling groups similar to the technique above. It might not be worth the bother though.

Tim