
I've recently migrated a small web application to AWS using Fargate and Aurora Serverless. The application doesn't get much traffic, so my goal is to minimize cost while no one is using it. Aurora Serverless seems to handle this for me automatically on the DB side.

However, I'm struggling to find any resources on how to scale a Fargate service to zero.

There is an ALB in front of the service, and I know ALB request counts can drive scaling... so ideally, when the average request count over a period of, say, 10 minutes is 0, the service would scale down to zero tasks. Then, when a request comes in, it would trigger a scale-up to one task for the service.

computmaxer
    I suspect you might be better off with Lambda if you really need to do this. Scaling to zero means you have to boot your container / OS / application, which means any request could time out before it's serviced. – Tim Jan 30 '19 at 07:01

2 Answers


I'm not sure this would work the way you hope. When there are no healthy ALB targets, the ALB returns a 503 error, so your visitors would see an error page instead of your website. The incoming request may well trigger a Fargate task start, but that often takes tens of seconds, sometimes even over a minute. By the time your container is up, your visitor is probably gone.

If you want a truly serverless website with zero idle costs, you'll have to re-architect it around S3, API Gateway, and Lambda:

  • Put your frontend files (HTML, CSS, JS) in S3
  • Load your dynamic content through an API
  • Implement the dynamic functionality in Lambda functions
  • Use API Gateway to call the Lambdas
  • The DB can be Aurora Serverless or DynamoDB On-Demand

This architecture costs nothing when idle and provides instant response to your visitors.
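To make the pattern concrete, here is a minimal sketch of the Lambda side, assuming an API Gateway proxy integration (the `name` parameter and the greeting payload are placeholders for your app's real dynamic content):

```python
import json

def lambda_handler(event, context):
    """Return a dynamic payload for an API Gateway proxy request.

    The static frontend hosted on S3 would call this endpoint with
    fetch()/XHR; the "name" parameter is purely illustrative.
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # API Gateway proxy integration expects statusCode/headers/body
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"greeting": f"Hello, {name}"}),
    }
```

Because Lambda bills per invocation, this costs nothing while no one calls the API.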


Update: if you still want to scale the Fargate Service down to 0 Tasks, you can certainly do that by setting the Service's DesiredCount to 0. That can be done e.g. through the AWS CLI:

~ $ aws ecs update-service ... --service xyz --desired-count 0

If you want to do this in Dev, I suggest you run this UpdateService call either manually, from a cron job, or from a scheduled Lambda function. Either way you can set the desired count to 0 at night and back to 1 the next working day. That will be easier than relying on Auto Scaling, which may not be reliable at very low traffic.
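For the scheduled-Lambda variant, a minimal sketch with boto3 might look like this (the cluster and service names are hypothetical, and the `ecs` parameter only exists so the helper can be exercised without AWS credentials):

```python
def set_desired_count(service, count, cluster="default", ecs=None):
    """Set the DesiredCount of an ECS/Fargate service.

    The ecs argument lets tests inject a stub client; inside Lambda
    it is created lazily from boto3 (bundled in the Lambda runtime).
    """
    if ecs is None:
        import boto3
        ecs = boto3.client("ecs")
    ecs.update_service(cluster=cluster, service=service, desiredCount=count)
    return count

def lambda_handler(event, context):
    # Triggered by two EventBridge cron rules: one sends
    # {"desired_count": 0} at night, the other {"desired_count": 1}
    # the next working morning. Names here are placeholders.
    return {
        "desiredCount": set_desired_count(
            service=event.get("service", "xyz"),
            count=int(event.get("desired_count", 0)),
            cluster=event.get("cluster", "default"),
        )
    }
```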

Hope that helps :)

MLu
  • Maybe not quite "instant" responses... but it's far less likely to time out than having to spin up a container / app. – Tim Jan 30 '19 at 07:43
  • @Tim since the static content is ready on S3 the visitor will only wait for the dynamic bits from API. That’s a much better customer experience than 503 error ;) – MLu Jan 30 '19 at 07:47
  • I was just referring to the word instant, the way it's worded can mean zero latency, which is obviously not possible. The serverless pattern you described is a good one, I hinted towards it with "Lambda" in my somewhat lazy comment above :) – Tim Jan 30 '19 at 08:04
  • I agree - this is an ideal setup. I am going to look more into moving this to Lambda; definitely seems like the best solution available for a low-use application where we're trying to save on cost. --- However, for posterity, it'd still be nice to know if it is possible to auto-scale to zero in Fargate. Say for example you have a staging environment that is rarely used, and when it is used, it's just developers. It'd be great to have it auto-scale to zero when no one is using it, and the developers can deal with 503s for a bit while it starts up. – computmaxer Jan 30 '19 at 20:24
  • @computmaxer Added info about the ECS `update-service --desired-count 0` API call. That answers your question. – MLu Jan 31 '19 at 00:41
  • Thanks. So it sounds like it's not really possible to do automatically out-of-the box with Fargate. If this changes in the future via new features/functionality I will update this question with a new answer. – computmaxer Feb 05 '19 at 04:20

If rewriting your app to fit the architecture in the answer above is not an option, or would be costly, you could look into GCP Cloud Run.

Cloud Run is GCP's serverless-container offering. You can package your website in a container, and Cloud Run only bills you for CPU usage during requests and start-up. It also has a generous free tier that keeps the cost of running a low-traffic app close to zero.

So you could combine Amazon Aurora with GCP Cloud Run for minimal cost and no need to rewrite your app.
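To illustrate how little Cloud Run asks of an app: any container that serves HTTP on the port given in the `PORT` environment variable will do. A minimal sketch using only the Python standard library (the handler and greeting are placeholders for a real app):

```python
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    """Minimal handler; Cloud Run routes incoming requests here."""

    def do_GET(self):
        body = b"Hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging noise

def make_server(port):
    return HTTPServer(("", port), Handler)

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT env var
    make_server(int(os.environ.get("PORT", "8080"))).serve_forever()
```

Build that into an image with a short Dockerfile, deploy it with `gcloud run deploy`, and Cloud Run scales it to zero between requests.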

Jimmy
  • But isn't CloudRun functionally equivalent to AWS Lambda for the purposes of this question? Being open-source addresses the issues of flexibility and vendor lock-in but it still requires that the app have its compute requirements be compatible with FaaS. – Tom Jul 08 '19 at 22:33
  • @Tom No, Cloud Functions is GCP's equivalent of AWS Lambda. Cloud Run is serverless containers. If you have WordPress inside a Docker container, you can upload that straight into Cloud Run (in theory) without any modifications. – Jimmy Jul 10 '19 at 18:56
  • But Cloud Run is designed for transactions and has time limits to enforce that (10 or 15 min, I believe). That's why I say it is equivalent to Lambda for the purposes of this question - you can't just package up an app into a container and put it on Cloud Run unless you make the app conform to that model. – Tom Jul 11 '19 at 15:07
  • @Tom I agree. However it's far easier (in my opinion) to adhere to Cloud Run's restrictions than to refactor your code to fit Lambda/Cloud Functions. Most simple (dockerized) web apps should be plug & play in Cloud Run, while rewriting Joomla or WordPress into Lambda is just impossible. – Jimmy Jul 11 '19 at 16:17
  • Amazon also now supports dockerized Lambda applications. https://docs.aws.amazon.com/lambda/latest/dg/runtimes-custom.html – Yahya Uddin Dec 22 '19 at 08:25
  • Yes Lambda / Docker is a thing now but for those in the machine learning space, you are still out of luck if you need a GPU (huge boost for both training and inference). – Julian H Dec 20 '20 at 11:51
  • Dockerized Lambda is not the same - it requires a proprietary API. Cloud Run supports any dockerized web server (no websockets). You can literally use any image with a web app from Docker Hub and it will work (without persistent volumes). – Bobík Feb 06 '22 at 16:26
  • BTW Cloud Run timeout can be increased to up to 60 min. If the aim of a serverless workload is to scale to zero after processing a request, 60 min is quite a decent limit. Another benefit is that the Cloud Run workload can serve multiple concurrent requests (80 by default, can be increased to hundreds). – Alexey Zimarev Jun 01 '22 at 12:21