
An auto-scaling group launches EC2 instances, and instances that run for roughly more than 24 hours appear to degrade in performance. The longest-running one was up for 3 days until I manually terminated it. That seems unusually long for an auto-scaling group, where instances are normally terminated every so often.

Specifically, the CPU utilization (user %) climbs to 30-40% and stays that high, while other instances in the auto-scaling group sit at only around 10-15%. This uses up CPU credits and degrades general EB environment metrics such as average response time and the number of 5xx responses.


1) Why would an instance start to gradually degrade after 24 hours? The instances are running Parse Server (Node.js). How can I figure out what's wrong with the instance? I plan to SSH into it when this occurs again and take a look at the processes with top.
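For reference, this is roughly what I plan to run over SSH once it happens again (nothing Parse-specific, just standard Linux tools):

```bash
# One batch-mode snapshot of the busiest processes
top -bn1 | head -n 20

# Per-process CPU/memory usage, sorted by CPU
ps aux --sort=-%cpu | head -n 10
```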

2) How can I auto-terminate instances that have been running for longer than 24 hours? I tried to set up a CloudWatch alarm, but EC2 > Per-Instance Metrics does not provide an uptime metric. I could set an alarm on CPU utilization, but I am unsure how that metric behaves for faulty instances, so terminating after 24 hours seems a safer bet.

Update

Re 1) The issue could be this: https://github.com/parse-community/parse-server/issues/6061

Manuel
  • I think the increase in CPU is down to whatever you have the instances doing. I have EC2 servers that have run for months with no problems, and are only rebooted to do updates. Instead of terminating the instances I'd suggest you try to work out what they're doing - it could be that they've been compromised and someone has taken them over, or just a rogue process. – Tim Oct 11 '19 at 01:09
  • @Tim, I will take a look at the processes when this happens again. – Manuel Oct 11 '19 at 07:04
  • In general, what you're asking for can be done with Alarm Actions: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html – Serg Chernata Oct 14 '19 at 22:12
  • @SergChernata Yes, but how do I create an alarm for an instance that has been running for 24h? I can only think of a lambda function that pulls EC2 meta info periodically. But then it can just detach and terminate the instance directly, without an alarm. – Manuel Oct 14 '19 at 22:34

2 Answers


Released in November 2019, Auto Scaling groups support an optional Maximum Instance Lifetime parameter to auto-terminate instances after a given amount of time. You can read about it here. The console describes the setting as follows:


The maximum length of time that an instance can be in service. If any instances are approaching this limit, Amazon EC2 Auto Scaling gradually replaces them.

Here's what the blog post says:

Amazon EC2 Auto Scaling now lets you safely and securely recycle instances in an Auto Scaling group (ASG) at a regular cadence. The Maximum Instance Lifetime parameter helps you ensure that instances are recycled before reaching the specified lifetime, giving you an automated way to adhere to your security, compliance, and performance requirements. You can either create a new ASG or update an existing one to include the Maximum Instance Lifetime value of your choice between seven and 365 days.
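For example, you can enable it on an existing group with the AWS CLI (the group name is a placeholder):

```bash
# Recycle instances after 7 days (604800 seconds), the minimum
# lifetime the blog post mentions; the value is given in seconds
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name my-asg \
    --max-instance-lifetime 604800
```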

Tim

1) Why would an instance start to gradually impair after 24 hours?

There is no inherent issue with EC2 instances that run for more than 24 hours. Your application is probably buggy and slowing down over time. Perhaps there is a memory leak leading to increased swapping?
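A quick way to check for swapping once you're on the instance (standard Linux tools; the process name assumes a Node.js app):

```bash
# Memory and swap usage at a glance
free -m

# si/so columns show pages swapped in/out; sample 3 times at 5s intervals
vmstat 5 3

# Resident memory (RSS, in KB) and uptime of the node process
ps -C node -o pid,rss,etime,comm
```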

2) How can I auto-terminate instances that run longer than 24 hours?

There are many ways. The simplest is probably to bundle a shell script with your application deployment that shuts the instance down after 24 hours, with a command like nohup bash -c 'sleep 86400 && shutdown -h now' & (once the instance stops, the Auto Scaling group marks it unhealthy and replaces it). You can run that on application deployment by adding it to the commands section of an .ebextensions config file in your application version, as in the sketch below.
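Here's a minimal sketch of what that could look like; the file and command names are illustrative, and it assumes the backgrounded process survives the deployment:

```yaml
# .ebextensions/99-terminate-after-24h.config
commands:
  01_schedule_shutdown:
    # Sleep 24 hours, then shut the instance down; the ASG health
    # check marks the stopped instance unhealthy and replaces it.
    command: "nohup bash -c 'sleep 86400 && shutdown -h now' > /dev/null 2>&1 &"
```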

Alex J