2

I created an alarm to stop an instance and email me if it was idle for too long (average CPU utilization < 2% for 3 hours). However, in my testing I noticed that the instance was stopped after only 1 hour. Here is the report from the email:

Alarm Details:

Name: Stop

Description: Created from EC2 Console

State Change: INSUFFICIENT_DATA -> ALARM

Reason for State Change: Threshold Crossed: 2 datapoints were less than the threshold (2.0). 

The most recent datapoints: 0.0425, 0.038363636363636364.

Timestamp: Thursday 14 March, 2013 22:20:11 UTC

AWS Account: xxxxxxxxxxxx

Threshold:
The alarm is in the ALARM state when the metric is LessThanThreshold 2.0 for 3600 seconds.

Monitored Metric:
MetricNamespace: AWS/EC2
MetricName: CPUUtilization
Dimensions: InstanceId = i-xxxxxxx
Period: 3600 seconds
Statistic: Average
Unit: not specified

State Change Actions:
OK:
ALARM: arn:aws:sns:us-east-1:xxxxxxxxxxxx:NotifyMe
INSUFFICIENT_DATA:

I'm confused as to why it enters the ALARM state after just 1 hour (3600s) when I set it to 3 hours (10800s). For my test, the instance had been stopped all day. Once I created the alarm I started it and didn't do anything with the instance. Does it take into account all those stopped hours when it calculates the avg CPU utilization over 3 hours?

I would like to have the alarm let the instance stay alive for the threshold of 3 hours before it stops the instance. Is there a better way to do this?

2 Answers

4

In your email it clearly states that your alarm is set to trigger after 3600 seconds.

Threshold: The alarm is in the ALARM state when the metric is LessThanThreshold 2.0 for 3600 seconds.

There should be an option to set "EvaluationPeriods". This tells the alarm how many consecutive periods of the metric to evaluate before changing state. In your case, keep the period at 1 hour (3600 seconds) and set EvaluationPeriods to 3, so the alarm checks once per hour whether the metric is LessThanThreshold 2.0. It will then trigger only when the hourly average has been LessThanThreshold 2.0 for 3 consecutive hours.
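As a rough illustration of that setting, here is a minimal sketch that builds the arguments for a 3-hour alarm using boto3's `put_metric_alarm`. The helper names `build_idle_alarm_params` and `create_alarm`, the description text, and the default region are assumptions made for the example; the metric, threshold, and alarm name are taken from the email above.

```python
def build_idle_alarm_params(instance_id, action_arns, hours=3):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call.

    Hypothetical helper: the alarm fires only after `hours` consecutive
    one-hour periods whose average CPU is below 2%.
    """
    return {
        "AlarmName": "Stop",
        "AlarmDescription": "Stop the instance after sustained low CPU",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 3600,              # length of one evaluation period, in seconds
        "EvaluationPeriods": hours,  # consecutive periods that must breach
        "Threshold": 2.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": list(action_arns),
    }


def create_alarm(params, region="us-east-1"):
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)


params = build_idle_alarm_params(
    "i-xxxxxxx", ["arn:aws:sns:us-east-1:xxxxxxxxxxxx:NotifyMe"]
)
# Total time the metric must stay below the threshold, in seconds.
print(params["Period"] * params["EvaluationPeriods"])
```

With `Period` at 3600 and `EvaluationPeriods` at 3, the total window is 10800 seconds, which matches the 3 hours the question asks for. Calling `create_alarm(params)` would then register the alarm.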

Another thing to note is that your alarm state went from INSUFFICIENT_DATA -> ALARM. I have noticed the same behavior with some alarms I am working on.

In my case:

  • I have an alarm that stops an instance when CPUUtilization is LessThanThreshold 5.0 for 1 hour, using 6 evaluation periods of 10 minutes each.
  • When the alarm receives new data after a stretch of INSUFFICIENT_DATA, it seems to go straight to the ALARM state. I think it treats INSUFFICIENT_DATA as 0.0 (don't quote me on this; it is only an assumption based on some tests I am running).
  • Even if the first real datapoint is 25.6%, the previous 5 points were INSUFFICIENT_DATA (possibly 0.0?), so the average comes out at about 4.27, which is LessThanThreshold 5.0.
  • My alarm is then triggered even though it has technically had only 10 minutes of "real" data.
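The arithmetic behind that guess can be checked with a short sketch. It only illustrates the assumption stated above (missing datapoints counted as 0.0); it is not confirmed CloudWatch behavior, and the function name `average_with_missing_as_zero` is made up for the example.

```python
def average_with_missing_as_zero(datapoints):
    """Average a window of datapoints, counting missing ones (None) as 0.0.

    This mirrors the answer's assumption, not documented CloudWatch behavior.
    """
    values = [0.0 if point is None else point for point in datapoints]
    return sum(values) / len(values)


# Five 10-minute periods with no data, then one real reading of 25.6%.
window = [None, None, None, None, None, 25.6]
average = average_with_missing_as_zero(window)

print(round(average, 2))  # 4.27
print(average < 5.0)      # True
```

Under that assumption, a single busy datapoint is diluted by the five empty ones and the window still falls below the 5.0 threshold.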

To mitigate this, I have set up a script so that whenever an instance is started, the alarm is created along with it, and whenever the alarm is triggered, it deletes itself after stopping the instance it is assigned to.
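The script itself is not shown, so the following is only a sketch of one way such a workflow could look with boto3. The function names are hypothetical, and `ec2` and `cloudwatch` are assumed to be clients created with `boto3.client("ec2")` and `boto3.client("cloudwatch")`. The calls used (`start_instances`, `stop_instances`, `put_metric_alarm`, `delete_alarms`) are standard boto3 client methods.

```python
def start_with_alarm(ec2, cloudwatch, instance_id, alarm_params):
    """Start the instance, then create its idle alarm.

    Creating the alarm at start time means it never evaluates the
    hours during which the instance was stopped.
    """
    ec2.start_instances(InstanceIds=[instance_id])
    cloudwatch.put_metric_alarm(**alarm_params)


def stop_and_delete_alarm(ec2, cloudwatch, instance_id, alarm_name):
    """Stop the instance, then delete the alarm that triggered the stop."""
    ec2.stop_instances(InstanceIds=[instance_id])
    cloudwatch.delete_alarms(AlarmNames=[alarm_name])
```

Passing the clients in as arguments keeps the two functions easy to exercise with stand-in objects before pointing them at a real account.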

1

It seems the alarm was configured incorrectly: according to the email above, it fires when the metric is LessThanThreshold 2.0 for 3600 seconds.

To resolve this:

  • In the CloudWatch Management Console, select the alarm. Its threshold is described underneath, in a form like this example: Threshold: CPUUtilization >= 70 for 5 minutes

  • Right-click the alarm and modify it according to your needs.

  • You can also use the as-describe-alarm API to check your alarm's settings (see the linked API Description).
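For checking an alarm programmatically, here is a small sketch that summarizes one alarm record in the same style as the console's "Threshold" line. It assumes the record has the shape of an entry in the `MetricAlarms` list returned by boto3's `describe_alarms`; the helper names are hypothetical, and the sample values are inferred from the email in the question (an `EvaluationPeriods` of 1 is implied by "for 3600 seconds").

```python
def alarm_window_seconds(alarm):
    """Total time the metric must breach the threshold before the alarm fires."""
    return alarm["Period"] * alarm["EvaluationPeriods"]


def summarize_alarm(alarm):
    """Describe an alarm record in the style of the console's Threshold line."""
    return "{} {} for {} seconds".format(
        alarm["ComparisonOperator"],
        alarm["Threshold"],
        alarm_window_seconds(alarm),
    )


# Sample record with values inferred from the email in the question.
sample = {
    "AlarmName": "Stop",
    "ComparisonOperator": "LessThanThreshold",
    "Threshold": 2.0,
    "Period": 3600,
    "EvaluationPeriods": 1,
}
print(summarize_alarm(sample))  # LessThanThreshold 2.0 for 3600 seconds
```

With a real client, the record could come from `boto3.client("cloudwatch").describe_alarms(AlarmNames=["Stop"])["MetricAlarms"][0]`.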
Bassam Gamal