We have two Auto Scaling groups (one for on-demand and one for spot instances), both pinned to a static size (min, max, and desired are all the same - 5 in our case). The instances in the on-demand group stay running, but the ones in the spot group are frequently terminated due to a system health check. The message shown for a terminated instance on the Scaling History tab in the EC2 Management Console reads, for example:
"At 2014-05-07T18:06:45Z an instance was taken out of service in response to a system health-check."
I don't know why our spot instances are failing a health check. Our bid price is high, and based on the spot pricing history I don't think the instances are being terminated over price. I've also tried changing the Availability Zones the instances launch in, with no difference, and the syslog of a recently terminated instance shows nothing suspicious. We're using a private/custom AMI for both groups, but I see the same behavior when I switch to a more generic AMI (the "Ubuntu 12.04 LTS Precise EBS boot" image listed on alestic.com - ami-5db4a934). Again, our on-demand instances stay running and don't fail health checks. Both groups use the "EC2" health check type.
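To rule out price more rigorously, we've also been inspecting the status codes on the underlying spot requests; a termination over price should show up there as something like instance-terminated-by-price-too-high. Filtering on our AMI is just one way to slice it:

aws ec2 describe-spot-instance-requests \
--filters Name=launch.image-id,Values=ami-5db4a934 \
--query 'SpotInstanceRequests[*].[SpotInstanceRequestId,State,Status.Code,Status.Message]' \
--output table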
Here is the command we're using to create our launch configuration via the AWS CLI:
aws autoscaling create-launch-configuration \
--launch-configuration-name [name] \
--image-id ami-5db4a934 \
--key-name [our key] \
--security-groups [our SGs] \
--instance-type m3.xlarge \
--block-device-mappings '[ { "DeviceName": "/dev/sda1", "Ebs": { "VolumeSize": 8 } } ]' \
--spot-price "1.00"
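And for completeness, the groups themselves are created along these lines (names and AZs are placeholders; the sizes and health check type are the ones described above):

aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name [spot group name] \
--launch-configuration-name [name] \
--min-size 5 \
--max-size 5 \
--desired-capacity 5 \
--availability-zones [our AZs] \
--health-check-type EC2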
Does anyone know what might be causing this, or how we can get more visibility into why the spot instances are failing health checks?