
I have a CloudFormation stack containing an ECS cluster with autoscaling EC2 instances running in private subnets. The ECS services are fronted by an ALB (in public subnets). The container instances pull their images from AWS ECR. I created these VPC endpoints:

  • s3 vpc gateway endpoint
  • api.ecr endpoint
  • dkr.ecr endpoint
  • ecs endpoint
  • ecs-telemetry endpoint
  • ecs-agent endpoint
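
For reference, the endpoints listed above map to PrivateLink service names of the form com.amazonaws.<region>.<name>. The following is a minimal sketch, not part of the original stack, for checking which of them actually exist in a VPC. The helper names, the region, and the VPC ID are hypothetical; the boto3 call is only used in `existing_services` and needs credentials, so the comparison logic is kept separate from it.

```python
# Short names of the endpoints listed in the question.
# "s3" is a gateway endpoint; the others are interface endpoints.
ENDPOINT_SHORT_NAMES = ["s3", "ecr.api", "ecr.dkr", "ecs", "ecs-telemetry", "ecs-agent"]


def service_names(region, short_names=ENDPOINT_SHORT_NAMES):
    """Build the full service name for each endpoint in the given region."""
    return [f"com.amazonaws.{region}.{name}" for name in short_names]


def missing_services(region, existing):
    """Return the expected service names that are absent from `existing`."""
    existing = set(existing)
    return [name for name in service_names(region) if name not in existing]


def existing_services(vpc_id, region):
    """Fetch the service names of the endpoints in a VPC (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_vpc_endpoints(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    return [endpoint["ServiceName"] for endpoint in resp["VpcEndpoints"]]
```

Something like `missing_services("eu-west-1", existing_services("vpc-0123456789abcdef0", "eu-west-1"))` would then list any endpoint from the list above that was not created in that (hypothetical) VPC.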

The route tables for the private subnets contain a route to the S3 VPC gateway endpoint, and the route tables for the public subnets contain a route to the internet gateway (0.0.0.0/0). See the scheme:

[Image: network scheme]

Everything works with NAT gateways (located in the public subnets) and no VPC endpoints, but NOT with the VPC endpoints alone, i.e. once the NAT gateways are removed.

The error I am getting: .. was unable to place a task because no container instance met all of its requirements. No container instances were found in your cluster

The EC2 instances do get created, though, and they have the correct AMI (I also checked with Reachability Analyzer that they can reach all the VPC endpoints). The ECS and ECR VPC endpoints are configured according to the official ECS documentation.

Have you experienced anything similar, or do you know how to debug this?

wtdmn
  • 33
  • 2
  • 1
    As I'm sure you've worked out, there's some additional communication going out via the NAT gateway. I wonder if it needs one of the EC2 endpoints; there are a couple of types, I think. It's probably worth adding them to see what happens. – Tim Sep 04 '22 at 01:08
  • @Tim exactly, even AWS support recommended the ec2 and ec2messages VPC endpoints; that didn't work though. Will have to look into this further. – wtdmn Sep 04 '22 at 07:25
  • You could try to use VPC flow logs to see what traffic is going where through the NAT, but those logs aren't super easy to use. You'd then have to look up the AWS IP ranges, and there would be noise from auto updates. Logically, though, ECS needs to be able to communicate with EC2 somehow. I would try adding any endpoint that could be vaguely related, including Systems Manager: just add them all, and if it works, take them away one by one until it fails. – Tim Sep 04 '22 at 08:22
  • Well, I narrowed the problem down to 'cfn-signal not received', that is, the autoscaling group got created but never received a success signal from the private EC2 instances. I guess I will have to go with enabling all the endpoints and then rule out what's not needed. – wtdmn Sep 04 '22 at 15:56
  • That's weird ( https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-signal.html ). Looks like CloudFormation didn't receive a response from EC2. Is CFN creating the EC2 instances, or are they being created dynamically by ECS? I haven't used ECS in EC2 mode, I've used Fargate mode. – Tim Sep 04 '22 at 18:30
  • 1
    The EC2 instances are created from a launch configuration (getting images from ECR) and they are in an autoscaling group, so it's not a trivial setup. Everything is in CloudFormation, so a CloudFormation VPC endpoint might be necessary ([link](https://aws.amazon.com/blogs/mt/signaling-aws-cloudformation-waitconditions-using-aws-privatelink/)). I will inspect further. – wtdmn Sep 04 '22 at 19:29
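
The flow-log approach suggested in the comments could be sketched roughly as follows. This is a minimal illustration, assuming flow log records in the default version 2 format (space-separated, 14 fields) have already been exported to text lines. The function names are hypothetical, and the idea of filtering on the NAT gateway's private IP as the source address is an assumption about how to isolate outbound traffic, not something confirmed in this thread.

```python
from collections import Counter

# Field order of the default (version 2) VPC flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line):
    """Parse one flow log line into a dict, or return None if it is not a record."""
    parts = line.split()
    if len(parts) != len(FIELDS) or parts[0] == "version":
        return None  # wrong field count or a header line
    return dict(zip(FIELDS, parts))


def summarize_destinations(lines, srcaddr):
    """Total bytes per (dstaddr, dstport) for records sent from `srcaddr`.

    Returns a list of ((dstaddr, dstport), total_bytes), largest first.
    """
    totals = Counter()
    for line in lines:
        record = parse_record(line)
        if record is None or record["log_status"] != "OK":
            continue  # NODATA / SKIPDATA records carry "-" placeholders
        if record["srcaddr"] != srcaddr:
            continue
        totals[(record["dstaddr"], record["dstport"])] += int(record["bytes"])
    return totals.most_common()
```

The destination addresses in the result would still have to be matched against the published AWS IP ranges by hand to work out which service each one belongs to.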

0 Answers