
We have an instance in a private subnet whose outbound traffic goes through a managed NAT gateway. From that instance, we can access the internet:

$ curl https://www.google.com/
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head>...

However, we cannot reach the CloudWatch endpoint; for example, the following times out (EDIT: My mistake: this is not the CloudWatch endpoint but the S3 site hosting the CloudWatch monitoring scripts):

$ curl https://cloudwatch.s3.amazonaws.com

DNS is not the problem:

$ dig cloudwatch.s3.amazonaws.com
cloudwatch.s3.amazonaws.com. 2303 IN    CNAME   s3-1-w.amazonaws.com.
s3-1-w.amazonaws.com.   1   IN  A   54.231.72.59
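The `dig` output only shows that name resolution works. To separate DNS from reachability, a minimal Python sketch like the one below resolves the name and then attempts a bare TCP handshake with a timeout. The `probe` function and its return labels are hypothetical, chosen for illustration; only standard-library `socket` calls are used.

```python
import socket


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Check DNS resolution and TCP reachability of host:port separately.

    Returns one of "dns_failure", "connect_timeout", "connect_error", "connected".
    """
    # Step 1: name resolution only, roughly what `dig` verifies.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: attempt a TCP handshake to each resolved address, which is the
    # first thing curl needs before any TLS or HTTP happens.
    saw_timeout = False
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "connected"
        except socket.timeout:
            # The handshake did not complete in time: packets are likely being
            # dropped somewhere on the path (routing, NAT, filtering).
            saw_timeout = True
        except OSError:
            # For example: connection refused or network unreachable.
            pass
    return "connect_timeout" if saw_timeout else "connect_error"
```

Running `probe("cloudwatch.s3.amazonaws.com")` from the private instance and getting `"connect_timeout"` would match the symptom described here: the name resolves, but the TCP connection never completes, which points at the network path rather than DNS.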

Any ideas about what might be happening?

JustinHK
  • First, check that your security groups and network ACLs allow outbound traffic on the appropriate ports. What happens if you try the same request from your public subnet? Spin up a spot instance to test it if you don't have one you can use. – Tim Sep 09 '16 at 21:41
  • We have one network ACL (allowing all traffic), and the security group allows all outbound traffic. An instance with the same security group in the public subnet works as expected. – JustinHK Sep 09 '16 at 21:49
  • That URL looks like it points at an S3 bucket called "cloudwatch", but that should work too. The CloudWatch endpoints are listed on this page; what happens if you try them? http://docs.aws.amazon.com/general/latest/gr/rande.html#cw_region – Tim Sep 09 '16 at 22:00
  • What does the routing table associated with your private subnet look like? – Canuteson Sep 10 '16 at 05:41

3 Answers


I had the same issue and resolved it the same way JustinHK did below. Because I couldn't let it go, I asked AWS why it happened, and their explanation should help clarify the behavior. Here's the breakdown:

  • The problem is not that traffic can't reach the destination; it's that the return traffic can't get back correctly to the origin.
  • The public subnet (where the NAT gateway sits) has two ways to reach the destination: the VPCE (VPC endpoint) or the IGW (internet gateway). On the return trip it doesn't know which one to pick, so the request simply times out.
  • Routing picks the most specific matching route (longest prefix match), and the endpoint's S3 prefix list is more specific than 0.0.0.0/0, so associating the VPCE with the private subnet's route table made the VPCE route the preferred path for S3 traffic. Note that the request then no longer passes through the public subnet at all, because the private subnet has its own route to the VPCE.
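VPC route tables select the most specific matching route (longest prefix match). A small Python sketch of that selection is below. The route tables, the target labels (`igw`, `nat`, `vpce`) and the `54.231.0.0/17` CIDR standing in for the S3 prefix list are all hypothetical (a real gateway-endpoint prefix list contains several CIDRs); `54.231.72.59` is the address `dig` returned in the question. The sketch models only how each subnet's route table picks a target, not the NAT gateway's return-path behavior.

```python
import ipaddress

# Hypothetical stand-in for the S3 prefix list that a gateway endpoint adds.
S3_PREFIX = "54.231.0.0/17"

# Hypothetical route tables: destination CIDR -> target label.
PUBLIC_RT = {"0.0.0.0/0": "igw", S3_PREFIX: "vpce"}
PRIVATE_RT_BEFORE = {"0.0.0.0/0": "nat"}                    # NAT default route only
PRIVATE_RT_AFTER = {"0.0.0.0/0": "nat", S3_PREFIX: "vpce"}  # endpoint associated too


def lookup(route_table: dict, dest: str) -> str:
    """Return the target of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        raise LookupError(f"no route to {dest}")
    # Longest prefix wins, so a /17 service route beats the /0 default route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


s3_ip = "54.231.72.59"  # address returned by `dig` in the question
print(lookup(PUBLIC_RT, s3_ip))          # vpce (endpoint route beats the IGW default)
print(lookup(PRIVATE_RT_BEFORE, s3_ip))  # nat  (S3 traffic is sent to the NAT gateway)
print(lookup(PRIVATE_RT_AFTER, s3_ip))   # vpce (S3 traffic skips the NAT gateway)
print(lookup(PUBLIC_RT, "142.250.0.1"))  # igw  (an arbitrary non-S3 address)
```

The second and third lookups correspond to the "before" and "after" states in this thread: once the private route table has its own, more specific S3 route, S3-bound packets go straight to the endpoint instead of through the NAT gateway.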

Depending on your setup, and whether you need the IGW for anything other than reaching S3, you could either remove the IGW route from the public subnet or remove the private subnet's route to the NAT gateway. Either option should simplify the routing tables without breaking the solution.


Adding an S3 endpoint in the private subnet resolved the issue.

It turns out that our problem was specific to accessing S3. Our setup at the time was:

  • NAT gateway running in the public subnet
  • S3 endpoint associated with the public subnet (with higher routing priority than the internet gateway)
  • A default route sending traffic from the private subnets through the NAT gateway

It appears that traffic from the private subnets was not reaching S3 through the NAT gateway, neither over the public internet nor through the S3 endpoint. I still do not know why.

JustinHK

First, the obvious: cloudwatch.s3.amazonaws.com is not one of the CloudWatch endpoints.

The CloudWatch endpoints are in the form monitoring.[aws-region].amazonaws.com.

For example, in the us-west-2 region, the endpoint is https://monitoring.us-west-2.amazonaws.com.

http://docs.aws.amazon.com/general/latest/gr/rande.html#cw_region

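Also, DNS resolution is immune to many routing, NAT, and networking misconfigurations because of the way it is implemented in a VPC: the Amazon-provided DNS resolver lives inside the VPC (at the base of the VPC's IPv4 CIDR plus two), so the instance's lookups never have to traverse the NAT gateway or internet gateway. The fact that DNS works therefore does not, in general, tell you whether you have Internet connectivity.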

Michael - sqlbot