
In my company's AWS cloud we have 4 VPCs, one for each of our major API environments (dev, test, stage, prod). In order to make these environments as similar as possible to each other they all have their CIDR block set to 10.0.0.0/16.

Now a need has arisen for us to create an internal service shared between these environments. For the sake of argument, let's say that this new service stores log data from all of these environments. This service exists in its own VPC with a CIDR block of 10.1.1.0/24.

At first I thought I'd be able to simply add peering connections from each environment VPC to the logging VPC. I ran into a hurdle when I started setting up the route tables, though. I added a route in Dev's route table sending traffic destined for 10.1.1.0/24 over the Dev-to-Logging peering connection, but I still couldn't connect to my logging server from within Dev. It turns out I also need a return route in Logging's route table, sending traffic destined for 10.0.0.0/16 back over that same peering connection. That lets me connect to the logging server from a Dev server, but since all four environment VPCs use 10.0.0.0/16, I can't add equivalent return routes for any of the other environments.
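The conflict above can be illustrated with a small sketch using Python's `ipaddress` module (the VPC names are taken from the question; the route-matching logic is a simplification of how a route table selects a destination prefix):

```python
from ipaddress import ip_network

# Every environment VPC uses the same block, per the setup above.
env_vpcs = {"dev": "10.0.0.0/16", "test": "10.0.0.0/16",
            "stage": "10.0.0.0/16", "prod": "10.0.0.0/16"}

# A route table entry is keyed by destination prefix only, so a single
# return route for 10.0.0.0/16 in the logging VPC would "match" the
# address space of every environment at once.
return_route = ip_network("10.0.0.0/16")
matching = [name for name, cidr in env_vpcs.items()
            if ip_network(cidr).overlaps(return_route)]
print(matching)  # all four environments claim the same destination
```

Since the route can only point at one peering connection, only one environment at a time can get working return traffic.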

The logging server never has to initiate a connection to my API servers; it only needs to receive and respond to connections. So my next thought was that I could put a NAT Gateway in each of the environment VPCs and route through those to the logging VPC. Unfortunately, it seems that NAT Gateways are connected directly to the internet, and I don't want my logging VPC to be internet-facing.

I feel like there must be a way to make this work, but I can't think of one. At the moment I feel like my only option is to create 4 logging VPCs and run separate logging servers in each of them, but from a cost perspective this doesn't really appeal to me.

Joshua Walsh
    Everything I've read doing the AWS qualifications says you can't peer VPCs with the same CIDR blocks. You may find some tricky solution with proxying or similar, but you're probably best off creating new VPCs without overlapping ranges and moving your instances and resources into them. – Tim Jul 19 '17 at 03:56

1 Answer


First, I must mention: duplicating CIDR blocks across your VPCs was a grave error. Even if there is next to zero chance you'll need to route traffic between them, the RFC 1918 address space is more than large enough to give each VPC a unique block. I consult with a number of companies on AWS topics, and I maintain a "master subnet list" spreadsheet to record the subnets in use as I allocate VPCs for customers, specifically to ensure I never create overlapping ranges.

The obvious answer to your question is to re-number your overlapping VPCs. This is going to be painful, but it's the right answer to this problem, and will solve this issue for you once and for all.
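As a sketch, a renumbering plan just needs to assign each environment a distinct block and verify nothing overlaps. The environment CIDRs below are illustrative assumptions (only the logging block is taken from the question):

```python
from ipaddress import ip_network

# Hypothetical renumbering plan: one unique /16 per environment,
# keeping the existing logging block. These blocks are assumptions,
# not a prescription.
plan = {
    "dev":     ip_network("10.10.0.0/16"),
    "test":    ip_network("10.20.0.0/16"),
    "stage":   ip_network("10.30.0.0/16"),
    "prod":    ip_network("10.40.0.0/16"),
    "logging": ip_network("10.1.1.0/24"),   # unchanged, from the question
}

# Confirm no two blocks overlap -- the property that lets every VPC
# peer with logging using simple, unambiguous routes.
names = sorted(plan)
conflicts = [(a, b) for a in names for b in names
             if a < b and plan[a].overlaps(plan[b])]
print(conflicts)  # an empty list means the plan is routable
```

With a plan like this, each environment gets its own return route in the logging VPC's route table and the ambiguity disappears.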

If that's not an option, I can think of a couple of other options:

  1. Utilize SQS for your logs - send logs from your app VPCs to an SQS queue, annotated with a source ID for each of your app VPCs, and then consume the logs out of SQS from your logging VPC. In addition to solving your stated problem, this puts a very highly-available buffer between your logs producers and log consumers. This protects you from losing logs if your infrastructure hiccups, or if you need to take it down for maintenance.
  2. Expose your logging endpoint via a public IP (ELB, EIP, etc.), firewall it so that only your app servers' public IPs can hit it and have them send their logs this way. The traffic will remain on AWS's network, and as long as it's encrypted and authenticated, it's not much of a security issue. You'll pay more for bandwidth, though.
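Option 1 can be modeled locally to show the shape of the pattern. The real implementation would use boto3 against an actual SQS queue; here a `queue.Queue` stands in for SQS, and the field names and source IDs are illustrative assumptions:

```python
import json
import queue

# Stand-in for the SQS queue between app VPCs and the logging VPC.
log_queue = queue.Queue()

def send_log(source_id: str, message: str) -> None:
    """Producer side, running in an app VPC: annotate each log
    entry with its source environment before sending."""
    log_queue.put(json.dumps({"source": source_id, "message": message}))

def consume_logs() -> dict:
    """Consumer side, running in the logging VPC: drain the queue
    and demultiplex entries by their source ID."""
    by_source: dict = {}
    while not log_queue.empty():
        entry = json.loads(log_queue.get())
        by_source.setdefault(entry["source"], []).append(entry["message"])
    return by_source

send_log("dev", "deploy started")
send_log("prod", "request latency spike")
send_log("dev", "deploy finished")
print(consume_logs())
```

Because the queue sits between producers and consumer, neither side ever needs a route to the other's VPC, which is exactly what sidesteps the overlapping-CIDR problem.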
EEAA
    Thanks, I suspected as much. Although I feel compelled to say that it was my predecessor who configured the VPCs, so it was not me personally who committed this sin. – Joshua Walsh Jul 19 '17 at 04:40