
I currently have one VPC per cluster (stg, prd, tst, misc), and the standard clusters (stg, prd) have these subnets:

  • elb: for public ELB(s) that receive direct public traffic
  • elb-int: for internal ELB(s) that receive service-to-service traffic
  • svc: for application services
  • db: for databases
  • dmz: for NAT gateway(s), proxies, etc.
VPC (stg, prd)
├10.100.0.0/16 az-1
|  ├10.100.0.0/20  elb
|  ├10.100.16.0/20 elb-int
|  ├10.100.32.0/20 svc
|  ├10.100.48.0/20 svc
|  ├10.100.64.0/20 db
|  ├10.100.80.0/20 dmz
|  ├10.100.96.0/20 <reserved>
|  ├ ...
|  └10.100.240.0/20 <reserved>
├10.101.0.0/16 az-2
|  ├10.101.0.0/20  elb
|  ├10.101.16.0/20 elb-int
|  ├10.101.32.0/20 svc
|  ├10.101.48.0/20 svc
|  ├10.101.64.0/20 db
|  ├10.101.80.0/20 dmz
|  ├10.101.96.0/20 <reserved>
|  ├ ...
|  └10.101.240.0/20 <reserved>
└10.102.0.0/16 az-3
   ├10.102.0.0/20  elb
   ├10.102.16.0/20 elb-int
   ├10.102.32.0/20 svc
   ├10.102.48.0/20 svc
   ├10.102.64.0/20 db
   ├10.102.80.0/20 dmz
   ├10.102.96.0/20 <reserved>
   ├ ...
   └10.102.240.0/20 <reserved>
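
I'm writing Terraform scripts, so this layout is generated rather than written by hand. Here's a minimal sketch of the idea, carving each AZ's /16 into /20 layers; the variable names, AZ identifiers, and the aws_vpc.main resource are illustrative, not my actual code:

```
# Illustrative sketch only: carve each AZ's /16 into /20 "layers".
variable "az_cidrs" {
  type = map(string)
  default = {
    "ap-southeast-1a" = "10.100.0.0/16"
    "ap-southeast-1b" = "10.101.0.0/16"
    "ap-southeast-1c" = "10.102.0.0/16"
  }
}

locals {
  # Each layer's position in this list fixes its /20 offset within the /16.
  # The second svc subnet is labelled svc-2 so index() stays unambiguous.
  layers = ["elb", "elb-int", "svc", "svc-2", "db", "dmz"]
}

resource "aws_subnet" "layer" {
  for_each = {
    for pair in setproduct(keys(var.az_cidrs), local.layers) :
    "${pair[0]}-${pair[1]}" => {
      az = pair[0]
      # /16 plus 4 new bits = /20; the layer index selects the block
      cidr = cidrsubnet(var.az_cidrs[pair[0]], 4, index(local.layers, pair[1]))
    }
  }

  vpc_id            = aws_vpc.main.id # assumed to be defined elsewhere
  availability_zone = each.value.az
  cidr_block        = each.value.cidr
  tags              = { Name = each.key }
}
```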

I know this is a broad, "it depends on the situation" kind of question, but I've searched the internet and found no sensible guidelines on this.

So I'm asking this question to find out how sysadmins choose a strategy for their subnets. Please share yours and, if you can, add a short statement explaining why you chose that approach.

rocketspacer
  • We need more context to help. What are you doing with AWS? – Tim Oct 12 '18 at 05:48
  • Unfortunately I think your question is too broad for Server Fault, as you don't have a problem you're trying to solve. I suggest a good starting point is the [AWS Landing Zone](https://aws.amazon.com/answers/aws-landing-zone/), though don't underestimate the complexity or effort of that solution. Are you using subnets to define tiers? It's a slightly old but effective way of doing things; security groups work fine on their own. A /20 subnet is huge; you could use a /27 for most, but I guess if you don't have on-premises integration to worry about, use as much IP space as you need. – Tim Oct 12 '18 at 07:23
  • I googled here and there but found no sensible guideline on how to architect subnets. I've asked some friends and this seems like a matter of preference. Personally, I think this is much more than personal preference, so I brought this up. – rocketspacer Oct 12 '18 at 07:53
  • @Tim once you start using e.g. Fargate or VPC-enabled Lambdas at scale, where each task needs its own IP, you may soon find /27 too small. Just something to consider :) – MLu Oct 12 '18 at 07:57
  • I think your subnet architecture is fine. They don't add much in AWS, and I don't think they help security at all, but if they help with understanding or organisation they don't hurt either. @MLu yes, a /27 might be a touch too small in some cases, particularly at scale; it was just an example. A /24 is probably a good default size. I tend to have variable-sized subnets based on need, from /28 to /23 mostly, but smaller than /24 you really need to be careful. On my current project we have AWS as an extension of the data center, so we're using limited IP space; otherwise it wouldn't matter. – Tim Oct 12 '18 at 08:01
  • I've worked at an enterprise with over 3000 EC2 instances in one account (I wasn't a sysop back then, though). That explains why I wrote /20. I'm writing Terraform scripts so I can reproduce infrastructures in minutes, which benefits my career. The subnet size can be parameterized based on the size of the organization however I see fit, so yeah, there are use cases for /20. – rocketspacer Oct 12 '18 at 08:29
  • The benefit of this is that I can use NACLs to restrict all traffic from the elb-* subnets to the db subnet; packets must flow through the layers (see the sketch after these comments). – rocketspacer Oct 12 '18 at 08:29
  • And this guide suggests using large subnets: https://aws.amazon.com/blogs/startups/practical-vpc-design/ – rocketspacer Oct 12 '18 at 10:56
  • I didn't say you were wrong about subnets :) I agree that if you want to use NACLs in addition to security groups, which not everyone does, then yes, subnets have a benefit. We use NACLs in enterprise deployments as well as SGs. Having 3000 instances in one account is a risk, though: your blast radius is huge if something goes wrong, and you risk hitting AWS API limits, which are per-account, especially if you use encrypted volumes. With 3000 instances they would probably be best in multiple accounts to reduce risk. – Tim Oct 12 '18 at 18:04
  • Hi rocketspacer, if the response below answered your question please upvote and accept it. That's Server Fault's way of saying thank you for the time and effort someone took to help you. Thanks! – MLu Nov 03 '18 at 03:53
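
To make the NACL idea from the comments concrete, here is a hedged Terraform sketch: a network ACL on the db subnet that only admits the svc layer, so traffic from the elb-* subnets can't reach the database directly. The az-1 CIDR comes from the question; the port (5432) and all resource names are assumptions:

```
# Hedged sketch: only the svc layer may reach the db subnet; elb and
# elb-int traffic is caught by the NACL's implicit default deny.
resource "aws_network_acl" "db" {
  vpc_id     = aws_vpc.main.id    # assumed to exist
  subnet_ids = [aws_subnet.db.id] # assumed to exist

  ingress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.100.32.0/20" # svc layer (az-1)
    from_port  = 5432
    to_port    = 5432
  }

  # NACLs are stateless, so replies need an explicit egress rule
  # back to the svc layer on ephemeral ports.
  egress {
    rule_no    = 100
    protocol   = "tcp"
    action     = "allow"
    cidr_block = "10.100.32.0/20"
    from_port  = 1024
    to_port    = 65535
  }
}
```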

1 Answer


I'm afraid ServerFault isn't a place for conducting surveys or soliciting opinion-based answers.

Anyway, your setup seems way over-complicated.

Because security and firewalling in AWS are done predominantly with Security Groups, it doesn't really matter whether you have 6 subnet layers, as you describe in the question, or just 2 per VPC - Public and Private.

  • Resources in the Public subnets have public/elastic IPs and can be accessed from the internet, if SG rules permit

    For example - public ELB/ALB, jump hosts, etc.

  • Resources in the Private subnets can't be accessed from outside and use NAT to talk out

    For example - RDS clusters, ECS clusters, web servers (hidden behind ELB), etc.

  • Optionally you can have Private subnets without internet access - that's sometimes used for databases (RDS), though almost as often they are simply put into the normal Private subnets.

Of course your Public and Private subnet layers should span a few AZs to achieve high availability, but don't go overboard. Use 2 or 3 AZs at most; that's usually enough, even though in some regions you can have a lot more.

Technically, of course, you can't span a single subnet across AZs, but you can have priv-a 172.31.0.0/24 in AZ "a" and priv-b 172.31.1.0/24 in AZ "b", deploy ELBs and ASGs across both, and treat the pair as one logical layer.
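
For example, a minimal Terraform sketch of that pattern; the AZ names are placeholders, and the VPC and launch configuration are assumed to be defined elsewhere:

```
# Two per-AZ subnets treated as one logical "private" layer.
resource "aws_subnet" "priv_a" {
  vpc_id            = aws_vpc.main.id # assumed defined elsewhere
  availability_zone = "us-east-1a"    # "AZ a"
  cidr_block        = "172.31.0.0/24"
  tags              = { Name = "priv-a" }
}

resource "aws_subnet" "priv_b" {
  vpc_id            = aws_vpc.main.id
  availability_zone = "us-east-1b"    # "AZ b"
  cidr_block        = "172.31.1.0/24"
  tags              = { Name = "priv-b" }
}

# An ASG pointed at both subnets spreads instances across the AZs.
resource "aws_autoscaling_group" "app" {
  min_size             = 2
  max_size             = 4
  vpc_zone_identifier  = [aws_subnet.priv_a.id, aws_subnet.priv_b.id]
  launch_configuration = aws_launch_configuration.app.name # assumed
}
```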

Note that all of the above applies per VPC - typically you'll have multiple VPCs, e.g. one per stage (dev, test, ...), and even multiple AWS accounts per project (e.g. dev and prod) for greater separation between production and development/testing workloads.

None of these are hard rules, of course. Some clients require more subnet layers or more AZs per VPC, but those are exceptions.

For the majority of VPCs, Public + Private subnets across 3 AZs are perfectly fine.

And remember - Security Groups are your friends :)
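
For illustration, a minimal sketch of the SG-to-SG pattern that replaces subnet-level layering; the ports and names are assumptions. The app tier accepts traffic only from the load balancer's security group, regardless of which subnet the packets come from:

```
resource "aws_security_group" "elb" {
  vpc_id = aws_vpc.main.id # assumed defined elsewhere

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # public HTTPS in
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"] # forward to backends
  }
}

resource "aws_security_group" "app" {
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.elb.id] # only the ELB tier may connect
  }
}
```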

MLu
  • I slightly disagree. I do AWS architecture for very large corporates, the typical architecture for them is significantly more complex than this. I agree that this isn't the best place for architectural advice, but I'll add some thoughts later. – Tim Oct 12 '18 at 05:48
  • @Tim thanks, I updated the answer to make it clear that this is *per-VPC* structure. Enterprise deployments will of course have multiple VPCs and multiple AWS accounts, but I very seldom see the need for having 3+ subnet layers *per VPC*. Hope that clarifies it :) – MLu Oct 12 '18 at 05:58
  • Building multiple VPCs with similar infrastructure is one approach, probably a fairly good approach. Some enterprises have shared infrastructure in one or more accounts, largely to meet security or compliance requirements. The [AWS landing zone pattern](https://www.youtube.com/watch?v=RSv9H59AsoI) helps with this. It really makes things more complex though, and with complexity comes the cost of people to understand and design things, which can cost more than the infrastructure... – Tim Oct 12 '18 at 07:40
  • Once you start doing multi-account, the [Transit VPC pattern](https://aws.amazon.com/answers/networking/aws-global-transit-network/) becomes relevant, which helps achieve enterprise goals but increases complexity and costs. – Tim Oct 12 '18 at 07:40