According to RFC 1034 (section 3.6.2), a CNAME record cannot be used at a zone apex, because a CNAME cannot coexist with other records at the same name and the apex always carries records such as SOA and NS.
This is frequently a source of frustration for those designing SaaS applications that must support custom domains while remaining highly available and scalable.
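To make the apex restriction concrete, here is a minimal sketch that flags any name holding a CNAME together with another record type. The `find_cname_conflicts` helper and the `(name, type)` tuple format are my own simplification, not any real DNS library's API, and it ignores the DNSSEC record types that are permitted next to a CNAME.

```python
from collections import defaultdict


def find_cname_conflicts(records):
    """Return names that hold a CNAME together with any other record type.

    `records` is an iterable of (name, rtype) tuples, a simplified
    stand-in for a parsed zone (hypothetical format). DNSSEC types such
    as RRSIG and NSEC, which may sit next to a CNAME, are not handled.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        # Normalise case and the trailing dot so equal names compare equal.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("example.com.", "CNAME"),      # conflict: the apex already holds SOA and NS
    ("www.example.com.", "CNAME"),  # fine: nothing else lives at www
]
print(find_cname_conflicts(zone))  # ['example.com']
```

Only the apex is reported, which is why the www. name can be a CNAME while the naked domain cannot.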
The options are basically:
- Live with a fixed set of IP addresses
- Insist users use your DNS servers so you can update the records
- Insist everyone uses www. with a CNAME and let users figure out how to redirect their naked domain
After surveying a number of large SaaS products, it seems most support the first option (a fixed set of IP addresses), even though it looks like the most difficult one to live with when anticipating rapid growth and aiming for high availability.
For example:
- GitHub: Two IPs for A records
- Squarespace: Four IPs for A records, although I think until recently this was only one
- BigCommerce: Recommends using their DNS, but each store does have a specified IP that is subject to change
- Shopify: A single IP for A records
- WordPress.com: Must use their DNS
Generally, having a fixed set of IPs introduces the following challenges:
- You can't easily remove an IP address when one of your load balancers fails
- You can't add IPs as demand grows and you need to distribute your traffic among more load balancers
- You can't fail over to geographically isolated infrastructure with an entirely different set of IPs
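As a rough illustration of the first challenge, the sketch below estimates what share of customers is affected while a published IP is down. It assumes each client ends up on one of the published A records uniformly at random and never retries another address, which real resolvers and browsers may not honour. The `unreachable_share` function and the 192.0.2.x documentation addresses are placeholders of mine.

```python
def unreachable_share(published_ips, failed_ips):
    """Fraction of the published A records that point at a failed address.

    Assumes clients are spread evenly over the published IPs and do not
    retry a second address (a deliberate simplification).
    """
    published = set(published_ips)
    if not published:
        raise ValueError("at least one published IP is required")
    return len(published & set(failed_ips)) / len(published)


# Two published IPs, one load balancer down: half the clients are affected.
print(unreachable_share(["192.0.2.1", "192.0.2.2"], ["192.0.2.1"]))  # 0.5

# A single published IP: losing it affects everyone.
print(unreachable_share(["192.0.2.1"], ["192.0.2.1"]))  # 1.0
```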
My question is: how, specifically, do providers who offer a fixed set of A records overcome these limitations, especially those who offer only a single A record?
Do they:
- Have one load balancer per IP they provide, hope they never exceed the capacity of the largest servers they can run in this role, and in the event of a failover just move the IP to a different server?
- Use hardware load balancers, which are presumably more reliable and have enough capacity that they're unlikely to ever be a bottleneck?
- Use something like IP anycast where an arbitrary number of load balancers can receive packets for a certain IP?
- A mix of the above?
I'm guessing those running on public clouds (e.g. AWS, Azure) would mostly use the first approach, and those who have their own infrastructure would mostly use the second?
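To clarify what I mean by "just move the IP to a different server" in the first approach, here is a minimal sketch of the decision logic. The `LoadBalancer` class and `choose_holder` function are hypothetical names of mine; in practice I assume the actual reassignment would be done by something like VRRP or a cloud provider's reassignable address, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    healthy: bool = True


def choose_holder(current, pool):
    """Pick which load balancer should hold the single public IP.

    Keeps the current holder while it is healthy (to avoid flapping),
    otherwise falls back to the first healthy member of the pool.
    Returns None when nothing in the pool is healthy.
    """
    if current is not None and current.healthy:
        return current
    for lb in pool:
        if lb.healthy:
            return lb
    return None


pool = [LoadBalancer("lb-a"), LoadBalancer("lb-b")]

holder = choose_holder(None, pool)
print(holder.name)  # lb-a

pool[0].healthy = False  # simulate lb-a failing its health check
holder = choose_holder(holder, pool)
print(holder.name)  # lb-b
```

Note that this only addresses failover: the IP still lives on exactly one machine at a time, so the capacity ceiling from the first approach remains.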