I am trying to set up Traefik on a production site, and I'm struggling with some high-availability issues. I think we still need a reverse proxy in front of the Traefik cluster. Here are the potential setups that I've considered, and why the reverse proxy seems to be needed:
1. Set up DNS A records to point to each of the Traefik nodes for load balancing and failover.
   - This practice is discouraged according to multiple sites, including this SO question and this SF question.
   - Even using a service like DNSMadeEasy seems to be discouraged due to DNS caching and TTL issues.
2. Point one DNS record to one of the nodes running Traefik.
   - That node becomes a SPOF. My nodes are running on CoreOS, which reboots after every update, so we would be guaranteed to have a few minutes of downtime each week.
   - We could move the DNS record to an alternate node whenever downtime is expected. This would be a pain to manage manually. I can envision a solution paired with locksmithd that handles this automatically, but I don't really want to build it, and it wouldn't handle unexpected downtime.
   - Part of the rationale for using Docker Swarm (or Kubernetes) is to make nodes interchangeable.
3. Put a load balancer/reverse proxy in front of the Traefik cluster. The reverse proxy can provide failover between all the Traefik nodes, and DNS would point to the reverse proxy.
   - Yes, this is a SPOF, but in my experience, it is pretty easy to get good uptime with this setup. If occasional maintenance is required, the DNS record can be pointed to a new proxy.
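
To make the failover in the last option concrete, here is a rough sketch of the behavior I'd expect from the front proxy: check each Traefik node and send traffic to the first one that answers. This is only an illustration in Python, not how any particular proxy implements it. The names `tcp_alive` and `pick_backend`, the port, and the node addresses are all made up for the example.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_alive(host: str, port: int = 443, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def pick_backend(
    nodes: Iterable[str],
    is_alive: Callable[[str], bool] = tcp_alive,
) -> Optional[str]:
    """Return the first node that passes the health check, or None."""
    for node in nodes:
        if is_alive(node):
            return node
    return None


if __name__ == "__main__":
    # Hypothetical Traefik node addresses (documentation range, not real hosts).
    traefik_nodes = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
    print(pick_backend(traefik_nodes))
```

The health check is passed in as a parameter so the selection logic can be exercised without a network, for example `pick_backend(["a", "b"], is_alive=lambda n: n == "b")`. A real proxy would run checks continuously and in the background rather than per request.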
Am I missing something, or overthinking this?