How is 8.8.8.8 kept *always* alive?

I know how you can manage datacenter redundancy if there's a working DNS server that can point to any working site of your company: there's VRRP, multi-WAN, etc. But how are the DNS servers themselves kept online? DNS is the first thing hit when someone connects to a service, and it can't really be provisioned the same way. Take 8.8.8.8 or 8.8.4.4, for example. I can't recall them being down. Ever. How do ISPs manage to keep such IPs always online?

I know it's probably a really broad question, but I'd like to hear just the names of protocols / techniques that can be used for that. I can read the details on my own.

Lapsio

Posted 2017-05-20T16:18:08.040

Reputation: 640

Read up on Anycast. In short: there are multiple hosts with the same IP address. That's how CloudFlare, Google, YouTube and other big networks work. – GiantTree – 2017-05-20T16:26:23.670

google.com and cloudflare have multiple IPs. Different IPs are returned in the DNS query depending on location etc. But 8.8.8.8 is actually a single IP. And it can't use "multiple A records" or other DNS-based redundancy because it's DNS itself. Can you have multiple sites / hosts under a single IP? Do they use something like multi-ISP BGP? – Lapsio – 2017-05-20T16:42:00.310

It’s Anycast, like GiantTree wrote. Anycast does not involve DNS. – Daniel B – 2017-05-20T16:44:21.917

IPv4 doesn't support anycast natively. According to Wikipedia, it seems to be implemented using BGP, if I understand it correctly. https://en.wikipedia.org/wiki/Anycast – Lapsio – 2017-05-20T16:48:04.570

For datagram services, special support for anycast isn't needed – it just happens as a result of each router doing its own shortest-path route computations. BGP does not "support" anycast natively either (it sees those as unicast routes), and yet it is a common way to do it Internet-wide. – user1686 – 2017-05-20T18:04:52.300

Answers

First of all, VRRP does not depend on DNS in any way. For redundancy within a single site you can run DNS servers on a shared VRRP address just fine.

But as others have mentioned in comments, the services also use anycast routing, which essentially means that the same IP address exists in multiple places around the world. When a whole site goes down, routes world-wide are recalculated so that your packets end up going to another working site.

A better example than Google's public DNS would be the root DNS servers – the ones which serve the . zone and hold pointers to com, org, eu, and so on – because they have a map of every instance of the 13 logical addresses. ICANN's "L" is served by 160 different sites!

Note that anycast has nothing to do with DNS-based round-robins (where the same name has multiple addresses). Anycast is done essentially by lying to the routing protocol.

The Internet uses BGP to exchange routing information between organizations.

BGP inherently supports selecting the best out of several routes towards the same network, based on various criteria. For example, the same customer might have redundant uplinks to the same ISP (announcing two routes differing only in weight/preference). Or the customer might have uplinks through several ISPs, and everyone will select their preferred route (mainly shortest AS-path) – that's the gist of "true" multi-WAN.
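As a rough illustration of that selection, here is a minimal Python sketch. It models only the two criteria mentioned above (local preference, then AS-path length); real BGP has several more tie-breakers. The `Route` class, the `via` labels, and the documentation prefix and AS numbers are hypothetical, chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                # announced network, e.g. "192.0.2.0/24"
    via: str                   # hypothetical label for the neighbor we learned it from
    as_path: Tuple[int, ...]   # AS numbers the announcement has passed through
    local_pref: int = 100      # locally assigned preference, higher wins


def best_route(routes: Iterable[Route]) -> Optional[Route]:
    """Pick one route: highest local_pref first, then shortest AS path.

    This is a simplification; the full BGP decision process has more steps.
    """
    candidates = list(routes)
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path)))


if __name__ == "__main__":
    routes = [
        Route("192.0.2.0/24", "isp-a", (64496, 64497, 65535)),
        Route("192.0.2.0/24", "isp-b", (64498, 65535)),
    ]
    # isp-b wins here because its AS path is shorter.
    print(best_route(routes).via)
```

Note that nothing in the selection cares whether both routes lead to the same machine. That is the property anycast relies on.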

Multihoming

                  ┌────────[AS 65535]────────┐
client 1 ---ISP---│--BGProuter--+            │
             ¦    │             ¦--DNSserver │
client 2 ---ISP---│--BGProuter--+            │
                  └──────────────────────────┘

However, BGP only leads the traffic to your entrance doors but does not care what happens beyond that. So if you internally set up both routes towards the same server, you get multihoming. But if each "entrance" leads to a different server (configured for the same IP), you get anycast.

Anycast... kind of?

                  ┌────────[AS 65535]────────┐
client 1 ---ISP---│--BGProuter-----DNSserver │
             ¦    │                          │
client 2 ---ISP---│--BGProuter-----DNSserver │
                  └──────────────────────────┘

Importantly, this also means that BGP doesn't care if the AS isn't contiguous at all. To get world-wide redundancy, just announce the same network from multiple physical locations – if you connect those locations together (so that they route that network to one place), you get multihoming; if they're islands, you get anycast.

Anycast

                  ┌────────[AS 65535]────────┐
client 1 ---ISP---│--BGProuter-----DNSserver │
             ¦    └──────────────────────────┘
             ¦
             ¦    ┌────────[AS 65535]────────┐
client 2 ---ISP---│--BGProuter-----DNSserver │
                  └──────────────────────────┘

(For that matter, it doesn't even need to be the same AS – e.g. 6to4 relays are run by multiple independent organizations, each of them announcing their own route towards 192.88.99.0/24.)
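To make the failover behaviour concrete, here is a toy Python simulation. It assumes each client simply reaches the closest site that is still announcing the prefix; the site names and distance numbers are invented, and real route convergence is far messier than a dictionary lookup.

```python
from typing import Dict, Optional


class AnycastSim:
    """Toy model: several sites announce the same prefix, clients go to the nearest."""

    def __init__(self, distances: Dict[str, Dict[str, int]]):
        # distances[client][site] = hypothetical path length from client to site
        self.distances = distances
        self.announcing = set()

    def announce(self, site: str) -> None:
        self.announcing.add(site)

    def withdraw(self, site: str) -> None:
        self.announcing.discard(site)

    def site_for(self, client: str) -> Optional[str]:
        """Return the site this client's packets end up at, or None if unreachable."""
        reachable = {
            site: dist
            for site, dist in self.distances[client].items()
            if site in self.announcing
        }
        if not reachable:
            return None
        # Shortest distance wins; the site name breaks ties deterministically.
        return min(reachable, key=lambda s: (reachable[s], s))


if __name__ == "__main__":
    sim = AnycastSim({
        "client1": {"site-east": 2, "site-west": 5},
        "client2": {"site-east": 6, "site-west": 1},
    })
    sim.announce("site-east")
    sim.announce("site-west")
    print(sim.site_for("client1"), sim.site_for("client2"))
    sim.withdraw("site-west")  # whole site goes down
    print(sim.site_for("client1"), sim.site_for("client2"))
```

Both clients keep using the same destination IP throughout; only the site behind it changes.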

Caveats:

Anycast provides redundancy, but not load-balancing. Once BGP converges, each router will have chosen a single preferred route (or occasionally a few) and will continue using it until the network changes.
However, you cannot predict how long the routes will remain stable, so anycasting stateful services can be tricky. DNS gets away with it due to being stateless and using mainly UDP (EDNS reduced the need for TCP connections).
There must be coordination between the actual service and BGP router, so that the route is withdrawn if the service crashes.
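The last caveat, coordination between the service and the BGP router, might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the probe sends a bare DNS query for the root NS records and treats any well-formed reply as "alive", and the emitted command strings follow the style of ExaBGP's text API, so treat their exact syntax as an assumption to check against whatever BGP daemon you actually run. `RouteController`, the prefix, and the threshold are hypothetical.

```python
import random
import socket
import struct
import time
from typing import Optional


def dns_alive(server: str, port: int = 53, timeout: float = 1.0) -> bool:
    """Send a minimal DNS query (root zone, type NS) and check that a reply comes back."""
    qid = random.randrange(0x10000)
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # RD flag set, one question
    question = b"\x00" + struct.pack("!HH", 2, 1)             # root name, NS, class IN
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(header + question, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    if len(reply) < 12:
        return False
    rid, flags = struct.unpack("!HH", reply[:4])
    return rid == qid and bool(flags & 0x8000)  # same ID and the response bit is set


class RouteController:
    """Decide when to announce or withdraw the anycast prefix (hypothetical helper)."""

    def __init__(self, prefix: str, fail_threshold: int = 3):
        self.prefix = prefix
        self.fail_threshold = fail_threshold
        self.failures = 0
        self.announced = False

    def update(self, healthy: bool) -> Optional[str]:
        """Return a command for the BGP daemon, or None if nothing changes."""
        if healthy:
            self.failures = 0
            if not self.announced:
                self.announced = True
                return f"announce route {self.prefix} next-hop self"
            return None
        self.failures += 1
        # Withdraw only after several consecutive failures to avoid route flapping.
        if self.announced and self.failures >= self.fail_threshold:
            self.announced = False
            return f"withdraw route {self.prefix} next-hop self"
        return None


if __name__ == "__main__":
    controller = RouteController("192.0.2.1/32")
    while True:
        command = controller.update(dns_alive("127.0.0.1"))
        if command:
            print(command, flush=True)  # assumed to be read by the BGP daemon
        time.sleep(5)
```

The failure threshold is a design choice: withdrawing on the first missed probe would make the route flap on every transient hiccup, which is worse for everyone downstream than a few seconds of extra detection time.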

See also "History of 4.2.2.2. What's the story?" on NANOG mailing list: post 1, post 2.

user1686

Posted 2017-05-20T16:18:08.040

Reputation: 283 655

"How to have your answer accepted in less than 60 seconds with this one weird trick" – user1686 – 2017-05-20T19:01:09.743

What are the "islands" you refer to in the second-to-last paragraph? Just sites that aren't connected? – Lapsio – 2017-05-20T19:03:12.270

Yes – parts of your network that aren't interconnected with each other or the rest. (Although that's just an example. It's possible to implement internal anycast inside one big interconnected network, too – again by tricking routing protocols.) – user1686 – 2017-05-20T19:14:01.783

One way to achieve that is using server-side load balancers. When you connect to the gateway at the IP 8.8.8.8, it'll distribute the request to one free server inside the system. As a result, when one server dies it doesn't bring down the whole system.

For Internet services, a server-side load balancer is usually a software program that listens on the port where external clients connect to access services. The load balancer forwards requests to one of the "backend" servers, which usually replies to the load balancer. This allows the load balancer to reply to the client without the client ever knowing about the internal separation of functions. It also prevents clients from contacting back-end servers directly, which may have security benefits by hiding the structure of the internal network and preventing attacks on the kernel's network stack or unrelated services running on other ports.

Some load balancers provide a mechanism for doing something special in the event that all backend servers are unavailable. This might include forwarding to a backup load balancer, or displaying a message regarding the outage.

It is also important that the load balancer itself does not become a single point of failure. Usually load balancers are implemented in high-availability pairs which may also replicate session persistence data if required by the specific application.
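As a minimal sketch of the forwarding decision described above, assuming plain round-robin with health marking (the class and backend names are made up, and this says nothing about how 8.8.8.8 is actually built internally):

```python
from typing import Iterable, Optional


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that are marked down."""

    def __init__(self, backends: Iterable[str]):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self) -> Optional[str]:
        """Return the next healthy backend, or None if every backend is down."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        return None


if __name__ == "__main__":
    lb = RoundRobinBalancer(["dns-1", "dns-2", "dns-3"])
    lb.mark_down("dns-2")  # one server dies
    print([lb.pick() for _ in range(4)])
```

The `None` case is where the "something special" from the paragraph above would go, such as handing off to a backup load balancer.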

phuclv

Posted 2017-05-20T16:18:08.040

Reputation: 14 930

Yeah, but load balancers avoid being a single point of failure only if they use some other high-availability technique such as VRRP, routing protocols, etc. But then again, VRRP and IGPs are rather LAN solutions. So let's say the ISP's border WAN connection to the datacenter fails. The company of course has multi-WAN, so as long as the site gateway can switch to a different WAN link it's okay, but keeping the same IP remains a problem. When DNS is available it's okay: multiple A or AAAA records and done. But when it's the DNS server itself, the only solution is anycast / BGP between multiple ISPs. – Lapsio – 2017-05-20T18:03:11.537

I was rather referring to WAN high-availability solutions beyond the gateway, for when a whole company site is unreachable from the world due to an ISP disaster. 8.8.8.8 can't assume the ISP will work. You can't rely on a single company when literally the whole world relies on your service – Lapsio – 2017-05-20T18:06:36.443