
My DNS registrar and DNS provider recently had a long outage, rendering all my domains unusable (email, own+client websites etc).

They have 3 DNS servers, which are all in the same co-hosting facility!

I know just enough about networking to make my spidey-sense supertingle, but not enough to condemn this. Is that not an atrocious design?

Should they not have been spread across lines, networks - even continents?


(Source: https://help.hover.com/hc/en-us/community/posts/115007805527-After-recent-outage-what-are-you-going-to-do-to-fix-your-network-design-problems-)

Kjensen
  • In an ideal world: Yes (of course, ideally you want *everything* to be super-redundant and failsafe...). And then they have a problem affecting every location at the same time because of a software error... It also depends on what this provider's primary business is. If they are just selling domains, this would be much more relevant than when they are primarily a hoster. – Sven Oct 01 '17 at 09:59
  • I think one could simply point to established best practices (such as good old [BCP16 aka RFC2182](https://tools.ietf.org/html/rfc2182#section-3.1))? Assuming that the claims in the question are factually correct, I don't think it's opinion-based whether a domain name registrar (this being their primary business in this case) which also provides DNS hosting services at large scale ought to run these services in a fashion that at least lives up to age-old best practices. – Håkan Lindqvist Oct 01 '17 at 12:54
  • (Of course, the overall argument applies whether the claims are correct or not, it's just whether this provider failed in this regard and hence whether this is all relevant to the particular situation that will vary) – Håkan Lindqvist Oct 01 '17 at 13:04
  • @Sven Would you consider the above argument reasonable? – Håkan Lindqvist Oct 01 '17 at 13:04
  • @HåkanLindqvist: Yes :) – Sven Oct 01 '17 at 15:01
  • "Yes". See the linked RFC above as well. But besides this: What's the question here? – gxx Oct 01 '17 at 15:09

2 Answers


Do not put too much weight on geo-IP lookups of the listed addresses: the fact that a service like Hover (probably a bad example) or Cloudflare (a perfect example) publishes only a small list of IP addresses tells you nothing about its actual scale.

8.8.8.8, for example, is advertised in BGP via anycast from many points of presence (PoPs). To you it is a single IP and therefore looks like a single point of failure, but that single address does not tell the whole story.

Looking into these IPs specifically using lg.he.net (Hurricane Electric's looking glass), Hover doesn't do this.

To answer the question: yes, they should have; no, they didn't; but having only 3 nameservers listed isn't necessarily the problem.
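
As a starting point before reaching for a looking glass, you can list a zone's delegated nameservers and the addresses behind them. A minimal sketch, assuming Python with the third-party dnspython package installed; `example.com` is just a placeholder for your own zone:

```python
# Minimal sketch: list a zone's NS records and the A/AAAA addresses behind them.
# Assumes the third-party "dnspython" package (pip install dnspython).
import dns.resolver

zone = "example.com"  # placeholder: put your own zone here

for ns in dns.resolver.resolve(zone, "NS"):
    ns_name = str(ns.target)
    for rrtype in ("A", "AAAA"):
        try:
            for addr in dns.resolver.resolve(ns_name, rrtype):
                print(f"{ns_name} {rrtype} {addr}")
        except dns.resolver.NoAnswer:
            pass  # this nameserver has no address record of this type
```

A handful of addresses in the output is not conclusive either way; whether each of them is anycast to many PoPs is exactly what the looking glass tells you.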

Also, Google has 4 nameservers, each in its own /24 advertised via anycast and wrapped in a /23 unicast for network failback.
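
The prefix relationship itself is easy to illustrate. A minimal sketch using Python's standard ipaddress module; the prefixes are documentation ranges, not Google's actual allocations:

```python
# Minimal sketch of a more-specific anycast /24 sitting inside a covering
# unicast /23 (illustrative documentation prefixes, not real allocations).
import ipaddress

anycast_24 = ipaddress.ip_network("198.51.100.0/24")   # more-specific, anycast
covering_23 = ipaddress.ip_network("198.51.100.0/23")  # less-specific, unicast fallback
ns_ip = ipaddress.ip_address("198.51.100.10")          # hypothetical nameserver address

print(anycast_24.subnet_of(covering_23))  # True: the /24 is contained in the /23

# Longest-prefix match prefers the /24 while it is advertised; if the /24 is
# withdrawn, traffic still follows the covering /23.
candidates = [n for n in (anycast_24, covering_23) if ns_ip in n]
print(max(candidates, key=lambda n: n.prefixlen))  # 198.51.100.0/24
```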

Here's an example of Google's first nameserver, ns1.google.com: [looking glass output for ns1.google.com]

Now let's look at ns1.hover.com: [looking glass output for ns1.hover.com]

Ouch, not great: Hover may have (2) routes to one network, while Google likely has multiple routes to multiple PoPs with the same advertised IP.

I'd suggest looking into Cloudflare, NS1 or one of the many others... Go multi-vendor and/or run your own slave (secondary) servers if the zone is actually important to you.
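
If you do run your own secondaries, one routine check is that every listed nameserver is serving the same zone version. A minimal sketch, again assuming dnspython and using `example.com` as a placeholder; it queries each nameserver directly and compares SOA serials:

```python
# Minimal sketch: query each of a zone's nameservers directly and compare
# SOA serials (assumes dnspython; "example.com" is a placeholder zone).
import dns.resolver

zone = "example.com"  # placeholder: put your own zone here

for ns in dns.resolver.resolve(zone, "NS"):
    ns_name = str(ns.target)
    ns_ip = str(dns.resolver.resolve(ns_name, "A")[0])

    server = dns.resolver.Resolver(configure=False)
    server.nameservers = [ns_ip]  # ask this authoritative server, not your local resolver
    serial = server.resolve(zone, "SOA")[0].serial
    print(f"{ns_name} ({ns_ip}): SOA serial {serial}")
```

Serials that stay out of step point to broken zone transfers, which defeats the redundancy you set out to buy.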

Jacob Evans

Without going into the specifics of this particular operator's setup (which I'm not familiar with), the answer to the general question is clear.

DNS has a long history of design with redundancy in mind (the protocol has built-in facilities for synchronizing zone data between servers, multiple authoritative nameservers are natively supported by simply adding multiple NS records, most registries outright require at least two nameservers when delegating your registered domain name, etc, etc).

It's also long-established best practice to have diversity among your authoritative nameservers, both regarding geographical location as well as network topology.

An example of this is RFC 2182 - Selection and Operation of Secondary DNS Servers (aka BCP 16 since receiving Best Current Practice status), a document from 1997 specifically on this subject.

The section on Selecting Secondary Servers (ie, what the full set of authoritative nameservers should be like) in this document reads:

3.1. Selecting Secondary Servers

When selecting secondary servers, attention should be given to the various likely failure modes. Servers should be placed so that it is likely that at least one server will be available to all significant parts of the Internet, for any likely failure.

Consequently, placing all servers at the local site, while easy to arrange, and easy to manage, is not a good policy. Should a single link fail, or there be a site, or perhaps even building, or room, power failure, such a configuration can lead to all servers being disconnected from the Internet.

Secondary servers must be placed at both topologically and geographically dispersed locations on the Internet, to minimise the likelihood of a single failure disabling all of them.

That is, secondary servers should be at geographically distant locations, so it is unlikely that events like power loss, etc, will disrupt all of them simultaneously. They should also be connected to the net via quite diverse paths. This means that the failure of any one link, or of routing within some segment of the network (such as a service provider) will not make all of the servers unreachable.


The above are best practices for DNS deployments in general. Obviously one will have to adjust expectations somewhat based on the situation, but when it comes to a large scale deployment operated by a company which has these services as part of their core business the above really just makes sense.
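
As a very rough way to check the "topologically and geographically dispersed" requirement against your own zone, you can at least flag the case where every nameserver address sits inside one small prefix. A hedged sketch, assuming dnspython; sharing a /24 is only a crude proxy for "same place", and a fuller check would also compare the origin ASes:

```python
# Minimal sketch: flag a zone whose nameserver addresses all fall inside a
# single /24 -- a crude proxy for "all in the same place" (assumes dnspython;
# "example.com" is a placeholder; a fuller check would compare origin ASes).
import ipaddress
import dns.resolver

zone = "example.com"  # placeholder: put your own zone here

prefixes = set()
for ns in dns.resolver.resolve(zone, "NS"):
    for addr in dns.resolver.resolve(str(ns.target), "A"):
        prefixes.add(ipaddress.ip_network(str(addr) + "/24", strict=False))

if len(prefixes) <= 1:
    print(f"Warning: all nameservers of {zone} sit in one /24: {prefixes}")
else:
    print(f"{zone}: nameserver addresses spread across {len(prefixes)} different /24s")
```
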
Håkan Lindqvist
  • AS diversity is often forgotten. Even with geographic and network diversity, if all your servers are in the same AS you depend on its visibility, and hence your upstreams are a single point of failure. – Patrick Mevzek Oct 02 '17 at 19:36