
I would like to ask for some clarification on the following scenario.

Let's assume I have a domain whose DNS A records are served by 2 pairs (4 in total) of name servers; each pair is hosted in a different datacenter for redundancy.

If connectivity to one pair of name servers is lost (for whatever reason, such as an Internet firewall outage or a datacenter outage), and the other pair in the second location is still online and reachable from the Internet, would this affect DNS resolution for external clients and, if so, to what extent?

My understanding is that the recursive DNS servers would start at the root servers and follow the delegation, and that all 4 NS records for the zone would initially be returned. What would the recursive DNS servers then do in the next step if 2 of the name servers could not be reached and the queries for the A record sent to those servers timed out?

Could the resolution process be significantly affected, or even fail outright on the client side?

Or would the recursive resolver just proceed through the NS list and query the remaining name servers that are online, eventually resolving the record successfully?

Thanks for clarifying.

Greg Askew
Ottootto
  • The situation you depict can of course happen temporarily, and that is fine: the DNS service will continue to work, albeit with some possible delays. However, it shouldn't remain like that "forever", because it is then similar to a lame delegation and does not play nice with others on the Internet. It costs everyone resources. – Patrick Mevzek Jun 02 '20 at 21:06

1 Answer


Details of this behavior will be implementation specific, but the overall idea is that the resolver server will try all the authoritative servers until it gets a response.
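As a rough illustration of that overall idea, here is a minimal sketch in Python. It is not how any particular resolver is implemented; the server names, the `QueryTimeout` exception and the `fake_send_query` transport are all hypothetical stand-ins so that the example runs without touching the network.

```python
from typing import Callable, Iterable, Optional


class QueryTimeout(Exception):
    """Raised by the (hypothetical) transport when a server does not answer."""


def resolve_with_failover(
    name: str,
    nameservers: Iterable[str],
    send_query: Callable[[str, str], str],
) -> Optional[str]:
    """Try each authoritative server in turn and return the first answer."""
    for server in nameservers:
        try:
            return send_query(server, name)
        except QueryTimeout:
            continue  # this server is unreachable, move on to the next one
    return None  # every server timed out; a real resolver would report an error


# Simulated outage: both servers in datacenter A are down (assumed names).
DOWN = {"ns1.dc-a.example", "ns2.dc-a.example"}


def fake_send_query(server: str, name: str) -> str:
    # Stand-in for a real DNS query; the address is from a documentation range.
    if server in DOWN:
        raise QueryTimeout(server)
    return "192.0.2.10"


NAMESERVERS = [
    "ns1.dc-a.example", "ns2.dc-a.example",
    "ns1.dc-b.example", "ns2.dc-b.example",
]
answer = resolve_with_failover("www.example.com", NAMESERVERS, fake_send_query)
print(answer)
```

In this simulation the lookup still succeeds, but only after the two timeouts against the downed pair have been paid for, which is where the delay comes from.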

One factor here is that a typical resolver implementation has some means of tracking how responsive the authoritative servers it has contacted have been, and will to some extent latch on to the subset of nameservers that have been "reasonably responsive".

The result is that an initial query may be slow if some of the authoritative nameservers for a zone are unresponsive and the resolver doesn't yet know about it (the delay could be noticeable, or even cause an error if it ends up taking a very long time). However, this will quickly sort itself out as the resolver "learns" each server's responsiveness (or lack thereof), and the "average behavior" should be largely normal as long as things are relatively stable (read: unchanging, not necessarily good).
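To make the "learning" concrete, here is a small simulation sketch, again in Python and again not modelled on any specific resolver. It keeps a smoothed round-trip time per server and queries the best-scoring server first. The smoothing weight, the timeout penalty, the server names and the RTT values are all assumptions picked for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TIMEOUT_PENALTY_MS = 2000.0  # assumed cost charged for a timed-out query
ALPHA = 0.3                  # assumed weight given to each new RTT sample


@dataclass
class ServerStats:
    """Smoothed RTT per authoritative server, as seen by one resolver."""
    srtt: Dict[str, float] = field(default_factory=dict)

    def record(self, server: str, rtt_ms: float) -> None:
        old = self.srtt.get(server)
        self.srtt[server] = rtt_ms if old is None else (1 - ALPHA) * old + ALPHA * rtt_ms

    def ordered(self, servers: List[str]) -> List[str]:
        # Unknown servers score 0 so that each one gets tried at least once.
        return sorted(servers, key=lambda s: self.srtt.get(s, 0.0))


def lookup(stats: ServerStats, servers: List[str],
           rtt_of: Dict[str, Optional[float]]) -> float:
    """Simulate one lookup and return the total time spent in milliseconds."""
    elapsed = 0.0
    for server in stats.ordered(servers):
        rtt = rtt_of[server]  # None means the server is unreachable
        if rtt is None:
            stats.record(server, TIMEOUT_PENALTY_MS)
            elapsed += TIMEOUT_PENALTY_MS
            continue
        stats.record(server, rtt)
        return elapsed + rtt
    return elapsed  # nobody answered


servers = ["ns1.dc-a.example", "ns2.dc-a.example",
           "ns1.dc-b.example", "ns2.dc-b.example"]
# Datacenter A is down; datacenter B answers in 20 ms and 30 ms.
rtt_of = {"ns1.dc-a.example": None, "ns2.dc-a.example": None,
          "ns1.dc-b.example": 20.0, "ns2.dc-b.example": 30.0}

stats = ServerStats()
first = lookup(stats, servers, rtt_of)
second = lookup(stats, servers, rtt_of)
third = lookup(stats, servers, rtt_of)
print(first, second, third)
```

With these assumed numbers the first lookup pays for both timeouts, while the later ones go straight to the reachable pair. A real resolver would also re-probe the penalized servers from time to time so that it notices when they recover; this sketch leaves that out.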

Håkan Lindqvist
  • A good algorithm for selecting the nameserver to query, typically based on its RTT (Round Trip Time), should converge at some point, BUT it also needs to retry, regularly but somewhat randomly, nameservers that were not ranked among the best, because they can suddenly become better. So basically you shouldn't stick 100% to whatever results you have; from time to time you need to retry even nameservers that you considered bad in the past. See https://www.nanog.org/meetings/nanog54/presentations/Tuesday/Yu.pdf for various examples of what resolvers do on this topic. – Patrick Mevzek Jun 02 '20 at 21:08
  • @PatrickMevzek Thanks for the additional details. Indeed, it's important to regularly retry the "bad" nameservers as what is good/bad evolves over time. And while that presentation is a bit old (and therefore tests old versions of popular resolvers), it's still an interesting summary of what some real implementations behave(d) like. – Håkan Lindqvist Jun 02 '20 at 21:39
  • Yes, this is the best ("only") useful content I have found (https://securityintelligence.com/subverting-binds-srtt-algorithm-derandomizing-ns-selection/ gives a few details on what Bind does, but is certainly not as good as its sources), and I would be very happy to see other references on the subject... as I recently had to implement this kind of algorithm (outside a nameserver). I had to take the transport (TCP/UDP) into account and not just RTT, because I found some bad nameservers where only TCP was working and not UDP (funny, as it is more often the opposite), and I wanted to converge to TCP over time. – Patrick Mevzek Jun 02 '20 at 21:47
  • I would like to thank all of you for your replies. That helped me grasp the concept better. – Ottootto Jun 03 '20 at 05:58