
This may be a Windows DNS specific question or a general DNS best practice question - I'm not sure!

We migrated our 3rd party DNS provision from provider A to provider B.

I noticed that our internal recursive Windows DNS servers still had NS records cached for our domains pointing to provider A's servers, even though I changed the nameservers with our registrar several days ago and the properties of the cached records showed a TTL of 1 day.

After 24 hours when the NS records in this cache have expired, will the DNS server go back to the TLD server for an update on the authority, or will it go by preference to dns1.providera.com since that is what it has cached?

In this case I arranged to leave Provider A's servers up for a week to allow changes to propagate, so dns1.providera.com is still active and would still provide NS and SOA records that said that dns1.providera.com. was in charge of this domain. Given this fact, would the Windows DNS server ever go back to the TLD and pick up the authority changes, or would it just assume all was well and renew timestamps on its cached NS records?

I wonder what would be the best approach to ensuring that caches pick this up. Should I:

(1) Leave Provider A's servers in place and active and wait for caches to catch up. This is basically what we're doing now, and it seems to have issues, perhaps specifically for Windows servers, or perhaps more widely.

(2) Leave Provider A's servers in place but change the NS and/or SOA information they provide to tell caches that new servers are in charge.

(3) Remove Provider A's servers after 2*TTL to force remaining caches to update.

The issue with (2) is that on Provider A's system I can't seem to change the NS or SOA information to anything other than their servers.

The issue with (3) is that I'm not sure how a DNS server would behave in this case. When it couldn't reach the cached name servers, would it flush its cache and try a full recursive lookup, or would it just return an error, forcing the user to clear the cache manually?

Thanks in advance!

JohnCC
  • Did it finally flush after 24 hours ? – Sandman4 Jun 03 '12 at 21:09
  • I don't know - on the servers under my control (the Windows Domain Controllers on our LAN) I'd already flushed the caches to fix the issue. I wish I'd kept one in its original state for testing! – JohnCC Jun 04 '12 at 12:09

2 Answers


The general architecture/flow of these updates is:

  • After you update the records with your registrar, they will update the registry database, which will in turn update the stealth primary TLD nameserver.

  • Updates will flow from the stealth primary to the secondary servers that actually reply to queries. This happens within the TLD SOA refresh period, unless there are failures, in which case the TLD SOA expire period starts ticking.

  • If everything is hunky dory on their end, these updates propagate within at most one TLD SOA refresh period and your updated record appears on the public facing TLD nameservers.

  • If you have queried before the updated record appeared on the public facing TLD nameservers, then you'll have to wait for the record's TTL to expire before you'll get the updated record.

In conclusion:

  • If all systems are go, then you only need to wait for a maximum time of TLD SOA refresh.

  • If you made your query via your caching/recursive server too early, you may need to wait for TLD SOA refresh + record TTL.

  • If there is an outage then you may need to wait for longer.

  • If systems come back at the last moment possible after an outage, you shouldn't need to wait for longer than TLD SOA expire + record TTL. This is accounting for the fact that you made the query before updated records got published to public facing TLD nameservers.

  • Because most caching/recursive servers will cache your zone's records as well, and your (enterprise?) DNS provider is in all likelihood going to have secondary servers as well, you'll have to add your own SOA refresh before you start seeing changes to your zone come through the new servers. Of course, as I've done before, you could update both old and new servers for your own zone.
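The waiting times listed above are simple sums, and a small sketch can make them concrete. The timer values below are hypothetical placeholders, not the real SOA values of any TLD; the names `Timers` and `wait_bounds` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Timers:
    """All values in seconds. These are assumptions, not real TLD settings."""
    tld_refresh: int  # TLD SOA refresh
    tld_expire: int   # TLD SOA expire
    record_ttl: int   # TTL of the cached NS record


def wait_bounds(t: Timers) -> dict:
    """Upper bounds on the wait, following the cases in the list above."""
    return {
        # All systems are go: at most one TLD SOA refresh.
        "healthy": t.tld_refresh,
        # Queried through a cache before the TLD servers were updated.
        "queried_too_early": t.tld_refresh + t.record_ttl,
        # TLD secondaries recovered at the last possible moment.
        "after_outage": t.tld_expire + t.record_ttl,
    }


if __name__ == "__main__":
    # Hypothetical: 30 min refresh, 7 day expire, 1 day NS TTL.
    example = Timers(tld_refresh=1800, tld_expire=604800, record_ttl=86400)
    for case, seconds in wait_bounds(example).items():
        print(f"{case}: {seconds / 3600:.1f} hours")
```

To use real numbers, read the refresh and expire fields from the SOA record of your own TLD and the TTL from the cached NS record.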

What you could do:

  • You could use a tool like dig or nslookup to query the public facing TLD nameservers directly to find out if your records have updated. The same query also shows you the SOA timer values of your TLD.

  • You could use the same tools to query your new DNS provider's secondary servers to find out if they have picked up the change.

  • Do a full recursive query via public nameservers (they can refuse to recurse for you, but most don't) to see if the new query chain is working well.

  • Do a full recursive query locally from your client workstation. Dig allows you to do this and will assist you in determining if your resolution chain is bound as expected.

  • DNS can get daunting. Write comments to my response so I can make it more comprehensible. I'll look at it later in the day to see if I can improve on what I've written.
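As a rough sketch of the checks listed above, the helper below builds `dig` command lines for each step. Several names are assumptions: `example.com` stands in for your domain, `a.gtld-servers.net` is only relevant if the parent is .com or .net, `dns1.providerb.com` is a made-up name for one of the new provider's servers, and `8.8.8.8` is just one public resolver. The helper names `dig_command` and `run` are also made up for illustration.

```python
import shutil
import subprocess


def dig_command(name, rrtype="NS", server=None, recurse=True, trace=False):
    """Build a dig argument list; nothing is executed here."""
    cmd = ["dig"]
    if server:
        cmd.append("@" + server)  # ask this server directly
    cmd += [name, rrtype]
    if trace:
        cmd.append("+trace")      # walk the delegation chain from the root
    elif not recurse:
        cmd.append("+norecurse")  # ask only for what the server itself holds
    return cmd


def run(cmd):
    """Run the command if dig is installed; return its output or None."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return result.stdout


# Placeholder names: replace with your own domain and servers.
DOMAIN = "example.com"
CHECKS = {
    "delegation at the TLD": dig_command(
        DOMAIN, server="a.gtld-servers.net", recurse=False),
    "SOA timers of the TLD": dig_command(
        "com", "SOA", server="a.gtld-servers.net", recurse=False),
    "new provider's server": dig_command(
        DOMAIN, server="dns1.providerb.com", recurse=False),
    "public recursive resolver": dig_command(DOMAIN, server="8.8.8.8"),
    "full trace from this machine": dig_command(DOMAIN, trace=True),
}

if __name__ == "__main__":
    for label, cmd in CHECKS.items():
        print(f"## {label}: {' '.join(cmd)}")
        print(run(cmd) or "(dig not available)")
```

The command lists can also be copied out and run by hand; the Python wrapper only keeps the five checks in one place.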

nearora
  • SOA refresh is usually not relevant today. At least Verisign (.com and .net TLDs) currently implements what they call "rapid DNS updates": any changes you make with the registrar are pushed instantly to the TLD servers. – Sandman4 Jun 04 '12 at 06:31
  • The question was about **changing** DNS hosting, thus negative caching does not apply - either **old** record set present, or the **new** one – Sandman4 Jun 04 '12 at 06:32
  • @Sandman4, you are right - in that case only the TTL will apply. The OP did not say what TLD is the parent, so I don't think we can assume the availability of rapid updates. – nearora Jun 04 '12 at 06:35
  • Unless otherwise stated, we can safely assume it's .com :) – Sandman4 Jun 04 '12 at 06:43
  • @Sandman4, I don't think that resonates with the spirit of "This question asks about potential problem which may be experienced by everyone who is switching DNS hosting." – nearora Jun 04 '12 at 06:47
  • Was just kidding about .com, sorry – Sandman4 Jun 04 '12 at 07:12

Not really an answer, just a few thoughts.

The issue with (3) is that I'm not sure how a DNS server would behave in this case. When it couldn't reach the cached name servers, would it flush its cache and try a full recursive lookup, or would it just return an error, forcing the user to clear the cache manually?

BCP 123, section 2.1.1, recommends that the caching DNS server should just return an error in this case, so (3) is not a very good option. (BTW, what do you mean by "user manually clearing the cache"? A user cannot do anything. Only the DNS server operator can, and of course admins of other resolvers will not care, or even know, that your server is unreachable, unless it is authoritative for google.com.)

As for

After 24 hours when the NS records in this cache have expired, will the DNS server go back to the TLD server for an update on the authority

I was about to jump in and say "of course it will query the TLD", simply because otherwise nobody would ever be able to change their DNS hosting. On second thought, though, there are a few possibilities:

If a caching DNS server receives no queries for your domain until the NS TTL expires, and then receives a query after the NS TTL has expired, it must not use those expired NS records and has no choice other than querying the TLD.

**However, if a caching DNS server is constantly queried for your domain, your old server may send a copy of the NS records (in the authority section) with each reply. The caching server may well store those records with their new TTL, and thus the TTL technically may never expire (?!)**
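The two possibilities above can be illustrated with a toy model. This is only a sketch of the reasoning, not a description of how Windows DNS or any real resolver is known to behave; the function name and the `refresh_on_reply` switch are invented for illustration.

```python
def first_parent_requery(ns_ttl, query_interval, horizon, refresh_on_reply):
    """Return the time (in seconds) at which a toy resolver first goes back
    to the parent (TLD) for the delegation, or None if it never does within
    the horizon.

    The NS set is cached at t=0 with ns_ttl. A client query arrives every
    query_interval seconds and is answered by the old nameserver.
    refresh_on_reply models a resolver that restarts the NS TTL whenever
    the old server repeats its NS records in the authority section.
    """
    expires_at = ns_ttl
    t = query_interval
    while t <= horizon:
        if t >= expires_at:
            # Cached NS records have expired: the parent must be asked.
            return t
        if refresh_on_reply:
            # The old server's reply carries the NS set again, so the
            # cache entry starts a fresh TTL.
            expires_at = t + ns_ttl
        t += query_interval
    return None


if __name__ == "__main__":
    day, hour, week = 86400, 3600, 7 * 86400
    # Hourly queries, cache does not refresh from replies.
    print(first_parent_requery(day, hour, week, refresh_on_reply=False))
    # Hourly queries, cache refreshes from every reply.
    print(first_parent_requery(day, hour, week, refresh_on_reply=True))
    # Same refreshing cache, but queries arrive only every two days.
    print(first_parent_requery(day, 2 * day, week, refresh_on_reply=True))
```

In this model a busy, refreshing cache never returns to the TLD while the old server keeps answering, whereas a quiet one does so at its first query after expiry. Whether a given resolver actually refreshes a cached delegation this way depends on its implementation.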

I suppose that the TLD will be queried, simply because I can't believe that such a common problem has no solution. My hope is that wizards here will enlighten us... @Alnitak.

Sandman4
  • Actually I put a bounty on this because of what appears **bold** in my answer. Please someone, comment. – Sandman4 Jun 04 '12 at 06:40
  • The scenario you specify above is only possible if records that form the chain to your zone's NS records never expire or change either. That is however not the case and queries go out to root nameservers on expiry of records that delegate out to TLDs and so on to nameservers of child zones as well. This is assuming that the software you are using isn't broken! – nearora Jun 04 '12 at 06:56
  • nearora - you seem not to get my point, please read the bold carefully – Sandman4 Jun 04 '12 at 07:14
  • I think I do Sandman4, but that is not how caching DNS servers are supposed to work! There is a TTL applicable to each record in the chain. If the software for a caching/recursive nameserver is broken, or if it's configured to override TTLs to preserve bandwidth, there is not much you can do. But TTLs are to expire and queries should go out regularly to re-get records for each part of the chain. DNS isn't fool proof and is distributed. Which is why we have DNSSEC with client side recursive verification. DNSSEC is also vulnerable in the last mile should the recursive/caching be compromised. – nearora Jun 04 '12 at 07:23
  • The idea is: I changed nameservers for my domain from **oldserver.example.com** to **newserver.example.com**, and the TLD is properly updated. **But** the oldserver is out of my control and it still has `@ 1000000 NS oldserver.example.com`. Now, some caching DNS server, let's say 8.8.8.8, is queried for the A record of www.example.com. The cache has NS records for example.com pointing to `oldserver.example.com`, with a TTL that will expire within one hour, so it sends the query there. oldserver replies, and with the reply it **includes NS records for example.com**, which still name oldserver.example.com. – Sandman4 Jun 04 '12 at 07:24
  • (continued) Now 8.8.8.8 has a new set of NS records for example.com, and it is supposed to keep them for the length of the TTL. Before that new TTL expires, a _new_ query for www.example.com arrives, and the process repeats. Nothing is broken, yet the caching server has no reason to go to the TLD. – Sandman4 Jun 04 '12 at 07:26
  • From what you are saying, com had an NS record for example.com pointing to oldserver.example.com with reasonable TTL values. It also had glue records pointing to IP of oldserver.example.com. You changed records at this level to point to newserver.example.com with the same reasonable TTL. However, oldserver.example.com is under rogue control and has very long TTL and you are worried that you may be unable to move away from oldserver.example.com as your DNS provider. (contd...) – nearora Jun 04 '12 at 08:10
  • RFC 1035 states that: "The resolver may encounter a situation where no addresses are available for any of the name servers named in SLIST, and where the servers in the list are precisely those which would normally be used to look up their own addresses. This situation typically occurs when the glue address RRs have a smaller TTL than the NS RRs marking delegation, or when the resolver caches the result of a NS search. The resolver should detect this condition and restart the search at the next ancestor zone, or alternatively at the root." (contd...) – nearora Jun 04 '12 at 08:10
  • "As an optional step, check the TTLs of arriving data looking for RRs with excessively long TTLs. If a RR has an excessively long TTL, say greater than 1 week, either discard the whole response, or limit all TTLs in the response to 1 week." In addition to the checks afforded by the distributed nature of DNS, you also benefit from ICANN and IANA, who regulate the root and gTLDs and enforce compliance in association with government appointed bodies like RIPE NCC, Nominet and AusRegistry for ccTLDs. A rogue operator, such as the one you fear, is going to be put out of business soon enough! – nearora Jun 04 '12 at 08:10
  • In my opinion, your question warrants a new post/question. It's related to the OP's question, but is a very specific case and deals with DNS security. – nearora Jun 04 '12 at 08:12