7

I am currently migrating a DNS zone from one DNS server provider to another. I am trying to estimate how long it will take for the change to propagate, and to understand what the delay might be if I chose to rollback mid-stream.

Previously, I thought I could do:

dig example.com ns

To see what the remaining TTL on the NS record was, but now I understand that this NS record is the NS record for subdomains in the zone, and not the NS record that emanates from the root servers, which is the one that ultimately determines to which name server the query will be sent.

I tested this by setting up a test record in the zone in each of the providers:

Provider1 test.example.com 10.0.0.1
Provider2 test.example.com 192.168.0.1

For both providers, the TTL on the NS records is 0, while the NS records at the TLD registrar level point to the name servers of Provider1.

When I change the NS records in the zone at Provider1, I can see this reflected in NS queries almost immediately (using 'dig example.com ns').

However, when I send a query for an A record, i.e.

test.example.com

it always returns

10.0.0.1

regardless of what the NS records in the zone at Provider 1 are set to.

On this basis, I've concluded that the NS records within the zone file are irrelevant to the migration, and that only the name server records at the TLD level are important.

However, I can't get a read on how long it is likely for a change there to propagate, either forward or back.

Is it possible to query what TTL I am working with for records emanating from the TLD root servers?

Garreth McDaid

3 Answers

6

On this basis, I've concluded that the NS records within the zone file are irrelevant to the migration, and that only the name server records at the TLD level are important.

This is an incorrect hypothesis, but an easy mistake to make. You can read a little more on the subject of apex NS records here. The short version is that both matter, and the one being used will differ depending on whether a caching DNS server has previously queried your domain or has not.

Keep in mind that most recursive DNS servers enforce a minimum TTL, so any conclusions drawn from experiments with a TTL of zero are almost certainly inaccurate. The only case where this is not so is when you control the minimum TTL policy used by the server you're querying.
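To illustrate why a TTL of zero can mislead, here is a minimal sketch of a resolver-side TTL floor. The function name and the 30-second default are hypothetical; real resolvers differ in whether they apply a floor and what value they use.

```python
def effective_ttl(record_ttl: int, resolver_min_ttl: int = 30) -> int:
    """Return how long a resolver with a TTL floor would cache a record.

    resolver_min_ttl is an assumed policy value, not a standard default.
    """
    return max(record_ttl, resolver_min_ttl)


# A zero TTL published in the zone is still cached for the resolver's floor.
print(effective_ttl(0))
# A published TTL above the floor is used unchanged.
print(effective_ttl(3600))
```

Under this assumption, an experiment with a TTL of 0 is really an experiment with whatever floor the queried resolver applies.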

I am currently migrating a DNS zone from one DNS server provider to another. I am trying to estimate how long it will take for the change to propagate, and to understand what the delay might be if I chose to rollback mid-stream.

I'm going to focus on this topic since you have a few false starts in the rest of your question. First, it's important to remember that records are cached on recursive DNS servers for the duration of their TTLs. Since everyone on the internet is using different recursive DNS servers, each of which cached your records at a different time, the only assumption you can make is that a change will take up to n seconds to be seen everywhere, with n being the value of the relevant TTL.

This brings us to which TTLs are relevant here:

  • TTLs for individual records that are in cache. Even if the NS records expire, individual records that are already in cache do not expire with them. Example: If test.example.com IN A expires ten minutes from now, but example.com IN NS expires five minutes from now, test.example.com will remain in cache even after the NS records have changed. Any problems related to the value of this record on the new servers will not become evident until the record has expired and has been refreshed.
  • TTLs for the NS glue records served by the TLD DNS servers (strictly speaking, the delegation NS records and their accompanying address glue). These are used by recursive servers that need to obtain information about your domain when it is first requested. These TTLs influence how long it is before the DNS servers listed in your zone file are used for the next refresh. (refer to the Q&A I linked above for clarification on this)
  • TTLs for the NS records listed at the top of your zone file. Once the glue record TTLs have expired, these TTLs might be used instead. Implementations vary on this detail. Since you are dealing with an entire internet's worth of different implementations, the only safe assumption is that some servers are using it.

You can't assume that all of the cached NS record TTLs on the internet are from one source or the other. This forces you to plan around the higher of the two unless you are truly not concerned with recursive DNS servers that you do not operate.

Putting all of this together, we arrive at the following conclusions:

  • The maximum amount of time needed for any given DNS record to refresh against a new nameserver is the highest of three TTLs: that record's own, the NS records in the glue, and the NS records in the zone.
  • Assuming the TTLs on the new DNS server are identical to the old server, the maximum amount of time needed for the rollback is the highest value of the three TTLs, once again. Any queries that landed on the new server between the time you initially changed the DNS servers and reverted the change will be relying on the values obtained from the new server.
  • It's very important to keep all of the DNS servers involved in the change running and synchronized until all of the TTLs have expired following your final NS record change. Not only do you need all of the servers available for clients that haven't picked up the latest change, but any inconsistency in record data between the two can serve to make things even more confusing.
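The first two conclusions above can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, the 300 and 3600 second values are made up, and the 172800 second figure is the TTL reported by the gTLD servers elsewhere in this thread. It does not model resolvers that enforce their own minimum or maximum TTLs.

```python
def worst_case_wait(record_ttl: int, delegation_ns_ttl: int, apex_ns_ttl: int) -> int:
    """Upper bound, in seconds, before caches must refetch from the new servers.

    Takes the highest of the three TTLs, since any given resolver may be
    holding any of them.
    """
    return max(record_ttl, delegation_ns_ttl, apex_ns_ttl)


# Assumed values: a 5 minute A record, a 2 day TLD delegation, a 1 hour apex NS set.
wait = worst_case_wait(record_ttl=300, delegation_ns_ttl=172800, apex_ns_ttl=3600)
print(f"{wait} seconds (about {wait / 3600:.0f} hours)")
```

The same bound applies to a rollback, provided the TTLs on both sets of servers are identical.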
Andrew B
  • Thanks for the comprehensive reply. Is it the case that even if I reduce the TTL on the A records and the apex NS records (with both providers), I still need to allow for the number of seconds in the TTL of the NS records advertised by the root servers (which I obviously can't control)? In which case, there isn't a lot of point in changing the TTL on the A records or the apex NS records... – Garreth McDaid Sep 07 '15 at 19:49
  • The root servers don't matter so much here since they only delegate to the TLD nameservers. I'll assume you meant the TLDs. Yeah, you'll still need to wait before you turn off the old servers, but there's still value: it gets *most* people over to your new servers more quickly. – Andrew B Sep 07 '15 at 20:22
1

You can do this easily with nslookup on Windows, and I assume you could do the same with dig. With nslookup, you query one of the gTLD name servers for the NS records of your domain with debug enabled, which lists the name servers for your domain along with the TTL of each of those NS records.

Microsoft Windows [Version 10.0.10240]
(c) 2015 Microsoft Corporation. All rights reserved.

C:\Users\Joe>nslookup
Default Server:  Unknown
Address:  192.168.1.2

> server f.gtld-servers.net
Default Server:  f.gtld-servers.net
Address:  192.35.51.30

> set q=ns
> set debug
> crabbygeezer.com
Server:  f.gtld-servers.net
Address:  192.35.51.30

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 4, rcode = NOERROR
        header flags:  response, want recursion
        questions = 1,  answers = 0,  authority records = 5,  additional = 10

    QUESTIONS:
        crabbygeezer.com, type = NS, class = IN
    AUTHORITY RECORDS:
    ->  crabbygeezer.com
        nameserver = freedns1.registrar-servers.com
        ttl = 172800 (2 days)
    ->  crabbygeezer.com
        nameserver = freedns2.registrar-servers.com
        ttl = 172800 (2 days)
    ->  crabbygeezer.com
        nameserver = freedns3.registrar-servers.com
        ttl = 172800 (2 days)
    ->  crabbygeezer.com
        nameserver = freedns4.registrar-servers.com
        ttl = 172800 (2 days)
    ->  crabbygeezer.com
        nameserver = freedns5.registrar-servers.com
        ttl = 172800 (2 days)
    ADDITIONAL RECORDS:
    ->  freedns1.registrar-servers.com
        internet address = 208.64.122.242
        ttl = 172800 (2 days)
    ->  freedns1.registrar-servers.com
        internet address = 72.20.53.50
        ttl = 172800 (2 days)
    ->  freedns2.registrar-servers.com
        internet address = 208.64.122.244
        ttl = 172800 (2 days)
    ->  freedns2.registrar-servers.com
        internet address = 72.20.38.137
        ttl = 172800 (2 days)
    ->  freedns3.registrar-servers.com
        internet address = 5.135.128.216
        ttl = 172800 (2 days)
    ->  freedns3.registrar-servers.com
        internet address = 62.210.149.103
        ttl = 172800 (2 days)
    ->  freedns4.registrar-servers.com
        internet address = 62.210.149.102
        ttl = 172800 (2 days)
    ->  freedns4.registrar-servers.com
        internet address = 72.20.53.50
        ttl = 172800 (2 days)
    ->  freedns5.registrar-servers.com
        internet address = 192.99.40.34
        ttl = 172800 (2 days)
    ->  freedns5.registrar-servers.com
        internet address = 72.20.53.50
        ttl = 172800 (2 days)

------------
crabbygeezer.com
        nameserver = freedns1.registrar-servers.com
        ttl = 172800 (2 days)
crabbygeezer.com
        nameserver = freedns2.registrar-servers.com
        ttl = 172800 (2 days)
crabbygeezer.com
        nameserver = freedns3.registrar-servers.com
        ttl = 172800 (2 days)
crabbygeezer.com
        nameserver = freedns4.registrar-servers.com
        ttl = 172800 (2 days)
crabbygeezer.com
        nameserver = freedns5.registrar-servers.com
        ttl = 172800 (2 days)
freedns1.registrar-servers.com
        internet address = 208.64.122.242
        ttl = 172800 (2 days)
freedns1.registrar-servers.com
        internet address = 72.20.53.50
        ttl = 172800 (2 days)
freedns2.registrar-servers.com
        internet address = 208.64.122.244
        ttl = 172800 (2 days)
freedns2.registrar-servers.com
        internet address = 72.20.38.137
        ttl = 172800 (2 days)
freedns3.registrar-servers.com
        internet address = 5.135.128.216
        ttl = 172800 (2 days)
freedns3.registrar-servers.com
        internet address = 62.210.149.103
        ttl = 172800 (2 days)
freedns4.registrar-servers.com
        internet address = 62.210.149.102
        ttl = 172800 (2 days)
freedns4.registrar-servers.com
        internet address = 72.20.53.50
        ttl = 172800 (2 days)
freedns5.registrar-servers.com
        internet address = 192.99.40.34
        ttl = 172800 (2 days)
freedns5.registrar-servers.com
        internet address = 72.20.53.50
        ttl = 172800 (2 days)
>
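If you want to pull the TTLs out of that output programmatically, a minimal sketch follows. It assumes the debug output has the layout shown above (a "nameserver = ..." line followed by a "ttl = ..." line); the function name is hypothetical and the sample is a shortened copy of the session above.

```python
import re


def ns_ttls(nslookup_output: str) -> dict[str, int]:
    """Map each nameserver in nslookup debug output to its TTL in seconds."""
    # Each NS entry is a "nameserver = <name>" line followed by "ttl = <seconds>".
    pattern = re.compile(r"nameserver = (\S+)\s+ttl = (\d+)")
    return {name: int(ttl) for name, ttl in pattern.findall(nslookup_output)}


sample = """\
    ->  crabbygeezer.com
        nameserver = freedns1.registrar-servers.com
        ttl = 172800 (2 days)
    ->  crabbygeezer.com
        nameserver = freedns2.registrar-servers.com
        ttl = 172800 (2 days)
"""
print(ns_ttls(sample))
```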

The syntax for performing a similar query using dig is:

$ dig NS crabbygeezer.com @f.gtld-servers.net +norecurse
EEAA
joeqwerty
  • Thanks. The TTL on the NS records doesn't change when I continue querying, unlike when I query the NS records against my default DNS server. It's just stuck on 172,800 (2 days). I guess only my DNS server would see that TTL deplete? Also, I guess this means that I'm stuck with a 2 day window for implementation and rollback, and there isn't really anything I can do to hurry that up? – Garreth McDaid Sep 07 '15 at 15:41
  • More info, when I change the NS records at the TLD level, I see that the change propagates (to me at least) within about 2 hours. How does that reconcile with the 172,800 (2 days) TTL that is being reported by the GTLD servers? – Garreth McDaid Sep 07 '15 at 16:07
0

It used to be you could query the SOA record for the domain to get the default TTL values:

dig example.com. SOA

But that's been deprecated in favor of the $TTL directive.

If you have particular records you're interested in, you could add the +ttlid flag to dig:

dig +ttlid somehost.example.com.

To get the exact TTL remaining:

;; ANSWER SECTION:
somehost.example.com.      604800  IN      A       192.168.99.5

(second field is TTL - in this case 604800)
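As a small illustration of reading that field from a script, here is a sketch that assumes the usual dig record layout of name, TTL, class, type, data. The function name is hypothetical.

```python
def ttl_from_dig_line(line: str) -> int:
    """Return the TTL (second whitespace-separated field) of a dig record line."""
    fields = line.split()
    return int(fields[1])


line = "somehost.example.com.      604800  IN      A       192.168.99.5"
print(ttl_from_dig_line(line))
```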

Brandon Xavier
  • But isn't the TTL in this case the TTL in the zone file? This won't give me any indication on the actual amount of time remaining in respect of a change to the NS records at the root server level. – Garreth McDaid Sep 07 '15 at 15:54
  • The root servers are the source of the info and don't need TTLs. They get populated from the domain registrars. You're probably more concerned with all the other servers in the world holding stale copies of the records in cache. – Brandon Xavier Sep 07 '15 at 16:22
  • The `+ttlid` flag only tells `dig` that you want it to print TTL values. `$TTL` is something you put in zone files so that you don't have to manually type a TTL value for every record in it. Neither has anything to do with being able to query servers about TTLs. – Calle Dybedahl Sep 08 '15 at 07:34