
Sorry if this is a silly question or if I'm asking in the wrong place. But I've been learning about DNS, and there's something that's bugging me about top-level domain (TLD) servers, especially the .com one. If I'm understanding right, the .com TLD server has a record of every .com site in existence? I checked Wikipedia, and it says there are about 145 million .com sites registered.

Am I understanding correctly that the .com TLD server (really a server cluster) is absolutely massive to store this many records? Or maybe nowadays, though very large, that's really not unreasonable? I do want to state that I understand the TLD server merely points to the authoritative name server, so it doesn't contain the entire DNS record set of every .com site. But I'm still struggling to wrap my head around this. I know I could be mistaken, but it doesn't seem right to have so many records stored in one place, so I'm asking whether I'm understanding this correctly. Thanks!

geckels1
  • One could store 145 million records on an average modern smartphone, and query it (on an individual basis) without significant performance trouble if properly indexed. It's a trivial amount of data, really. – ceejayoz Apr 20 '21 at 16:46

4 Answers

5

Am I understanding correctly that the .com TLD server (really a server cluster) is absolutely massive to store this many records?

No. You have a serious misconception about what even a lower-range server comprises these days - it would be a lot of data for your phone, but it is not even remarkable for a laptop. DNS records are amazingly small.

Or maybe nowadays, though very large, that's really not unreasonable?

It is not even very large. Hundreds of megabytes are not large. They were not large 20 years ago.

Your sense of scale is seriously off - a modern mid-range server has somewhere between 32 and 64 cores and, if needed, easily a couple of HUNDRED gigabytes of RAM. And those are cost-optimized servers, not anything close to high end. That is a lot of headroom for servers hosting what is basically a file that is a small part of a gigabyte - enough to really optimize access with in-memory indices, etc.
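
For a feel of what "optimize access with in-memory indices" means in practice, here is a rough, hypothetical Python sketch (not how any real TLD server is built): it indexes one million synthetic delegations in a plain dict and times a single lookup. All names and nameservers in it are invented.

# Hypothetical illustration: index synthetic .com delegations in memory and look one up.
import time

delegations = {
    f"example-{i}.com.": (f"ns1.host-{i % 1000}.example.net.", f"ns2.host-{i % 1000}.example.net.")
    for i in range(1_000_000)
}

start = time.perf_counter()
nameservers = delegations["example-424242.com."]
elapsed_us = (time.perf_counter() - start) * 1_000_000

print(nameservers)
print(f"lookup took about {elapsed_us:.1f} microseconds")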

TomTom
  • I think the "hundreds of megabytes" reference gives an unreasonable idea of scale. The overall point is valid, but hundreds of megabytes wouldn't get you far with the com zone. – Håkan Lindqvist Apr 20 '21 at 15:31
  • Possibly - still well below what is used in a normal desktop (16 GB). Tiny by all accounts. – TomTom Apr 20 '21 at 15:34
  • To me that sounds like a very optimistic idea of the size, but I don't have the `com` zone on hand and have no hard numbers for it specifically. – Håkan Lindqvist Apr 20 '21 at 16:59
  • Well, then let me put it from another side. A mid-range server can easily carry 1 TB of memory these days, more for higher-end or more expensive memory. That is 1024 GB. Nothing particularly fancy as hardware - just a dual EPYC, 2 processors. Good enough? – TomTom Apr 20 '21 at 19:43
5

You are understanding things correctly.

The .COM zone file is indeed big... but you can even fetch it; it is publicly available, as for any gTLD, through the ICANN Centralized Zone Data Service (CZDS), which you can find at https://czds.icann.org/home

It is of course multiple gigabytes in size, as it contains the names of (almost) all domain names under .COM, their authoritative nameservers, and some other associated records (glue records, DNSSEC signatures, DS records, etc.).

This file is big but still very small in the "big data" world.

But you also have to understand that this is just an export format. The servers themselves do not necessarily handle the data in that form. The data can live in a database (and a database with billions of records is not really considered "big" nowadays), or be loaded into memory through appropriate structures such as binary trees, tries, or DAFSAs.
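
Purely as an illustration, here is a minimal Python sketch of one such in-memory structure: a toy trie keyed on DNS labels in reverse order that answers "which nameservers is this name delegated to?". It is an assumption made for teaching purposes, not the structure any particular registry is known to use.

# Toy label trie: com -> serverfault -> ... ; delegation points carry nameservers.
class LabelTrie:
    def __init__(self):
        self.children = {}        # label -> LabelTrie
        self.nameservers = None   # set only at delegation points

    def insert(self, name, nameservers):
        node = self
        for label in reversed(name.rstrip(".").split(".")):
            node = node.children.setdefault(label, LabelTrie())
        node.nameservers = nameservers

    def lookup(self, name):
        node, best = self, None
        for label in reversed(name.rstrip(".").split(".")):
            node = node.children.get(label)
            if node is None:
                break
            if node.nameservers:
                best = node.nameservers   # remember the closest enclosing delegation
        return best

zone = LabelTrie()
zone.insert("serverfault.com", ["ns-1135.awsdns-13.org.", "ns-cloud-c1.googledomains.com."])
print(zone.lookup("www.serverfault.com"))   # finds the serverfault.com delegation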

Or maybe nowadays, though very large, that's really not unreasonable?

Big authoritative DNS operators have something else to worry about, far more complicated than the volume of the zone: the rate of queries and being able to respond to everyone (as the DNS, as a service, is supposed to have 100% uptime). This means ample overprovisioning at all stages.

but it doesn't seem right to have so many records stored in one place

Right in what sense? This is how the DNS works: each node needs to have a list of all delegated nodes below it. .com is historically a big zone, but there is nothing special about it - I mean, for a computer this label is like any other.

Patrick Mevzek
  • Great answer, but is "CDZA" supposed to be "CZDS"? – Håkan Lindqvist Apr 20 '21 at 17:22
  • @HåkanLindqvist Probably clearer in the context, I guess, yes, I can change it. But both work. See for example https://itp.cdn.icann.org/en/files/registry-agreements/com/com-amend-3-pdf-27mar20-en.pdf §1.1 of appendix 3A. CZDA is the "Access" to zone files in a generic sense, whereas CZDS is the specific "Service" to access them... – Patrick Mevzek Apr 20 '21 at 17:27
  • Right, makes sense! Either name works, but "CDZA" had the letters out of order, which created my confusion, as it wasn't searchable. I mentioned the name I already knew of without realizing that rearranging the letters created another related name. – Håkan Lindqvist Apr 20 '21 at 17:36
3

I downloaded com.zone from ICANN's centralized zone data service to my desktop. As of April 2021, uncompressed, it is a text file 382 million lines long and 22 GB in size. That makes sense for about 150 million domains with more than one record each.
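
A quick back-of-envelope check on those numbers (nothing more than arithmetic):

# Back-of-envelope arithmetic from the April 2021 snapshot described above.
lines = 382_000_000          # lines in the uncompressed com.zone text file
size_bytes = 22 * 10**9      # roughly 22 GB
domains = 150_000_000        # approximate number of delegated .com domains

print(f"{lines / domains:.1f} record lines per domain on average")   # ~2.5
print(f"{size_bytes / lines:.0f} bytes per line")                    # ~58
print(f"{size_bytes / domains:.0f} bytes per domain")                # ~147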

Very big for a DNS zone of course, but small for a database.

Remember that DNS is a simple protocol and heavily cached. It is easy to scale up, both with larger DNS servers and with more of them. And com. only has to serve cache misses; once other resolvers learn which nameservers are authoritative for a domain, they won't ask again for a time.


For fun, here is Server Fault's own delegation. In Stack Overflow infrastructure blog posts, the sysadmins admit to not trusting a single organization with their DNS, and the delegation records below bear that out.

ns1.serverfault.com.    172800  in      a       198.252.206.80
ns3.serverfault.com.    172800  in      a       69.59.197.60
serverfault.com.        172800  in      ns      ns-1135.awsdns-13.org.
serverfault.com.        172800  in      ns      ns-860.awsdns-43.net.
serverfault.com.        172800  in      ns      ns-cloud-c1.googledomains.com.
serverfault.com.        172800  in      ns      ns-cloud-c2.googledomains.com.
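
To tie this back to caching: a resolver that asks the com. servers about serverfault.com learns the delegation above, caches it for the TTL, and goes straight to the listed nameservers afterwards. A minimal sketch of that resolver-side view, assuming the third-party dnspython library is installed (pip install dnspython):

# Query the NS records for serverfault.com through the local resolver and show
# the TTL that controls how long the answer may be reused from cache.
import dns.resolver

answer = dns.resolver.resolve("serverfault.com", "NS")
print("cacheable for", answer.rrset.ttl, "seconds")
for record in answer:
    print("nameserver:", record.target)
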
John Mahowald
2

As I don't have the com zone data readily available (one could theoretically request access via CZDS, though), I won't be looking at the com zone specifically.

I took a quick look at the much more modestly sized se zone (readily available from IIS, the .se registry). It has roughly 1.5M delegated zones, compared to roughly 150M for com.

As a master zone file with the formatting produced by dig, the se zone is 1.3 GB. That is, in terms of storage there is really no concern.

Having nsd load that same file as a zone, it uses some 1.8 GB of memory.
Which nameserver software you use will obviously affect this number, but anything that preloads the zone data into memory will likely end up around that kind of usage, with some variation depending on the exact data structures used to represent the records.

How much memory would the com zone use with the same setup? I'm not sure exactly; for one thing, there are probably some differences in the data, so it's probably not a perfect comparison to treat com as just a "much larger se". But the rough idea from the above, a couple of hundred gigabytes, is probably the right idea of scale at least.
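
The back-of-envelope scaling behind that estimate, under the admittedly rough assumption that com behaves like a 100x larger se:

# Rough scaling of the measured nsd memory usage for se up to com size.
se_delegations = 1_500_000        # ~1.5M delegated zones in se
se_memory_gb = 1.8                # memory nsd used for the se zone
com_delegations = 150_000_000     # ~150M delegated zones in com

estimate_gb = se_memory_gb * (com_delegations / se_delegations)
print(f"~{estimate_gb:.0f} GB")   # ~180 GB, i.e. "a couple of hundred gigabytes"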

Now, is a couple of hundred gigabytes massive these days? For your typical modern server hardware, that is well within what you can spec out without doing anything really extraordinary.

Do you absolutely have to load the entire thing into memory? Not necessarily, considering there is plenty of seldom used data. However, it's certainly an obvious way of ensuring very quick access, and with the size indicated above it should not actually be prohibitively large.

Håkan Lindqvist
  • Maybe of interest related to "Do you absolutely have to load the entire thing into memory?": there were (commercial) proposals from some registries to have different domain names resolving at different "speeds" (request latency), so that important ones resolve faster. It could be just a commercial gimmick... or it may in fact be tied to how the data is stored. Since all domains are certainly not queried at the same volume, one can imagine different structures and caches in memory, with some names "closer" to the resolver and others taking longer to fetch from the data source. – Patrick Mevzek Apr 20 '21 at 17:34
  • @PatrickMevzek Right, the point I was trying to make is just that it's not so large that loading it into memory is out of the question. – Håkan Lindqvist Apr 20 '21 at 17:38
  • Tried to make that point clearer. Thanks for the feedback @PatrickMevzek – Håkan Lindqvist Apr 20 '21 at 18:30