First, some general explanations to clear up confusion you may have.
Domains exist in TLDs. gTLDs are under contract with ICANN. A domain appears in a zonefile. Registries (the managers of TLDs) decide whether to publish zonefiles or not. Most, especially ccTLDs, do not, considering both that it is private data and that they are responsible for it. However, gTLDs are forced to publish them due to ICANN regulations. You can learn all about that at https://czds.icann.org/
In short, you create an account once and will then be able to download zonefiles.
gTLDs publish daily zonefiles. Hence a domain will appear one day if it has just been registered (with nameservers) or if it had no nameservers and now has some. Conversely, as @Anonymous listed in their reply, it will disappear when it is put on hold, deleted (before or after expiration), or changed to remove all nameservers.
Some other registries may allow DNS AXFR queries, which means you can retrieve the full zonefile on demand, but only a dozen or so TLDs do that.
Some other registries also provide "open data" services, through which you can get zonefiles or an equivalent. Some also publish daily on their websites the new names that have been registered. That is not the zonefile, but if you collect that data day after day, at some point you will be close to having a full zonefile. AFNIC, the registry of .FR, does both, for example.
Now back to your questions:
I've read that a domain may appear in a daily zone file on multiple days through some change to the dns record. Unfortunately, the source didn't explain the circumstances of when it appears in the daily file.
This should be clear now from the above. A domain is published (in a zonefile) once it exists (is registered), has nameservers and is not on hold.
If it ceases to exist, does not have nameservers anymore or is put on hold, then it will disappear from the zonefile.
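To make this concrete, here is a rough sketch of extracting the set of published domains from a zonefile. It assumes one record per line in the usual `owner TTL class type rdata` layout with fully qualified owner names, which is an assumption about the files you receive and should be checked against them. The function name `delegated_domains` is made up for the example.

```python
def delegated_domains(lines, tld):
    """Return the names directly under `tld` that have at least one NS record.

    Assumes one record per line: owner, TTL, class, type, rdata,
    with fully qualified owner names (trailing dot).
    """
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and directives such as $ORIGIN or $TTL.
        if not line or line.startswith(";") or line.startswith("$"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        owner = fields[0].lower()
        rtype = fields[3].lower()
        if rtype != "ns":
            continue
        # The TLD apex itself ("com.") does not end with ".com." and is skipped.
        if not owner.endswith(suffix):
            continue
        label = owner[: -len(suffix)]
        # Keep only direct children of the TLD.
        if not label or "." in label:
            continue
        domains.add(owner.rstrip("."))
    return domains
```

A domain with several nameservers has several NS lines, so collecting into a set keeps each name once. Glue records (A/AAAA lines for nameserver hosts) are ignored because only NS records are considered.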
Also, (correct me if I'm wrong) once you have had an entire zone file, you then use the daily files to keep your local copy up to date. What mechanism can be used to determine when an entry should be deleted?
gTLDs publish the full zonefile, each day. You are free to download it and then process it the way you want, based on your contract signed on CZDS.
Other registries may also impose other conditions.
If domain A is in yesterday's zonefile but not in today's, then you know that the domain has been deleted, or its nameservers were removed, or it was put on hold. If you do a whois (or RDAP) query you will then see whether the domain exists or not, and whether it is on hold or not.
As an example... what I have is a large list of keywords. To begin with, I need to search for domains that include or are similar to those keywords. Going forward, I need to be able to perform a smaller search of the keywords over only new domains. The list of keywords can be added to and the new keywords will need to be searched historically and going forward. So, I will need a local database of domains that would only contain domains that actually exist without having to query any nameserver to check for it's existence.
Many online services do this. Basically, you download the zonefiles and process them on your end, in a way that conforms to the contract you signed and that technically lets you use them the way you need. The keyword search and everything else is for you to handle.
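As an illustration only, the keyword part could start as a plain substring match over the registered label. This sketch assumes you already hold the domains as an iterable of strings; `match_keywords` is a hypothetical name, and "similar to" matching (typos, homoglyphs and so on) is deliberately left out.

```python
def match_keywords(domains, keywords):
    """Map each keyword to the set of domains whose first label contains it."""
    lowered = [k.lower() for k in keywords]
    hits = {}
    for domain in domains:
        # Only look at the registered label, not the TLD.
        label = domain.lower().split(".")[0]
        for keyword in lowered:
            if keyword in label:
                hits.setdefault(keyword, set()).add(domain)
    return hits
```

You would run it over the full set of domains whenever a keyword is added, and over only the newly appeared domains for keywords you have already searched.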
I believe that registrars provide daily deltas but I don't know how expired domains are represented.
First, registrars can only provide data they have, hence only on their own domains, not on all of them. So I guess you are referring to registries here, in which case see above.
Second, domain name expiration is a complicated process and depends on the TLD.
Here are the generic rules:
- when expiration arrives the registry auto-renews the domain; hence it stays published (in zonefiles) if it was before that event
- the sponsoring registrar then has some time to decide whether to renew it or not (in gTLDs, this is 45 days)
- during that time the registrar can decide to put the domain on hold, in which case the domain will cease to be published and hence cease to be in zonefiles
- if the domain is finally deleted, it will start its redemption period, during which it may no longer be published; after some more time, if nothing happens, the domain will really be deleted (and hence not published... until eventually someone registers it again).
I might have just found my own answer... http://bestwhois.org/domain_name_data/docs/README_01_document.html#sec12 They have 2 feeds - 1 for newly registered domains and another for dropped domains.
Anyone downloading zonefiles is then easily able to provide differences:
- new between yesterday and today = domains registered or updated with new nameservers or put out of hold
- missing today from yesterday = domains deleted or updated without nameservers or put on hold
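These two differences come down to set arithmetic. A minimal sketch, assuming each day's zonefile has already been reduced to a Python set of domain names (`diff_zones` is just an illustrative name):

```python
def diff_zones(yesterday, today):
    """Return (appeared, disappeared) between two sets of domain names."""
    appeared = today - yesterday      # registered, got nameservers, or taken off hold
    disappeared = yesterday - today   # deleted, lost nameservers, or put on hold
    return appeared, disappeared
```

With millions of names per TLD, holding two full sets in memory may or may not be practical; sorting both files and streaming through them is one alternative.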
By doing a whois query you can see if the domain is still registered and you will see if it is on hold or not. Hence you will be able to discriminate between all the above cases.
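A sketch of that last step, assuming you have already done the whois or RDAP lookup and parsed it into three pieces: whether the domain is registered, its list of status values, and its list of nameservers. The lookup and parsing are not shown, and `classify_missing` is a hypothetical helper. Status names are normalized so that both the EPP spelling (`clientHold`) and the RDAP spelling (`client hold`) match.

```python
HOLD = {"clienthold", "serverhold"}
PENDING = {"redemptionperiod", "pendingdelete"}


def classify_missing(registered, statuses, nameservers):
    """Guess why a domain dropped out of the zonefile.

    The three arguments come from a whois/RDAP lookup done elsewhere.
    """
    if not registered:
        return "deleted"
    # Lowercase and drop spaces so EPP and RDAP spellings compare equal.
    normalized = {s.lower().replace(" ", "") for s in statuses}
    if normalized & PENDING:
        return "pending deletion"
    if normalized & HOLD:
        return "on hold"
    if not nameservers:
        return "no nameservers"
    return "unknown"
```

The "unknown" case covers timing gaps, for example a domain that got its nameservers back between the zonefile generation and your query.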
There is nothing very complicated to do there, except:
- volume of data: millions of domains
- you rely on whois for part of it, which is typically query limited
- you signed a contract to be able to download zonefiles and this contract will limit what you can do or not with the data.