1

I have a database containing a lot of invalid emails.

I want to remove all the emails whose domain does not have an MX record.

So after extracting the domain part, I wrote a script that bulk-checks the distinct domains by executing, among other things, `dig [domainName] mx +short` with a sleep period on the scale of milliseconds between each query.

What is an acceptable rate for executing that command across ~5000 domains without my ISP considering it a threat to the network? In other words, what value should I set the sleep period to in order to stay on the safe side?
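For reference, the check loop described above can be sketched roughly as follows. This is an assumed reconstruction, not the original script: the function name, the `ok`/`no-mx` labels, and the commented 0.2-second sleep are all illustrative choices.

```shell
#!/bin/sh
# Hedged sketch of the bulk MX check. check_mx prints "<domain> ok" when
# `dig` returns at least one MX record for the domain, "<domain> no-mx"
# otherwise. The labels and the suggested sleep value are assumptions.
check_mx() {
    if [ -n "$(dig "$1" mx +short)" ]; then
        echo "$1 ok"
    else
        echo "$1 no-mx"
    fi
}

# Example usage (domains.txt is an assumed file, one domain per line):
# a 0.2 s sleep caps the rate at ~5 queries/second, so 5000 domains
# finish in under 20 minutes.
# while IFS= read -r domain; do check_mx "$domain"; sleep 0.2; done < domains.txt
```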

Marinos An
  • 155
  • 8
  • Although it is considered bad form not to have MX records, an MX record is not actually required to be able to receive e-mail: when no MX exists for `example.co.uk`, SMTP servers are supposed to fall back and attempt delivery to the host `example.co.uk` itself (i.e. the `A`, `AAAA` or `CNAME` record). Arguably the "correct" method to remove invalid e-mail addresses is to properly process bounces / non-delivery errors when you send solicited e-mails. – diya Sep 20 '22 at 15:52
  • I consider you a *threat to the network* if you refuse to follow the minimum legal requirements on the processing of personal data. Either you have a legitimate reason to be processing these addresses, then you need not care which ones still have explicit mail exchanges published, because as you actually send messages, unreachable destinations will just be one out of many rejection reasons you need to deal with. Or you do not (or no longer) have a legitimate reason, then you must cease processing (or even storing) them. – anx Sep 20 '22 at 16:35
  • @anx "unreachable destinations will just be one out of many rejection reasons you need to deal with". Not always in application-level and not always in training environments. That said, the invalid emails on a training environment may pollute the undelivered messages queue of the (one) mail server. – Marinos An Sep 20 '22 at 23:36

2 Answers

1

The public resolvers operated by Google, 8.8.8.8 and 8.8.4.4 (and their IPv6 equivalents), consider anything less than 1500 queries per second acceptable and not subject to rate limiting.

Source:

https://developers.google.com/speed/public-dns/docs/isp

So feel free to use those.
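With `dig`, pointing the lookup at Google's resolver just means adding the `@server` argument. A minimal sketch (the function name and example domain are illustrative, not from the answer):

```shell
#!/bin/sh
# Run the MX lookup against Google's public resolver (8.8.8.8) instead
# of the system default, bypassing the local/ISP resolver entirely.
mx_via_google() {
    dig @8.8.8.8 "$1" mx +short
}

# Example usage:
# mx_via_google example.com
```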


No guarantees, though, as to what your ISP or network administrators consider abnormal usage patterns or grounds for suspecting abuse.

diya
  • 83
  • 7
1

5000 queries is really not much, at least not when executed sequentially.

For context, DNS (server) sizing discussions usually talk about queries per second, or perhaps queries per month. We might also worry about amplification effects, depending on the type of query and the type of server.

Here's what an F5 whitepaper from 2018 suggests robust DNS handling looks like:

With DNS Express, the individual core of each BIG-IP device can answer approximately 125,000 to 200,000 requests per second, scaling up to more than 50 million query RPS, greater than 12 times the capacity of a typical primary DNS server.

(source: https://www.f5.com/services/resources/white-papers/the-f5-intelligent-dns-scale-reference-architecture)

So you can see that 5000 queries isn't very much. It might create extra load if executed in parallel or repeated often, or if your network's DNS server is undersized or overloaded. You don't specify a frequency, but this doesn't sound like a task you would run often. It might still be polite to space the queries out regardless; that said, if you're actually calling dig 5000 times in a row from a script (and waiting for each exit value), that already builds in a slight delay between queries, with a longer delay for domains that can't be resolved and a much longer one for domains whose nameservers are unresponsive.

But if you were running this script frequently, you'd be repeating the same queries often. In that case it's better to do some local caching to reduce that repetition, but implementation details are outside the scope of this question.
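As a rough illustration of such local caching, a file-backed lookup might look like the sketch below. The cache file name, the `ok`/`no-mx` labels, and the overall shape are assumptions for illustration only.

```shell
#!/bin/sh
# Hedged sketch: cache MX-check results in a file so repeated runs do not
# re-query domains already checked. Each cache line is "<domain> <ok|no-mx>".
cached_mx_status() {
    domain=$1
    cache=${2:-mx-cache.txt}   # assumed default cache file name
    # Return the cached line if we have already checked this domain.
    hit=$(grep "^$domain " "$cache" 2>/dev/null | head -n1)
    if [ -n "$hit" ]; then
        echo "$hit"
        return
    fi
    # Cache miss: query once, then record and print the result.
    if [ -n "$(dig "$domain" mx +short)" ]; then
        status=ok
    else
        status=no-mx
    fi
    echo "$domain $status" | tee -a "$cache"
}
```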

I don't think your ISP will care unless the network has a very low-bandwidth uplink or an extremely limited stateful firewall (i.e., maybe check with them if you're at a facility in Antarctica). Consumer ISPs also care more about inbound queries than outbound ones (i.e., whether you're running a prohibited server).

tl;dr: 5000 MX lookups won't raise any red flags unless you have a very undersized DNS server or a tiny uplink, or perhaps if you run them all in parallel. It's a small number in the scheme of things.

omar.s
  • 71
  • 3