Domain suddenly not resolving on ISP's default DNS server


Our domains resolve fine on other ISPs and on Google DNS (8.8.8.8). However, one ISP has suddenly started having trouble resolving the domains hosted on our server.

When I gave them some sample domains, they were able to fix those, but the rest of the domains that I didn't mention still weren't resolving. So I think they only did a "manual" band-aid fix (maybe something like manually editing the IPs in a Windows hosts file, not sure, hopefully not).

I want them to fix the issue properly. I'm not sure what's going on at their end, but it gives me the impression they are clueless about the real cause of the problem, since the other domains that I didn't mention are still not resolving.

Ideally, what troubleshooting steps should they take on their end to properly fix the issue?

IMB

Posted 2018-02-15T10:53:40.297

Reputation: 4 845

Try comparing dig +trace yourdomain.com and dig @8.8.8.8 +trace yourdomain.com when connected to the problematic ISP. That should give a hint about where the DNS resolution stops. – Marek Rost – 2018-02-15T12:58:25.130

@MarekRost I gather that's a Linux command, any Windows equivalent? – IMB – 2018-02-15T15:38:52.307

Yeah, it's Linux, sorry, I missed that you're on Windows. In that case I would download and install it from http://www.isc.org/downloads/ (the BIND package). I'm afraid Windows doesn't come with an analytic tool like that by default.

– Marek Rost – 2018-02-15T21:08:50.407

Got it. So I tried dig +trace yourdomain.com (or any domain) and the response I get is ;; connection timed out; no servers could be reached. The other command, dig @8.8.8.8 +trace yourdomain.com, gives me a lot of results with root servers, etc., which tells me something's really wrong with the problematic ISP? What do you make of it? – IMB – 2018-02-16T06:57:48.393

Does this problem happen with more computers than just your own? Try to use DNSViz and whatsmydns.net. You may also try to install the DIG dns tool on windows 10 and add the info here. Some explanations here and here.

– harrymc – 2018-02-18T07:26:39.417

@harrymc The problem also happens to other people I know who use the same ISP. I checked some domains on whatsmydns.net, all green. As for DNSViz, I'm not sure how to use it, but the responses tab shows all OK status. I don't think dig is working on my PC (Windows 7): any domain I check says connection timed out; no servers could be reached, e.g., dig superuser.com. But if I use a specific DNS server, e.g., dig @8.8.8.8 superuser.com, it works fine. – IMB – 2018-02-18T08:44:30.237

The DNS servers of your ISP seem to be a mess. Have you changed anything about your DNS, or was the problem there since you hosted your domain with that ISP? Do you have the possibility of switching ISP or using an external consultant? As we don't have any info about the problem and no direct access, all we can recommend here are the tools to use, although analyzing the problem might require some expertise. – harrymc – 2018-02-18T08:59:47.360

@harrymc The issue only started happening earlier this month. It's been working fine for the last 5 years. My host did move server location last month, however no IPs were changed during the transition. My host said the server move should not cause any issues. It is baffling because on other ISPs all our domains work fine. My last resort would be to change hosts or maybe ask for a new IP. – IMB – 2018-02-18T10:07:41.013

Google for "dns check" and you will find several websites that might find what is the problem - try them one by one. Otherwise, you will need to give us at least the info by dig for analysis. – harrymc – 2018-02-18T14:52:36.487

Have you gotten on the phone with your ISP and pressed to talk with the highest-level DNS administrator at their operations center, to get someone with deep knowledge of DNS configuration and infrastructure on their side to help troubleshoot this issue with their DNS system? I think one step you can take is to be demanding: they are an ISP, they have admins or they can hire a consultant. If it's interrupting your service, pressure them harder and make them do their work. Once you verify it's not on your end, make it their problem to help resolve. Be a little more demanding & assertive perhaps. – Pimp Juice IT – 2018-02-18T15:57:38.463

@PimpJuiceIT I can only go to level 2 (which acts as a middleman between me and the network admins). It's been an excruciatingly long wait between emails. So I was hoping to expedite it by giving them a clue on where to look. – IMB – 2018-02-19T05:32:30.877

See my above comment for websites specializing in analyzing DNS. The first four are link1, link2, link3, link4.

– harrymc – 2018-02-19T07:41:21.563

@harrymc Checked a few domains, all green except for 1 country (India) in link3 – IMB – 2018-02-19T10:12:56.287

I don't trust India any more than your ISP. It seems to mean that the DNS works, as long as one communicates from outside the domain of your ISP. It seems funny, because normally the same mechanism should be used for all DNS queries. My guess is that the problem is with some cache that your ISP keeps for its own clients or for users of your country. You can check further by going through some VPN that has its server in another country and see if the problem still happens. If you manage to get dig working on Windows or live Linux, post the information if you want more than a guess. – harrymc – 2018-02-19T11:11:34.630

Yeah, I am guessing some random ISP admin rolled back to an old DNS cache and disabled DNS updates from there, or something like that. Using a VPN, the domains work. I'll try tinkering with dig. – IMB – 2018-02-19T11:15:25.633

@harrymc I managed to get a laptop with Lubuntu installed. Now when I use dig mydomain.com I get connection timed out; no servers could be reached, but if I use dig +trace mydomain.com I get results which appear to be normal (at the end of the results I see our nameserver and our server IP). – IMB – 2018-02-19T11:45:34.753

That message occurs when dig cannot get a response from the DNS servers. I think that dig +trace makes an effort to start from the root servers and work its way down, which is why it works. This reinforces my suspicion of a bad DNS cache or a similar mess-up of DNS records by your ISP. It might be some problem with the glue records when the DNS query does not require an authoritative response.

– harrymc – 2018-02-19T13:39:36.387

@harrymc could they simply clear their DNS cache or is it more complicated than that? – IMB – 2018-02-19T14:23:08.930

If it's the glue records then it's more complicated than just a cache. – harrymc – 2018-02-19T14:28:27.940

@harrymc I did a little experiment. I changed the nameserver of a domain, then I put our server's IP in its A record. I waited a few hours for propagation, then it worked. I reverted the nameserver back to the original, waited a few hours, cleared my DNS cache, and it stopped working again. What do you make of it? – IMB – 2018-02-19T18:28:50.730

My guess is that the bad record, maybe a glue record, has just stayed there, so when you undid the change it started causing the same problem again. But hey, you may now have a workaround to the problem, even if your clueless ISP can't help. – harrymc – 2018-02-19T20:33:18.787

Answers

I base my answer on the following facts:

  • Calling dig mydomain.com gets connection timed out; no servers could be reached, but dig +trace mydomain.com gets the expected result.

  • Changing the name-server of a domain and setting an A record with the server's IP made it work. Reverting the name-server to the original made it stop working again.
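
As a rough illustration of how observations like these narrow down the fault, here is a minimal sketch in Python. It is only a decision table under my own assumptions: the function name `likely_fault` and the three yes/no inputs are hypothetical, and they stand for the outcomes of `dig domain` (default resolver), `dig @8.8.8.8 domain` (public resolver) and `dig +trace domain`.

```python
def likely_fault(default_resolver_ok, public_resolver_ok, trace_ok):
    """Map three yes/no probe results to a rough guess at where the fault is.

    default_resolver_ok: did `dig domain` return an answer?
    public_resolver_ok:  did `dig @8.8.8.8 domain` return an answer?
    trace_ok:            did `dig +trace domain` reach the authoritative server?
    """
    if default_resolver_ok:
        # The resolver handed out by the ISP answers, so nothing to chase here.
        return "no fault seen from this network"
    if public_resolver_ok or trace_ok:
        # The domain is resolvable by some other path, so the delegation and
        # the authoritative server work; suspicion falls on the ISP resolver.
        return "ISP resolver: the domain resolves by other paths"
    # Nothing works by any path: the domain's own DNS or the local connection.
    return "authoritative side or local connectivity"


# The situation reported in the comments: default resolver fails,
# 8.8.8.8 and +trace both succeed.
print(likely_fault(False, True, True))
```

This is a guess-narrowing aid, not a diagnosis; it cannot tell a stale cache from a bad glue record on the ISP side.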

My explanation is that there is a wrong glue record pointing to a wrong IP for your name-server. Only DNS queries that require an authoritative response and don't use glue records get the right response.

A DNS query to look up the IP address of example.com would at first only get the name of the name-server, say ns1.example.com. But then sending a query to ns1.example.com needs its IP address, and since that name lives inside example.com itself, looking it up leads back to the same name-server, and away we go again until the resolver decides that no connection is possible. The glue record breaks this loop: it is an address record for ns1.example.com that is handed out together with the referral to the name-server of example.com, so the IP is available immediately.
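
To make the role of the glue record concrete, here is a toy model in Python. It is not a real resolver and makes no network calls. Everything in it is an assumption for illustration: the function `resolve`, the dictionaries, and the addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are all hypothetical.

```python
# Toy model of a delegation with glue. Hypothetical names and addresses.

REAL_NS_IP = "192.0.2.53"

# What the real name-server would answer, keyed by the IP it listens on.
AUTHORITATIVE = {
    REAL_NS_IP: {"example.com": "192.0.2.10", "ns1.example.com": REAL_NS_IP},
}


def resolve(name, referral_view):
    """Follow one referral for example.com, then ask the name-server."""
    referral = referral_view["example.com"]
    ns_name = referral["ns"]
    ns_ip = referral["glue"].get(ns_name)
    if ns_ip is None:
        # Without glue, finding ns1.example.com would require asking
        # ns1.example.com itself: the circular dependency described above.
        return "FAIL: no glue, cannot find address of " + ns_name
    zone = AUTHORITATIVE.get(ns_ip)
    if zone is None:
        # The glue points at an address where no name-server answers.
        return "FAIL: timed out contacting " + ns_name + " at " + ns_ip
    return zone.get(name, "NXDOMAIN")


# Three views of the same delegation: correct glue, wrong glue, no glue.
good_view = {"example.com": {"ns": "ns1.example.com",
                             "glue": {"ns1.example.com": REAL_NS_IP}}}
wrong_view = {"example.com": {"ns": "ns1.example.com",
                              "glue": {"ns1.example.com": "198.51.100.99"}}}
no_glue_view = {"example.com": {"ns": "ns1.example.com", "glue": {}}}

print(resolve("example.com", good_view))
print(resolve("example.com", wrong_view))
print(resolve("example.com", no_glue_view))
```

The `wrong_view` case is the one my theory corresponds to: the name of the name-server is right, but the address that comes with it is not, so every query to it times out.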

My theory is that a glue record for your domain contains a wrong IP address for your name-server so no contact can be established with it. Only DNS queries that ask for an authoritative response will work, apparently because such queries bypass glue records.

You may now have a workaround to the problem, even if your ISP can't help, by changing the name-server of your domain and abandoning the one botched by your ISP.

harrymc

Posted 2018-02-15T10:53:40.297

Reputation: 306 093

To clarify, is the glue record to be fixed on the ISP side or is it to be fixed on the domain registrar's DNS Manager (e.g., Godaddy)? – IMB – 2018-02-21T09:00:54.220

It's maintained by the owner of the DNS server, usually your ISP. – harrymc – 2018-02-21T09:47:46.053

Have you made progress in solving the problem? – harrymc – 2018-02-24T07:07:21.537

I have not heard back from the ISP admins yet but I did mention the possible glue record issue. – IMB – 2018-02-24T07:17:24.167

It's frustrating when the people charged with keeping your data are not professionally apt, but at least you have a workaround. – harrymc – 2018-02-24T08:59:39.903

It is absolutely frustrating. I wish you were the ISP admin. But yeah, I'll give you the bounty for the trouble. If I miraculously get an update from those admins, I'll update this topic (hopefully we're still alive by then). – IMB – 2018-02-25T17:12:39.397