I have built a Kubernetes cluster using Kubespray on Ubuntu 18.04 and I'm facing a DNS issue: containers cannot communicate with each other through their hostnames.

Things that are working:

  • container-to-container communication through IP addresses
  • internet access from the containers
  • resolving kubernetes.default

Kubernetes master:

root@k8s-1:~# cat /etc/resolv.conf | grep -v ^\\#
nameserver 127.0.0.53
search home
root@k8s-1:~# 

Pod:

root@k8s-1:~# kubectl exec dnsutils cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local home
options ndots:5
root@k8s-1:~# 
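
For context, 169.254.25.10 is the link-local address of the node-local DNS cache that Kubespray enables by default (the nodelocaldns addon); it forwards the cluster zones to CoreDNS. A quick sanity check against it directly (a sketch, assuming the dnsutils image ships nslookup):

kubectl exec dnsutils -- nslookup kubernetes.default.svc.cluster.local 169.254.25.10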

CoreDNS pods are healthy:

root@k8s-1:~# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns        
NAME                       READY   STATUS    RESTARTS   AGE
coredns-58687784f9-8rmlw   1/1     Running   0          35m
coredns-58687784f9-hp8hp   1/1     Running   0          35m
root@k8s-1:~#

Events and logs for the CoreDNS pods:

root@k8s-1:~# kubectl describe pods --namespace=kube-system -l k8s-app=kube-dns | tail -n 2
  Normal   Started           35m                 kubelet, k8s-2     Started container coredns
  Warning  DNSConfigForming  12s (x33 over 35m)  kubelet, k8s-2     Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 4.2.2.1 4.2.2.2 208.67.220.220
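
The DNSConfigForming warning concerns the node rather than the cluster: the resolv.conf that kubelet uses lists more than three nameservers, and since the Linux resolver only honours the first three, kubelet trims the list. One way to check on the affected node (an assumption: Kubespray pointed kubelet at systemd-resolved's upstream file, so the path may differ):

# On k8s-2: list the upstream nameservers kubelet sees.
grep ^nameserver /run/systemd/resolve/resolv.conf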

root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-8rmlw
.:53
2020-02-09T22:56:14.390Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
2020-02-09T22:56:14.391Z [INFO] CoreDNS-1.6.0
2020-02-09T22:56:14.391Z [INFO] linux/amd64, go1.12.7, 0a218d3
CoreDNS-1.6.0
linux/amd64, go1.12.7, 0a218d3
root@k8s-1:~#

root@k8s-1:~# kubectl logs --namespace=kube-system coredns-58687784f9-hp8hp
.:53
2020-02-09T22:56:20.388Z [INFO] plugin/reload: Running configuration MD5 = b9d55fc86b311e1d1a0507440727efd2
2020-02-09T22:56:20.388Z [INFO] CoreDNS-1.6.0
2020-02-09T22:56:20.388Z [INFO] linux/amd64, go1.12.7, 0a218d3
CoreDNS-1.6.0
linux/amd64, go1.12.7, 0a218d3
root@k8s-1:~#

The CoreDNS service seems to be exposed:

root@k8s-1:~# kubectl get svc --namespace=kube-system | grep coredns
coredns                ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   37m
root@k8s-1:~#

root@k8s-1:~# kubectl get ep coredns --namespace=kube-system
NAME      ENDPOINTS                                                  AGE
coredns   10.233.64.2:53,10.233.65.3:53,10.233.64.2:53 + 3 more...   37m
root@k8s-1:~#
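
To take the node-local cache out of the picture, one can also query the coredns Service's ClusterIP directly (a sketch; the IP comes from the Service output above):

kubectl exec dnsutils -- nslookup kubernetes.default.svc.cluster.local 10.233.0.3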

These are my problematic pods; the whole cluster is affected by this issue:

root@k8s-1:~# kubectl get pods -o wide -n default
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
busybox                  1/1     Running   0          17m   10.233.66.7   k8s-3   <none>           <none>
dnsutils                 1/1     Running   0          50m   10.233.66.5   k8s-3   <none>           <none>
nginx-86c57db685-p8zhc   1/1     Running   0          43m   10.233.64.3   k8s-1   <none>           <none>
nginx-86c57db685-st7rw   1/1     Running   0          47m   10.233.66.6   k8s-3   <none>           <none>
root@k8s-1:~# 

I'm able to reach another container by IP address, the internet through DNS, and kubernetes.default:

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping 10.233.64.3"
PING 10.233.64.3 (10.233.64.3) 56(84) bytes of data.
64 bytes from 10.233.64.3: icmp_seq=1 ttl=62 time=0.481 ms
64 bytes from 10.233.64.3: icmp_seq=2 ttl=62 time=0.551 ms
...

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping google.com"
PING google.com (172.217.21.174) 56(84) bytes of data.
64 bytes from fra07s64-in-f174.1e100.net (172.217.21.174): icmp_seq=1 ttl=61 time=77.9 ms
...

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping kubernetes.default"
PING kubernetes.default.svc.cluster.local (10.233.0.1) 56(84) bytes of data.
64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=1 ttl=64 time=0.030 ms
64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=2 ttl=64 time=0.069 ms
...

Actual issue:

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping nginx-86c57db685-p8zhc"
ping: nginx-86c57db685-p8zhc: Name or service not known
command terminated with exit code 2
root@k8s-1:~#

root@k8s-1:~# kubectl exec -it nginx-86c57db685-st7rw -- sh -c "ping dnsutils"
ping: dnsutils: Name or service not known
command terminated with exit code 2
root@k8s-1:~#

root@k8s-1:~# kubectl exec -ti busybox -- nslookup nginx-86c57db685-p8zhc
Server:     169.254.25.10
Address:    169.254.25.10:53

** server can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: NXDOMAIN

*** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.home: No answer
*** Can't find nginx-86c57db685-p8zhc.default.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.svc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.cluster.local: No answer
*** Can't find nginx-86c57db685-p8zhc.home: No answer

command terminated with exit code 1
root@k8s-1:~#

Am I missing something, or how do I fix communication between containers using hostnames?

Many thanks

Updated

More checks:

root@k8s-1:~# kubectl exec -ti dnsutils -- nslookup kubernetes.default
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.233.0.1

I have created a StatefulSet:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/web/web.yaml
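
That manifest pairs a StatefulSet named web with a headless Service named nginx (clusterIP: None), which is what creates the per-pod DNS records. A quick way to confirm the Service is headless (a sketch; the jsonpath output should be the literal string None):

kubectl get svc nginx -o jsonpath='{.spec.clusterIP}'

Because the Service is headless, the lookup below returns one A record per matching pod instead of a single virtual IP.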

And I'm able to resolve the service "nginx":

root@k8s-1:~/kplay# k exec dnsutils -it nslookup nginx
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   nginx.default.svc.cluster.local
Address: 10.233.66.8
Name:   nginx.default.svc.cluster.local
Address: 10.233.64.3
Name:   nginx.default.svc.cluster.local
Address: 10.233.65.5
Name:   nginx.default.svc.cluster.local
Address: 10.233.66.6

I'm also able to contact StatefulSet members when using the FQDN:

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0.nginx.default.svc.cluster.local
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   web-0.nginx.default.svc.cluster.local
Address: 10.233.65.5

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1.nginx.default.svc.cluster.local
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   web-1.nginx.default.svc.cluster.local
Address: 10.233.66.8

But not when using just the hostnames:

root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-0
Server:     169.254.25.10
Address:    169.254.25.10#53

** server can't find web-0: NXDOMAIN

command terminated with exit code 1
root@k8s-1:~/kplay# k exec dnsutils -it nslookup web-1
Server:     169.254.25.10
Address:    169.254.25.10#53

** server can't find web-1: NXDOMAIN

command terminated with exit code 1
root@k8s-1:~/kplay#
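
This is expected: the search path expands a bare web-0 to web-0.default.svc.cluster.local (and so on), and no such records exist; the per-pod records live under the governing Service's domain. The shorter form web-0.nginx should still resolve, since the search path completes it to the full FQDN (a sketch):

kubectl exec -it dnsutils -- nslookup web-0.nginx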

All of them are living in the same namespace:

root@k8s-1:~/kplay# k get pods -n default
NAME                     READY   STATUS    RESTARTS   AGE
busybox                  1/1     Running   22         22h
dnsutils                 1/1     Running   22         22h
nginx-86c57db685-p8zhc   1/1     Running   0          22h
nginx-86c57db685-st7rw   1/1     Running   0          22h
web-0                    1/1     Running   0          11m
web-1                    1/1     Running   0          10m

Another test confirming that I'm able to resolve services:

kubectl create deployment --image nginx some-nginx
kubectl scale deployment --replicas 2 some-nginx
kubectl expose deployment some-nginx --port=12345 --type=NodePort

root@k8s-1:~/kplay# k exec dnsutils -it nslookup some-nginx
Server:     169.254.25.10
Address:    169.254.25.10#53

Name:   some-nginx.default.svc.cluster.local
Address: 10.233.63.137

Final thoughts

Funny fact, but maybe this is how Kubernetes should work? I'm able to reach a service by its hostname, and StatefulSet members when I want to reach a specific pod individually. Reaching an individual pod by hostname when it's not part of a StatefulSet doesn't seem very important, at least in my Kubernetes usage (this may differ for others).

laimison
  • Are you running this on bare metal or on a cloud provider? If you're running it on a cloud provider, please specify. – Mark Watney Feb 10 '20 at 09:31
  • @mWatney thanks for asking. This is bare metal. To be specific: VMs created on my Macbook using Vagrant. – laimison Feb 10 '20 at 10:28
  • Please, refer to [this](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/) document and let me know the results. – Mark Watney Feb 10 '20 at 10:51
  • Followed that one and almost everything is tested. Will double check if something else is missing. – laimison Feb 10 '20 at 11:00
  • What is the result for `kubectl exec -ti dnsutils -- nslookup kubernetes.default`? – Mark Watney Feb 10 '20 at 12:35
  • Will do an additional investigation when I'm back, but I'm sure it will work, because I've tried the same from the nginx container and the output was: `64 bytes from kubernetes.default.svc.cluster.local (10.233.0.1): icmp_seq=1 ttl=64 time=0.030 ms` – laimison Feb 10 '20 at 12:38
  • @mWatney I have updated my question. Funny fact, but am I right that Kubernetes is healthy and there is no issue to solve here? – laimison Feb 10 '20 at 22:11

1 Answer

I suggested that you follow [this guide](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/) so we could isolate possible problems in your CoreDNS, and as you can see it's working fine.

Reaching an individual pod by hostname when it's not part of a StatefulSet doesn't seem very important, at least in my Kubernetes usage (this may differ for others).

It's possible to reach a pod using a DNS record, but as you stated, it's not very important in regular Kubernetes setups.

When enabled, pods are assigned a DNS A record in the form of pod-ip-address.my-namespace.pod.cluster.local.

For example, a pod with IP 1.2.3.4 in the namespace default with a DNS name of cluster.local would have an entry: 1-2-3-4.default.pod.cluster.local. [Source](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/)

EXAMPLE

$ kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE     IP          NODE                                 NOMINATED NODE   READINESS GATES
dnsutils     1/1     Running   20         20h     10.28.2.3   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>
sample-pod   1/1     Running   0          2m11s   10.28.2.4   gke-lab-default-pool-87c6b085-wcp8   <none>           <none>

$ kubectl exec -ti dnsutils -- nslookup 10-28-2-4.default.pod.cluster.local
Server:     10.31.240.10
Address:    10.31.240.10#53

Name:   10-28-2-4.default.pod.cluster.local
Address: 10.28.2.4
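
A small hypothetical helper to build that record from a pod's IP (sample-pod is the pod from the example above):

# Fetch the pod IP and swap dots for dashes to form the DNS name.
POD_IP=$(kubectl get pod sample-pod -o jsonpath='{.status.podIP}')
kubectl exec dnsutils -- nslookup "$(echo "$POD_IP" | tr . -).default.pod.cluster.local"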

Funny fact, but maybe this is how Kubernetes should work?

Yes, your CoreDNS is working as intended and everything you described is expected.

Mark Watney
  • Thanks @mWatney for your answer. That is an interesting story on how I spent 2 days on an issue that doesn't exist. Welcome to Kubernetes world :) – laimison Feb 11 '20 at 11:59
  • Glad to be able to confirm your assumptions. Welcome to Kubernetes world! – Mark Watney Feb 11 '20 at 12:21