
I've set up a single-node Kubernetes cluster on Debian 11, but CoreDNS doesn't seem to resolve anything. I first noticed this because Portainer is unable to load resources, with this proxy error:

http: proxy error: dial tcp: lookup kubernetes.default.svc on 10.96.0.10:53: read udp 10.244.0.4:57589->10.96.0.10:53: i/o timeout

Since this is a timeout talking to the cluster DNS, I checked the service and its pods:

root@dmvandenberg:~/kubernetes# kubectl get svc -n kube-system -o wide
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   78m   k8s-app=kube-dns
root@dmvandenberg:~/kubernetes# kubectl get pods --selector=k8s-app=kube-dns -o wide -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE   IP           NODE              NOMINATED NODE   READINESS GATES
coredns-78fcd69978-2b6cq   1/1     Running   0          79m   10.244.0.2   dmvandenberg.nl   <none>           <none>
coredns-78fcd69978-swprh   1/1     Running   0          79m   10.244.0.3   dmvandenberg.nl   <none>           <none>
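For completeness, the service's endpoints can be cross-checked against those pod IPs (if the endpoints list were empty, the service could never answer):

```shell
# The ENDPOINTS column should list the CoreDNS pod IPs on port 53
kubectl get endpoints kube-dns -n kube-system
```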

I've set up my cluster with these files:

cat init.sh init2.sh
kubeadm init --pod-network-cidr=10.244.0.0/16 --ignore-preflight-errors=all
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl create -f localstorage.yml --save-config
kubectl create -f pvportainer.yml --save-config
kubectl patch storageclass local-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl apply -n portainer -f https://raw.githubusercontent.com/portainer/k8s/master/deploy/manifests/portainer/portainer.yaml

I have also attempted this with flannel instead of Calico, replacing the two Calico commands (kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml and kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml) with kubectl apply -f https://github.com/coreos/flannel/raw/master/Documentation/kube-flannel.yml.
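Since Calico and flannel behave identically, I suspect something below the CNI layer. A quick sanity check for the kubeadm networking prerequisites on Debian 11 (the module and sysctls below are the ones named in the kubeadm install docs) would be:

```shell
# kubeadm expects br_netfilter to be loaded and these kernel switches on;
# without them, bridged pod traffic bypasses iptables and service DNAT
# (including the 10.96.0.10 DNS VIP) silently breaks.
lsmod | grep br_netfilter || echo "br_netfilter module not loaded"
cat /proc/sys/net/bridge/bridge-nf-call-iptables 2>/dev/null \
  || echo "bridge-nf-call-iptables unavailable"
cat /proc/sys/net/ipv4/ip_forward   # must be 1 on a Kubernetes node
```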

root@dmvandenberg:~/kubernetes# cat localstorage.yml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
root@dmvandenberg:~/kubernetes# cat pvportainer.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: portainer
spec:
  capacity:
    storage: 11Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /dockerdirs/pvportainer
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - dmvandenberg.nl

I've narrowed down the issue to DNS resolution using the following command and output:

root@dmvandenberg:~/kubernetes# kubectl logs --namespace=kube-system -l k8s-app=kube-dns -f & tcpdump -ani cni0 udp port 53
[5] 9505
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on cni0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
21:21:07.629395 IP 10.244.0.4.44224 > 10.244.0.2.53: 3488+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
21:21:07.629667 IP 10.244.0.4.43161 > 10.244.0.2.53: 433+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
21:21:12.630395 IP 10.244.0.4.54508 > 10.244.0.3.53: 61466+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
21:21:12.630453 IP 10.244.0.4.46088 > 10.244.0.2.53: 55999+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel

I would expect to see replies to the DNS queries, but there are none. On the internet I found a suggestion to add "log" to the CoreDNS Corefile, so I tried that, but no log lines appear. This convinces me that the UDP packets shown by tcpdump are never actually read/received by CoreDNS.
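For reference, this is how I added it, by editing the ConfigMap (kubectl -n kube-system edit configmap coredns) and putting log at the top of the server block; the rest is the stock kubeadm Corefile:

```
.:53 {
    log    # added: should print a line for every query CoreDNS receives
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
       max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
```

The reload plugin should pick the change up on its own; to be safe the pods can also be restarted with kubectl -n kube-system rollout restart deployment coredns.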

I went through all the steps in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/, but that didn't get me any further.
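The central check from that page, for reference (a throwaway dnsutils pod querying the cluster DNS directly):

```shell
# Test pod from the official DNS debugging guide
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
# Query the cluster DNS from inside the pod
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
# Verify the pod's resolver really points at 10.96.0.10
kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
```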

This is where I get stuck. How can I continue debugging? What could be wrong?

Edit: I've also tried following this guide: https://www.oueta.com/linux/create-a-debian-11-kubernetes-cluster-with-kubeadm/. I'm seeing exactly the same result, only on a different interface:

16:56:06.482769 cali6bd455d068f In  IP 172.20.122.129.60650 > 10.96.0.10.53: 31215+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:06.482980 cali6bd455d068f In  IP 172.20.122.129.35119 > 10.96.0.10.53: 8608+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:11.483200 cali6bd455d068f In  IP 172.20.122.129.57079 > 10.96.0.10.53: 61639+ AAAA? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:11.483309 cali6bd455d068f In  IP 172.20.122.129.38249 > 10.96.0.10.53: 14976+ A? kubernetes.default.svc.portainer.svc.cluster.local. (68)
16:56:16.484367 cali6bd455d068f In  IP 172.20.122.129.57768 > 10.96.0.10.53: 55396+ AAAA? kubernetes.default.svc.svc.cluster.local. (58)
16:56:16.484488 cali6bd455d068f In  IP 172.20.122.129.53058 > 10.96.0.10.53: 50700+ A? kubernetes.default.svc.svc.cluster.local. (58)
16:56:21.484644 cali6bd455d068f In  IP 172.20.122.129.58857 > 10.96.0.10.53: 18986+ AAAA? kubernetes.default.svc.svc.cluster.local. (58)
16:56:21.484702 cali6bd455d068f In  IP 172.20.122.129.36861 > 10.96.0.10.53: 44020+ A? kubernetes.default.svc.svc.cluster.local. (58)

Running tcpdump on the entire interface shows that TCP does seem to work, given the ACKs coming back. What I did notice is that there is no traffic from 10.96.0.10 (the service IP) back to the pod, but I don't know whether that's required.
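Beyond this, the only further step I can think of is checking kube-proxy and the NAT machinery that should be translating 10.96.0.10:53 to the pod IPs; sketched below, assuming kube-proxy runs in iptables mode and conntrack-tools is installed:

```shell
# kube-proxy should be Running and its logs free of iptables errors
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50
# There should be DNAT rules mapping the DNS VIP to the CoreDNS pod IPs
iptables-save -t nat | grep 10.96.0.10
# UDP flows to port 53 stuck in UNREPLIED would confirm that requests
# go out but answers never come back
conntrack -L -p udp --dport 53 2>/dev/null | head
```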

17:03:29.224602 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 1, win 169, options [nop,nop,TS val 4014670766 ecr 4073454542], length 0
17:03:29.224869 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [P.], seq 1:107, ack 1, win 169, options [nop,nop,TS val 4014670766 ecr 4073454542], length 106
17:03:29.224887 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [.], ack 107, win 167, options [nop,nop,TS val 4073454542 ecr 4014670766], length 0
17:03:29.225273 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [P.], seq 1:818, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670766], length 817
17:03:29.225341 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 818, win 166, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225399 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [.], seq 818:7958, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 7140
17:03:29.225422 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 7958, win 155, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225430 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [.], seq 7958:15098, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 7140
17:03:29.225448 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 15098, win 138, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225457 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [P.], seq 15098:23486, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 8388
17:03:29.225474 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [.], ack 23486, win 119, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.225564 IP 172.20.122.129.9000 > 169.254.167.173.36088: Flags [F.], seq 23486, ack 107, win 167, options [nop,nop,TS val 4073454543 ecr 4014670767], length 0
17:03:29.225609 IP 169.254.167.173.36088 > 172.20.122.129.9000: Flags [R.], seq 107, ack 23486, win 166, options [nop,nop,TS val 4014670767 ecr 4073454543], length 0
17:03:29.524333 IP 172.20.122.129.9000 > 169.254.167.173.9984: Flags [.], ack 3370092883, win 166, options [nop,nop,TS val 4073454842 ecr 1976747960], length 0
17:03:29.524564 IP 169.254.167.173.9984 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976763065 ecr 4073424519], length 0
17:03:34.218598 IP 172.20.122.129.45239 > 10.96.0.10.53: 23854+ AAAA? kubernetes.default.svc. (40)
17:03:34.219065 IP 172.20.122.129.38604 > 10.96.0.10.53: 24098+ A? kubernetes.default.svc. (40)
17:03:34.388311 IP 172.20.122.129.9000 > 169.254.167.173.7394: Flags [.], ack 917, win 166, options [nop,nop,TS val 4073459706 ecr 1976752753], length 0
17:03:34.388402 IP 169.254.167.173.7394 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976767929 ecr 4073444530], length 0
17:03:34.388314 IP 172.20.122.129.9000 > 169.254.167.173.3949: Flags [.], ack 917, win 166, options [nop,nop,TS val 4073459706 ecr 1976752753], length 0
17:03:34.388424 IP 169.254.167.173.3949 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976767929 ecr 4073444530], length 0
17:03:34.388288 IP 172.20.122.129.9000 > 169.254.167.173.26855: Flags [.], ack 917, win 166, options [nop,nop,TS val 4073459706 ecr 1976752752], length 0
17:03:34.388544 IP 169.254.167.173.26855 > 172.20.122.129.9000: Flags [.], ack 1, win 171, options [nop,nop,TS val 1976767929 ecr 4073444529], length 0
17:03:39.216823 IP 169.254.167.173.36182 > 172.20.122.129.9000: Flags [S], seq 2192346809, win 43200, options [mss 1440,sackOK,TS val 4014680758 ecr 0,nop,wscale 8], length 0
17:03:39.216889 IP 172.20.122.129.9000 > 169.254.167.173.36182: Flags [S.], seq 1678785660, ack 2192346810, win 42840, options [mss 1440,sackOK,TS val 4073464535 ecr 4014680758,nop,wscale 8]
, length 0
