0

I am noticing an unusual high latency when trying to resolve local and external DNS lookups inside Kubernetes. nslookup requests are sometimes resolved but take on average 40 seconds to complete.

These lookups are causing systems like Kafka and Mariadb to crash: Timed out waiting for connection while in state: CONNECTING

I have included below a successful and unsuccessful nslookup of kubernetes.default

$ kubectl exec -ti busybox -- nslookup kubernetes.default
0.1851s   Server:    10.39.240.10
0.0004s   Address 1: 10.39.240.10 kube-dns.kube-system.svc.cluster.localbe-dns.kube-system.svc.cluster.local
35.2081s  command terminated with exit code 1
0.0075s   nslookup: can't resolve 'kubernetes.default'
0.0004s   

Total   35.4025s
$ kubectl exec -ti busybox -- nslookup kubernetes.default
0.1362s    Server:    10.39.240.10
0.1370s    Address 1: 10.39.240.10 kube-dns.kube-system.svc.cluster.localbe-dns.kube-system.svc.cluster.local
25.1734s                                                                              
40.2251s   Name:      kubernetes.defaultnetes.default
40.2288s   Address 1: 10.39.240.1 kubernetes.default.svc.cluster.local240.1 kubernetes.default.svc.cluster.local
40.2291s   

Total   40.2292s

I have tried adjusting the KubeDNS configuration map, rebuild/add new nodes to the cluster, scaling up the amount of available kube-dns instances all without any success.

Only when inspecting the kube-dns logs did I find something interesting. The kubedns container inside kube-dns threw the following warning:

Could not find endpoints for service "mariadb" in namespace "mysql". DNS records will be created once endpoints show up.

$ kubectl logs -n kube-system kube-dns-b46cc9485-qjm25 -c kubedns
I0603 22:23:48.889404       1 dns.go:48] version: 1.14.13
I0603 22:23:48.890627       1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s
I0603 22:23:48.890762       1 server.go:121] FLAG: --alsologtostderr="false"
I0603 22:23:48.890774       1 server.go:121] FLAG: --config-dir="/kube-dns-config"
I0603 22:23:48.890794       1 server.go:121] FLAG: --config-map=""
I0603 22:23:48.890798       1 server.go:121] FLAG: --config-map-namespace="kube-system"
I0603 22:23:48.890802       1 server.go:121] FLAG: --config-period="10s"
I0603 22:23:48.890809       1 server.go:121] FLAG: --dns-bind-address="0.0.0.0"
I0603 22:23:48.890820       1 server.go:121] FLAG: --dns-port="10053"
I0603 22:23:48.890827       1 server.go:121] FLAG: --domain="cluster.local."
I0603 22:23:48.890839       1 server.go:121] FLAG: --federations=""
I0603 22:23:48.890851       1 server.go:121] FLAG: --healthz-port="8081"
I0603 22:23:48.890856       1 server.go:121] FLAG: --initial-sync-timeout="1m0s"
I0603 22:23:48.890865       1 server.go:121] FLAG: --kube-master-url=""
I0603 22:23:48.890875       1 server.go:121] FLAG: --kubecfg-file=""
I0603 22:23:48.890880       1 server.go:121] FLAG: --log-backtrace-at=":0"
I0603 22:23:48.890893       1 server.go:121] FLAG: --log-dir=""
I0603 22:23:48.890898       1 server.go:121] FLAG: --log-flush-frequency="5s"
I0603 22:23:48.890910       1 server.go:121] FLAG: --logtostderr="true"
I0603 22:23:48.890914       1 server.go:121] FLAG: --nameservers=""
I0603 22:23:48.890918       1 server.go:121] FLAG: --stderrthreshold="2"
I0603 22:23:48.890922       1 server.go:121] FLAG: --v="2"
I0603 22:23:48.890932       1 server.go:121] FLAG: --version="false"
I0603 22:23:48.890944       1 server.go:121] FLAG: --vmodule=""
I0603 22:23:48.891109       1 server.go:169] Starting SkyDNS server (0.0.0.0:10053)
I0603 22:23:48.891471       1 server.go:179] Skydns metrics enabled (/metrics:10055)
I0603 22:23:48.891514       1 dns.go:188] Starting endpointsController
I0603 22:23:48.891523       1 dns.go:191] Starting serviceController
I0603 22:23:48.891765       1 sync.go:167] Updated stubDomains to map[acme.local:[1.2.3.4]]
I0603 22:23:48.891794       1 sync.go:177] Updated upstreamNameservers to [8.8.8.8 8.8.4.4]
I0603 22:23:48.891836       1 dns.go:184] Configuration updated: {TypeMeta:{Kind: APIVersion:} Federations:map[] StubDomains:map[acme.local:[1.2.3.4]] UpstreamNameservers:[8.8.8.8 8.8.4.4]}
I0603 22:23:48.892030       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0603 22:23:48.892059       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0603 22:23:48.905592       1 dns.go:601] Could not find endpoints for service "broker" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:48.905640       1 dns.go:601] Could not find endpoints for service "prometheus-alertmanager-operated" in namespace "metrics". DNS records will be created once endpoints show up.
I0603 22:23:48.905825       1 dns.go:601] Could not find endpoints for service "pzoo" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:48.905981       1 dns.go:601] Could not find endpoints for service "mariadb" in namespace "mysql". DNS records will be created once endpoints show up.
I0603 22:23:48.906386       1 dns.go:601] Could not find endpoints for service "zoo" in namespace "kafka". DNS records will be created once endpoints show up.
I0603 22:23:49.392152       1 dns.go:222] Initialized services and endpoints from apiserver
I0603 22:23:49.392298       1 server.go:137] Setting up Healthz Handler (/readiness)
I0603 22:23:49.392319       1 server.go:142] Setting up cache handler (/cache)
I0603 22:23:49.392335       1 server.go:128] Status HTTP port 8081

I also noticed when enabling logging inside dnsmasq that queries are preformed but no reply is given. I have included a small snapshot of the log:

I0604 07:03:45.386985       1 nanny.go:116] dnsmasq[11]: forwarded prometheus-kube-state-metrics.metrics.svc.svc.cluster.local to 127.0.0.1
I0604 07:03:45.419486       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local from 10.36.20.1
I0604 07:03:45.419666       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.weave.svc.cluster.local to 127.0.0.1
I0604 07:03:45.421149       1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local.svc.cluster.local from 10.36.20.1
I0604 07:03:45.421266       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.svc.cluster.local to 127.0.0.1
I0604 07:03:45.421333       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.svc.cluster.local from 10.36.20.1
I0604 07:03:45.421437       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.svc.cluster.local to 127.0.0.1
I0604 07:03:45.423253       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.cluster.local from 10.36.20.1
I0604 07:03:45.423375       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.cluster.local to 127.0.0.1
I0604 07:03:45.424088       1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local.europe-west4-a.c.suzuka.internal from 10.36.20.1
I0604 07:03:45.424270       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.europe-west4-a.c.suzuka.internal to 169.254.169.254
I0604 07:03:45.425873       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.c.suzuka.internal from 10.36.20.1
I0604 07:03:45.426047       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.c.suzuka.internal to 169.254.169.254
I0604 07:03:45.427186       1 nanny.go:116] dnsmasq[11]: query[A] weave-scope-app.weave.svc.cluster.local.google.internal from 10.36.20.1
I0604 07:03:45.427356       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local.google.internal to 169.254.169.254
I0604 07:03:45.432498       1 nanny.go:116] dnsmasq[11]: query[AAAA] weave-scope-app.weave.svc.cluster.local from 10.36.20.1
I0604 07:03:45.432665       1 nanny.go:116] dnsmasq[11]: forwarded weave-scope-app.weave.svc.cluster.local to 127.0.0.1
I0604 07:03:45.679157       1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:45.679353       1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:46.544879       1 nanny.go:116] dnsmasq[11]: query[PTR] 82.119.177.108.in-addr.arpa from 10.36.20.1
I0604 07:03:46.544929       1 nanny.go:116] dnsmasq[11]: forwarded 82.119.177.108.in-addr.arpa to 127.0.0.1
I0604 07:03:47.613025       1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:47.613050       1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:47.613191       1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:47.613245       1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:47.613477       1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:47.613547       1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:47.613823       1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:47.613879       1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:47.614112       1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:47.614180       1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:49.544862       1 nanny.go:116] dnsmasq[11]: query[PTR] 25.121.18.104.in-addr.arpa from 10.36.20.1
I0604 07:03:49.545061       1 nanny.go:116] dnsmasq[11]: forwarded 25.121.18.104.in-addr.arpa to 127.0.0.1
I0604 07:03:50.679458       1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:50.679606       1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:52.614897       1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:52.614975       1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:52.615283       1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:52.615332       1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:52.615569       1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:52.615615       1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:52.615870       1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:52.615915       1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:52.616158       1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:52.616213       1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:55.281882       1 nanny.go:116] dnsmasq[11]: query[A] prometheus-kube-state-metrics.metrics.svc.metrics.svc.cluster.local from 10.36.20.8
I0604 07:03:55.282681       1 nanny.go:116] dnsmasq[11]: forwarded prometheus-kube-state-metrics.metrics.svc.metrics.svc.cluster.local to 127.0.0.1
I0604 07:03:55.679869       1 nanny.go:116] dnsmasq[11]: query[SRV] kubernetes.default.svc.cluster.local from 127.0.0.1
I0604 07:03:55.679903       1 nanny.go:116] dnsmasq[11]: forwarded kubernetes.default.svc.cluster.local to 127.0.0.1
I0604 07:03:57.616880       1 nanny.go:116] dnsmasq[11]: query[TXT] hits.bind from 127.0.0.1
I0604 07:03:57.616973       1 nanny.go:116] dnsmasq[11]: config hits.bind is <TXT>
I0604 07:03:57.617280       1 nanny.go:116] dnsmasq[11]: query[TXT] misses.bind from 127.0.0.1
I0604 07:03:57.617354       1 nanny.go:116] dnsmasq[11]: config misses.bind is <TXT>
I0604 07:03:57.617600       1 nanny.go:116] dnsmasq[11]: query[TXT] evictions.bind from 127.0.0.1
I0604 07:03:57.617656       1 nanny.go:116] dnsmasq[11]: config evictions.bind is <TXT>
I0604 07:03:57.617898       1 nanny.go:116] dnsmasq[11]: query[TXT] insertions.bind from 127.0.0.1
I0604 07:03:57.617957       1 nanny.go:116] dnsmasq[11]: config insertions.bind is <TXT>
I0604 07:03:57.618192       1 nanny.go:116] dnsmasq[11]: query[TXT] cachesize.bind from 127.0.0.1
I0604 07:03:57.618251       1 nanny.go:116] dnsmasq[11]: config cachesize.bind is <TXT>
I0604 07:03:57.976779       1 nanny.go:116] dnsmasq[11]: query[PTR] 10.240.39.10.in-addr.arpa from 10.36.20.21
I0604 07:03:57.976823       1 nanny.go:116] dnsmasq[11]: forwarded 10.240.39.10.in-addr.arpa to 127.0.0.1
I0604 07:03:57.977075       1 nanny.go:116] dnsmasq[11]: reply 10.39.240.10 is kube-dns.kube-system.svc.cluster.local

2 Answers2

0

I had similar issue with KubeDNS. Din't find the root cause, but managed to fix by installing CoreDNS instead.

Since Kubernetes 1.11 CoreDNS is used by default as a cluster DNS server, so I highly recommend to migrate to from KubeDNS to CoreDNS.

Consider this doc if you decide to proceed - https://kubernetes.io/docs/tasks/administer-cluster/coredns/#installing-kube-dns-instead-of-coredns-with-kubeadm

A_Suh
  • 324
  • 1
  • 7
0

I managed to resolve the issue by upgrading both the master and node pools to the latest version of Kubernetes. Unfortunately was I not able to pinpoint the exact issue.