
I have installed Kubernetes v1.23.0 on Arch Linux. The cluster consists of a master and a worker node. Both systems are KVM-based VMs.

When a pod makes a DNS query, it runs into a timeout whenever the service forwards the request to a coredns pod instance that is running on another Kubernetes node.

So I suspect that the network provider is not working properly or that some settings (kernel modules, sysctl, etc.) are missing, because when the request is forwarded to the locally running coredns pod instance, the client gets a response. Here are my debugging steps:

  1. Before I started debugging, I increased the log level of coredns by adding the `log` plugin to the coredns ConfigMap.
# kubectl get -n kube-system configmaps coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
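Since the `reload` plugin is already part of the Corefile, it was enough to edit the ConfigMap; CoreDNS picks the change up on its own after a few seconds (visible as the "Reloading" lines in the logs further below). As an example, one way to do this:

# kubectl edit -n kube-system configmaps coredns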
  2. As a debugging environment, I deployed my network container with tools like dig and nslookup to test the different coredns instances:
kubectl apply -f https://raw.githubusercontent.com/volker-raschek/network-tools/master/network-tools.yml
  3. The following coredns pods and service are available:
kubectl get pod,service -n kube-system -l k8s-app=kube-dns -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP          NODE                  NOMINATED NODE   READINESS GATES
pod/coredns-64897985d-cgxmv   1/1     Running   0          24h   10.85.0.4   archlinux-x86-64-000  <none>           <none>
pod/coredns-64897985d-l9ftl   1/1     Running   0          24h   10.85.0.3   archlinux-x86-64-001  <none>           <none>

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   24h   k8s-app=kube-dns
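For reference, the service balances queries over both pod IPs listed above; the registered endpoints can be checked with:

# kubectl get endpoints -n kube-system kube-dns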
  4. I executed a shell in my network pod and tried to query the IP address of google.com via the coredns service. As can be seen, the command takes different lengths of time. Unfortunately, I could not reproduce a timeout error via the service:
# kubectl exec  network-tools -- time dig A google.com @10.96.0.10 +short
142.250.185.238
real    0m 5.02s
user    0m 0.02s
sys 0m 0.00s
# kubectl exec  network-tools -- time dig A google.com @10.96.0.10 +short
142.250.185.238
real    0m 0.03s
user    0m 0.01s
sys 0m 0.00s
# kubectl exec  network-tools -- time dig A google.com @10.96.0.10 +short
142.250.185.238
real    0m 10.03s
user    0m 0.01s
sys 0m 0.01s
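The varying runtimes match the suspicion that only the queries which the service forwards to the remote instance hang. A quick way to make this visible is to repeat the query with short timeouts (a sketch; it assumes the network-tools image ships sh and seq):

# kubectl exec network-tools -- sh -c 'for i in $(seq 1 10); do dig +time=2 +tries=1 +short A google.com @10.96.0.10; done'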
  5. Now I direct the query at the individual coredns pods. Note that the pod coredns-64897985d-cgxmv with the IP 10.85.0.4 is running on a different node.

pod/coredns-64897985d-l9ftl / 10.85.0.3

kubectl exec  network-tools -- time dig A google.com @10.85.0.3 +short
142.251.36.206
real    0m 0.09s
user    0m 0.00s
sys 0m 0.01s

pod/coredns-64897985d-cgxmv / 10.85.0.4

Here is the timeout error when explicitly using the coredns pod of another node.

# kubectl exec  network-tools -- time dig A google.com @10.85.0.4 +short

; <<>> DiG 9.16.20 <<>> A google.com @10.85.0.4 +short
;; global options: +cmd
;; connection timed out; no servers could be reached

Command exited with non-zero status 9
real    0m 15.02s
user    0m 0.02s
sys 0m 0.00s
command terminated with exit code 9
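To check whether these queries leave the node at all, the traffic can be captured on both nodes while repeating the dig (a sketch; it assumes flannel's default VXLAN backend on UDP port 8472):

$ sudo tcpdump -ni any 'host 10.85.0.4 or udp port 8472'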
  6. The following logs were written by the coredns pods:

pod/coredns-64897985d-l9ftl / 10.85.0.3

# kubectl logs -n kube-system coredns-64897985d-l9ftl 
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 127.0.0.1:54962 - 9983 "HINFO IN 4683476401105705616.5032820535498752139. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.058383302s
[INFO] 10.85.0.1:24999 - 26748 "A IN google.com. udp 51 false 4096" NOERROR qr,rd,ra 1549 0.070006969s
[INFO] 10.85.0.1:6142 - 9467 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.000959536s
[INFO] 10.85.0.1:2544 - 20425 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.00065977s
[INFO] 10.85.0.1:26782 - 372 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.001292768s
[INFO] 10.85.0.1:62687 - 27302 "A IN google.com. udp 51 false 4096" 
...

pod/coredns-64897985d-cgxmv / 10.85.0.4

# kubectl logs -n kube-system coredns-64897985d-cgxmv
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
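The remote instance logs the configuration reload, but not a single query, so the packets apparently never arrive there. Both instances can also be watched side by side while testing, for example:

# kubectl logs -n kube-system -l k8s-app=kube-dns -f --prefix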

To narrow down the problem, I reinstalled my cluster via Ansible and installed Calico instead of Flannel using the command below. The same problem exists there.

$ helm install calico projectcalico/tigera-operator --version v3.21.3
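The rollout itself can be checked afterwards, for example with the commands below (the calico-system namespace is created by the tigera-operator):

$ kubectl get pods -n calico-system
$ kubectl get tigerastatus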

I used the installation guide of kubeadm to set up the cluster and executed the following kubeadm init command to initialize it:

$ kubeadm init \
  --apiserver-advertise-address=192.168.179.101 \
  --apiserver-cert-extra-sans=api.example.com \
  --control-plane-endpoint=192.168.179.100 \
  --cri-socket=unix:///var/run/crio/crio.sock \
  --pod-network-cidr=10.244.0.0/16 \
  --upload-certs

The kernel module br_netfilter is loaded and the sysctl properties are set, but the problem still exists. I have run out of ideas and need advice from experts here. Below is a list of my kernel modules, sysctl settings and other information.
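For reference, the settings in question correspond to the kubeadm prerequisites; on my nodes they are provided by drop-ins along these lines (the file names are only examples):

$ cat /etc/modules-load.d/k8s.conf
overlay
br_netfilter

$ cat /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1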

I hope someone can help me.

kernel information

$ uname -a
Linux archlinux-x86-64-000 5.10.90-1-lts #1 SMP Wed, 05 Jan 2022 13:07:40 +0000 x86_64 GNU/Linux

kernel modules

$ lsmod | sort
ac97_bus               16384  1 snd_soc_core
aesni_intel           372736  0
agpgart                53248  4 intel_agp,intel_gtt,ttm,drm
atkbd                  36864  0
bpf_preload            16384  0
bridge                274432  1 br_netfilter
br_netfilter           32768  0
cec                    61440  1 drm_kms_helper
cfg80211              983040  0
crc16                  16384  1 ext4
crc32c_generic         16384  0
crc32c_intel           24576  3
crc32_pclmul           16384  0
crct10dif_pclmul       16384  1
cryptd                 24576  2 crypto_simd,ghash_clmulni_intel
crypto_simd            16384  1 aesni_intel
drm                   577536  5 drm_kms_helper,qxl,drm_ttm_helper,ttm
drm_kms_helper        278528  3 qxl
drm_ttm_helper         16384  1 qxl
ext4                  933888  1
failover               16384  1 net_failover
fat                    86016  1 vfat
fb_sys_fops            16384  1 drm_kms_helper
fuse                  167936  1
ghash_clmulni_intel    16384  0
glue_helper            16384  1 aesni_intel
i2c_i801               36864  0
i2c_smbus              20480  1 i2c_i801
i8042                  36864  0
intel_agp              24576  0
intel_gtt              24576  1 intel_agp
intel_pmc_bxt          16384  1 iTCO_wdt
intel_rapl_common      28672  1 intel_rapl_msr
intel_rapl_msr         20480  0
ip6_udp_tunnel         16384  1 vxlan
ip_set                 57344  0
ip_tables              32768  0
ipt_REJECT             16384  0
ip_vs                 184320  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
ip_vs_rr               16384  0
ip_vs_sh               16384  0
ip_vs_wrr              16384  0
irqbypass              16384  1 kvm
iTCO_vendor_support    16384  1 iTCO_wdt
iTCO_wdt               16384  0
jbd2                  151552  1 ext4
joydev                 28672  0
kvm                   933888  1 kvm_intel
kvm_intel             331776  0
ledtrig_audio          16384  1 snd_hda_codec_generic
libcrc32c              16384  4 nf_conntrack,nf_nat,nf_tables,ip_vs
libps2                 20480  2 atkbd,psmouse
llc                    16384  2 bridge,stp
lpc_ich                28672  0
mac_hid                16384  0
mbcache                16384  1 ext4
Module                  Size  Used by
mousedev               24576  0
net_failover           24576  1 virtio_net
nf_conntrack          172032  6 xt_conntrack,nf_nat,xt_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs
nf_conntrack_netlink    57344  0
nf_defrag_ipv4         16384  1 nf_conntrack
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
nf_nat                 57344  3 xt_nat,nft_chain_nat,xt_MASQUERADE
nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables,ip_set
nf_reject_ipv4         16384  1 ipt_REJECT
nf_tables             249856  183 nft_compat,nft_counter,nft_chain_nat
nft_chain_nat          16384  7
nft_compat             20480  122
nft_counter            16384  84
nls_iso8859_1          16384  0
overlay               147456  18
pcspkr                 16384  0
psmouse               184320  0
qemu_fw_cfg            20480  0
qxl                    73728  0
rapl                   16384  0
rfkill                 28672  2 cfg80211
rng_core               16384  1 virtio_rng
serio                  28672  6 serio_raw,atkbd,psmouse,i8042
serio_raw              20480  0
snd                   114688  8 snd_hda_codec_generic,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_timer,snd_compress,snd_soc_core,snd_pcm
snd_compress           32768  1 snd_soc_core
snd_hda_codec         172032  2 snd_hda_codec_generic,snd_hda_intel
snd_hda_codec_generic    98304  1
snd_hda_core          110592  3 snd_hda_codec_generic,snd_hda_intel,snd_hda_codec
snd_hda_intel          57344  0
snd_hwdep              16384  1 snd_hda_codec
snd_intel_dspcfg       28672  1 snd_hda_intel
snd_pcm               147456  7 snd_hda_intel,snd_hda_codec,soundwire_intel,snd_compress,snd_soc_core,snd_hda_core,snd_pcm_dmaengine
snd_pcm_dmaengine      16384  1 snd_soc_core
snd_soc_core          327680  1 soundwire_intel
snd_timer              49152  1 snd_pcm
soundcore              16384  1 snd
soundwire_bus          90112  3 soundwire_intel,soundwire_generic_allocation,soundwire_cadence
soundwire_cadence      36864  1 soundwire_intel
soundwire_generic_allocation    16384  1 soundwire_intel
soundwire_intel        45056  1 snd_intel_dspcfg
stp                    16384  1 bridge
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
ttm                   114688  2 qxl,drm_ttm_helper
udp_tunnel             20480  1 vxlan
usbhid                 65536  0
veth                   32768  0
vfat                   24576  0
virtio_balloon         24576  0
virtio_blk             20480  2
virtio_console         40960  0
virtio_net             61440  0
virtio_pci             28672  0
virtio_rng             16384  0
vxlan                  77824  0
xhci_pci               20480  0
xhci_pci_renesas       20480  1 xhci_pci
x_tables               53248  11 xt_conntrack,xt_statistic,nft_compat,xt_tcpudp,xt_addrtype,xt_nat,xt_comment,ipt_REJECT,ip_tables,xt_MASQUERADE,xt_mark
xt_addrtype            16384  2
xt_comment             16384  64
xt_conntrack           16384  13
xt_mark                16384  12
xt_MASQUERADE          20480  6
xt_nat                 16384  7
xt_statistic           16384  3
xt_tcpudp              20480  15

sysctl

The sysctl properties are on Pastebin, as they exceed the maximum number of characters for Server Fault.

Volker Raschek
  • Could you please clarify this -> "When a pod wants to make a DNS query to coredns, it gets a timeout when the service forwards the requests to an externally running pod instance of coredns." How are you making a DNS query to CoreDNS? How do you know it gets a timeout? Could you check the logs of the CoreDNS pods (`kubectl logs --namespace=kube-system -l k8s-app=kube-dns`)? – Mikolaj S. Jan 12 '22 at 13:39
  • @MikolajS. I changed the description of the problem and appended some more information – Volker Raschek Jan 12 '22 at 21:58
  • Could you please share flags that you used for `kubeadm init` command? Especially, did you use flag `--pod-network-cidr`? How did you install Calico CNI - there are few possible solutions, which one did you use? Did you specify a proper [pod CIDR for Calico as well](https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-on-nodes)? – Mikolaj S. Jan 13 '22 at 16:08
  • Hi @MikolajS., I changed the description again and I added the `kubeadm init` command. Yes, I used the `--pod-network-cidr` flag and installed calico via helm. No changes were made to values.yaml and the environment variable `CALICO_IPV4POOL_CIDR` is not defined. – Volker Raschek Jan 13 '22 at 19:44
  • In order to install Kubernetes, the `iptables` package had to be [replaced](https://github.com/archlinux/svntogit-community/blob/packages/kubernetes/trunk/PKGBUILD#L14) by `iptables-nft`. Could this change perhaps have caused the restrictions? I found the following post about disabling the iptables module: https://mihail-milev.medium.com/no-pod-to-pod-communication-on-centos-8-kubernetes-with-calico-56d694d2a6f4 – Volker Raschek Jan 13 '22 at 19:50
  • Thanks for all the information, I will try to replicate your issue and check what can be done. – Mikolaj S. Jan 18 '22 at 18:00
  • Did you check with both `iptables` variants installed (I mean the legacy one and the nft version)? Could you run the `iptables -v` command? – Mikolaj S. Jan 19 '22 at 15:41
  • Hi @MikolajS., I found out why Calico and the other CNI plugins do not work. Here is the discussion about the problem in the underlying CRI-O runtime: https://github.com/cri-o/cri-o/issues/2885#issuecomment-1015732983 – Volker Raschek Jan 19 '22 at 21:27

1 Answer


Posting a community wiki answer based on the GitHub topic "Files included in CentOS RPM interfere with CNI operation" and the Arch Linux wiki page on CRI-O. Feel free to expand it.


The issue is known and described in the Arch Linux wiki:

Warning: Arch installs the plugins provided by cni-plugins to both /usr/lib/cni and /opt/cni/bin but most other plugins (e.g. in-cluster deployments, kubelet managed plugins, etc) by default only install to the second directory.

CRI-O is only configured to look for plugins in the first directory and as a result, any plugins in the second directory are unavailable without some configuration changes.
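A quick way to see the mismatch on an affected node is to compare the two plugin directories and the effective CRI-O network configuration (a sketch; `crio config` prints the configuration CRI-O actually uses):

$ ls /usr/lib/cni /opt/cni/bin
$ sudo crio config | grep -A 4 plugin_dirs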

The workaround is presented in this post by the user bart0sh:

I'm using the following workaround:

  • install cri-o
  • delete everything from /etc/cni/net.d
  • setup node with kubeadm
  • install a CNI plugin (weave, flannel, calico, etc.)

and additional configuration presented by the user volker-raschek:

Besides the CNI configurations under /etc/cni/net.d, the plugin_dirs property also had to be extended in the crio.conf configuration, because CRI-O does not seem to look in /opt/cni for the deployed plugins by default. I created a drop-in file:

$ cat /etc/crio/crio.conf.d/00-plugin-dir.conf
[crio.network]
plugin_dirs = [
 "/opt/cni/bin/",
 "/usr/lib/cni",
]
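After adding the drop-in and clearing the stale configurations under /etc/cni/net.d as described above, CRI-O and the kubelet have to be restarted so that the new plugin directories are picked up; roughly:

$ sudo rm /etc/cni/net.d/*
$ sudo systemctl restart crio kubelet
$ kubectl get nodes

The nodes should become Ready once the deployed CNI plugin is found.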
Mikolaj S.