After upgrading to Kubernetes v1.24.0 (which removed dockershim), I had to install cri-dockerd. I then ran:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.0.196

I chose flannel as the network plugin:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Until now everything worked as expected, but after enabling scheduling on the master node, joining a worker node, and deploying my pods and services, I noticed a strange network issue: NodePort and ClusterIP services did not work between nodes (there were no issues with a single node).

Later I found out that pods were getting IP addresses from the Docker network (172.17.0.*) and not from the range given by --pod-network-cidr=10.244.0.0/16:

masterzulu@master-zulu:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE     IP              NODE
django-space   django-588cb669d4-46b4w               1/1     Running   0          3m35s   172.17.0.4      master-zulu
django-space   postgres-deployment-b58d5ff94-hs7t4   1/1     Running   0          3m35s   172.17.0.5      master-zulu
kube-system    coredns-6d4b75cb6d-8gw6c              1/1     Running   0          7m9s    172.17.0.2      master-zulu
kube-system    coredns-6d4b75cb6d-nxlq9              1/1     Running   0          7m9s    172.17.0.3      master-zulu

The flannel DaemonSet is running:

kube-system    kube-flannel-ds-tqgvk                 1/1     Running   0          5m51s   192.168.3.132   master-zulu

and podCIDR is set:

masterzulu@master-zulu:~$ kubectl get no master-zulu -o json | jq '.spec.podCIDR'
"10.244.0.0/24"

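To make the mismatch concrete, here is a minimal shell sketch that tests whether an IP address falls inside a CIDR range. The helper names ip_to_int and ip_in_cidr are made up for this illustration and are not part of kubectl or any Kubernetes tool. It assumes plain IPv4 dotted-quad input without leading zeros.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on "." into four positional parameters
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed (exit 0) if IP $1 lies inside CIDR $2, e.g. 10.244.0.0/24.
ip_in_cidr() {
  net=${2%/*}          # network part before the slash
  bits=${2#*/}         # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# One of the pod IPs from the listing above, checked against the node's podCIDR.
ip_in_cidr 172.17.0.4 10.244.0.0/24 || echo "172.17.0.4 is outside 10.244.0.0/24"
```

Any pod that does not use host networking should pass this check against its node's podCIDR. Here the pod IPs fail it, which points at the container runtime's networking setup rather than at flannel itself.
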
I tried adding the --network-plugin=cni flag to the kubelet startup config, but I get an error because this flag was removed along with dockershim and other flags in v1.24.0.

Here's the status of cri-docker:

● cri-docker.service - CRI Interface for Docker Application Container Engine
     Loaded: loaded (/etc/systemd/system/cri-docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-05-25 21:36:57 BST; 5h 34min ago
TriggeredBy: ● cri-docker.socket
       Docs: https://docs.mirantis.com
   Main PID: 1098 (cri-dockerd)
      Tasks: 15
     Memory: 53.4M
     CGroup: /system.slice/cri-docker.service
             └─1098 /usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=

May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-nxlq9 through plugin: invalid network status for"
May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-nxlq9 through plugin: invalid network status for"
May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-8gw6c through plugin: invalid network status for"
May 26 01:53:13 master-zulu cri-dockerd[1098]: time="2022-05-26T01:53:13+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/8ee7640d48c129058259b4b7632a0f6173ad8a9e2d5368cf3c9f29d1ea7db13e/resolv.conf as [nameserver 192.168.3.48 nameserver 192.168.0.1]"
May 26 01:55:30 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:30+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/f378aff3d077030215ef664d72132b189f8412a8d432e5a554cdbfbb37c3ea19/resolv.conf as [nameserver 10.96.0.10 search django-space.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
May 26 01:55:30 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:30+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/django-588cb669d4-46b4w through plugin: invalid network status for"
May 26 01:55:31 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:31+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/django-588cb669d4-46b4w through plugin: invalid network status for"
May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/9523255b7991855027185cecbc8420bbe1268fcef21c2ddcb4d76851bce7e3a0/resolv.conf as [nameserver 10.96.0.10 search django-space.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/postgres-deployment-b58d5ff94-hs7t4 through plugin: invalid network status for"
May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/postgres-deployment-b58d5ff94-hs7t4 through plugin: invalid network status for"

Does anyone know what I should do to solve this issue?

Update:

The cni0 interface is missing on the k8s master:

masterzulu@master-zulu:~$ ifconfig -a
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        inet6 fe80::42:e9ff:fec1:dd1b  prefixlen 64  scopeid 0x20<link>
        ether 02:42:e9:c1:dd:1b  txqueuelen 0  (Ethernet)
        RX packets 5140  bytes 418818 (418.8 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5475  bytes 522703 (522.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.196  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::e808:144d:a0dc:60a6  prefixlen 64  scopeid 0x20<link>
        ether 98:40:bb:3e:f2:1c  txqueuelen 1000  (Ethernet)
        RX packets 6332  bytes 515688 (515.6 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6684  bytes 631167 (631.1 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::494:d8ff:fe1b:4aab  prefixlen 64  scopeid 0x20<link>
        ether 06:94:d8:1b:4a:ab  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 129 overruns 0  carrier 0  collisions 0
TheDHM

1 Answer

After some investigation, I found that the cri-dockerd service was missing some arguments (note the empty --network-plugin= value):

CGroup: /system.slice/cri-docker.service
         └─1098 /usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=

I added them manually to /etc/systemd/system/cri-docker.service:

...
ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=k8s.gcr.io/pause:3.7
...

Reload systemd and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart cri-docker.service
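
Before moving on, it can help to confirm that the unit file really carries the flag. The following is a minimal sketch, not part of cri-dockerd: unit_uses_cni is a hypothetical helper name, and it only does a text match on the ExecStart line of the file it is given.

```shell
# Hypothetical helper: succeed (exit 0) only if the unit file's ExecStart line
# passes --network-plugin=cni (an empty "--network-plugin=" does not count).
unit_uses_cni() {
  grep -E '^ExecStart=' "$1" | grep -Eq -- '--network-plugin=cni([[:space:]]|$)'
}

# Usage against the unit file edited above (skipped if the file is not readable).
unit=/etc/systemd/system/cri-docker.service
if [ -r "$unit" ]; then
  if unit_uses_cni "$unit"; then
    echo "cri-dockerd is started with --network-plugin=cni"
  else
    echo "cri-dockerd is NOT started with --network-plugin=cni"
  fi
fi
```

This only inspects the file on disk. The running process picks up the change after the daemon-reload and restart shown above.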

At this point cri-dockerd was configured correctly, but the problem persisted. I then noticed that /opt/cni/bin was empty (no container networking plugins), as the startup log shows:

masterzulu@master-zulu:~$ sudo /usr/local/bin/cri-dockerd
INFO[0000] Connecting to docker on the Endpoint unix:///var/run/docker.sock
INFO[0000] Start docker client with request timeout 0s
INFO[0000] Hairpin mode is set to none
ERRO[0000] Error validating CNI config list ({
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
): [failed to find plugin "portmap" in path [/opt/cni/bin]]
INFO[0000] Docker cri networking managed by network plugin kubernetes.io/no-op
...
INFO[0000] Setting cgroupDriver cgroupfs
INFO[0000] Docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:,},}
INFO[0000] Starting the GRPC backend for the Docker CRI interface.
INFO[0000] Start cri-dockerd grpc backend
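
The validation error above can be reproduced without starting cri-dockerd. This is a minimal sketch under stated assumptions: missing_cni_plugins is a hypothetical helper, and it extracts the "type" values from the conflist with a plain text match rather than a real JSON parser, then checks that each one exists as an executable in the plugin directory.

```shell
# Hypothetical helper: print every plugin type named in a CNI conflist ($1)
# that has no executable of the same name in the plugin directory ($2).
missing_cni_plugins() {
  # Pull out each "type": "<name>" pair, keep only <name>, and de-duplicate.
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/.*"\([^"]*\)"$/\1/' |
    sort -u |
    while read -r plugin; do
      [ -x "$2/$plugin" ] || echo "$plugin"
    done
}

# Usage with the paths from this setup (skipped if the conflist is not readable).
conf=/etc/cni/net.d/10-flannel.conflist
if [ -r "$conf" ]; then
  missing_cni_plugins "$conf" /opt/cni/bin
fi
```

An empty result means every plugin the config names is present. Any name printed is one that cri-dockerd would fail to find, as it did with portmap here.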

I think I had deleted /opt/cni/bin by mistake, so I restored its contents (get the latest release):

cd /tmp && mkdir -p cni-plugins && wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz && cd cni-plugins && tar zxfv ../cni-plugins-linux-amd64-v1.1.1.tgz
sudo mkdir -p /opt/cni/bin && sudo cp /tmp/cni-plugins/* /opt/cni/bin/

ls /opt/cni/bin
bandwidth  bridge  dhcp  firewall  flannel  host-device  host-local  ipvlan  loopback  macvlan  portmap  ptp  sbr  static  tuning  vlan  vrf

After restarting the cri-docker service, everything started working as expected:

masterzulu@master-zulu:~$ kubectl get pods -Ao wide
NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE
django-space   django-588cb669d4-4zz7f               1/1     Running   0          11s   10.244.0.4      master-zulu
django-space   postgres-deployment-b58d5ff94-scmrx   1/1     Running   0          12s   10.244.0.5      master-zulu
kube-system    coredns-6d4b75cb6d-rnjlm              1/1     Running   0          73m   10.244.0.2      master-zulu
kube-system    coredns-6d4b75cb6d-s6zl7              1/1     Running   0          73m   10.244.0.3      master-zulu

cni0 is up:

masterzulu@master-zulu:~$ ifconfig -a
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::8c8:84ff:fe78:d999  prefixlen 64  scopeid 0x20<link>
        ether 0a:c8:84:78:d9:99  txqueuelen 1000  (Ethernet)
        RX packets 27714  bytes 5010722 (5.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 26936  bytes 2898949 (2.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cri-docker status:

masterzulu@master-zulu:~$ sudo systemctl status cri-docker
● cri-docker.service - CRI Interface for Docker Application Container Engine
     Loaded: loaded (/etc/systemd/system/cri-docker.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-05-27 22:39:06 BST; 1h 57min ago
TriggeredBy: ● cri-docker.socket
       Docs: https://docs.mirantis.com
   Main PID: 187399 (cri-dockerd)
      Tasks: 11
     Memory: 17.1M
     CGroup: /system.slice/cri-docker.service
             └─187399 /usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --po>

May 28 00:36:20 master-zulu cri-dockerd[187399]: time="2022-05-28T00:36:20+01:00" level=info msg="Using CNI configuration file /etc/cni/net.d/10-flannel.conflist"

My conclusion:

If --network-plugin=cni is absent from the cri-dockerd startup arguments, or the CNI configuration is otherwise broken, cri-dockerd behaves as if no CNI plugin is available and uses the docker0 interface directly, so the pods get their IPs from the 172.17.0.x range.

Hope this helps anyone having the same problem.

TheDHM
  • This helped me a lot! For me, setting only 'ExecStart=/usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=cni' was enough. – Aupr Jul 08 '22 at 02:17