kubeadm-based Kubernetes (v1.24.2) worker nodes remain in "NotReady" status even after installing the Calico CNI ("dial unix /var/run/bird/bird.ctl: connect: no such file or directory")
I have deployed the Calico CNI on a kubeadm-based Kubernetes cluster, but the worker nodes are still in "NotReady" status.
TCP port 179 is open on all the nodes, and SELinux reports no denials.
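A minimal sketch of such a port check (node IPs as in the listing at the end; note that a plain TCP connect test only succeeds while a BGP speaker is actually listening):
{
  for ip in 192.168.12.17 192.168.12.20 192.168.12.21 192.168.12.22; do
    # bash's built-in /dev/tcp pseudo-device attempts a TCP connect
    timeout 2 bash -c "</dev/tcp/${ip}/179" 2>/dev/null \
      && echo "${ip}:179 reachable" \
      || echo "${ip}:179 NOT reachable";
  done;
}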
On one of the worker nodes, the kubelet service logs yield the output below.
$ journalctl -x -u kubelet.service;
Aug 26 10:32:39 centos7-03-08 kubelet[2063]: I0826 10:32:39.855197 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.007016 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:44 centos7-03-08 kubelet[2063]: E0826 10:32:44.011224 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:47 centos7-03-08 kubelet[2063]: I0826 10:32:47.172929 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="ready" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:49 centos7-03-08 kubelet[2063]: E0826 10:32:49.013157 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:54 centos7-03-08 kubelet[2063]: E0826 10:32:54.014957 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:59 centos7-03-08 kubelet[2063]: E0826 10:32:59.016829 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
The kubelet seems to complain about "BIRD" not being ready, as shown in the lines below.
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Where does "BIRD" come from and how can this be resolved?
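From what I understand, BIRD is the BGP daemon that runs inside the calico-node container to distribute pod routes, and the readiness probe queries its control socket. The probe can be re-run by hand along these lines (a sketch; pod name taken from the listing at the end, --kubeconfig flag omitted for brevity):
$ kubectl -n calico-system exec calico-node-brpjc -c calico-node -- /bin/calico-node -bird-ready; echo "exit=$?";
$ kubectl -n calico-system exec calico-node-brpjc -c calico-node -- ls -l /var/run/bird/ /var/run/calico/;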
I have one control-plane VM and three worker VMs. Each VM has three network interfaces; two of them are active, each with its own static IP address.
All four nodes (1 control-plane node + 3 worker nodes) have identical contents in their /etc/cni/net.d/ and /var/lib/calico/ directories.
$ ssh somebody@192.168.12.17 "ls -tlr /etc/cni/net.d/ /var/lib/calico/";date;
/etc/cni/net.d/:
total 8
-rw-r--r--. 1 root root 805 Aug 25 20:36 10-calico.conflist
-rw-------. 1 root root 2718 Aug 25 20:37 calico-kubeconfig
/var/lib/calico/:
total 8
-rw-r--r--. 1 root root 13 Aug 25 20:37 nodename
-rw-r--r--. 1 root root 4 Aug 25 20:37 mtu
$
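Since the kubelet nevertheless reports "No CNI configuration file in /etc/cni/net.d/", it may be worth confirming which directory the container runtime (CRI-O, per the node listing at the end) actually scans, and what network status it reports; a sketch run on a worker node:
$ sudo crictl info | grep -i -B2 -A2 network;
$ sudo grep -ri network_dir /etc/crio/ 2>/dev/null;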
The kubelet service on the control-plane node shows the log output snippet below.
$ journalctl -x -u kubelet.service -f;
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546857 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546952 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546973 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.547921 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548010 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548030 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549112 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549179 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549198 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549414 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549501 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
I installed the Calico CNI using the commands below, as per the official documentation at https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onpremises.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.0/manifests/tigera-operator.yaml
kubectl create -f /tmp/custom-resources.yaml
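With the operator installed, its rollout can be confirmed along these lines (a sketch; the tigerastatus resource is served once the operator's CRDs exist, --kubeconfig flag omitted):
$ kubectl -n tigera-operator get pods;
$ kubectl get tigerastatus;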
The contents of "/tmp/custom-resources.yaml" are shown below.
---
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 172.22.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
The config file I supplied to the kubeadm init --config argument contains the following section (this is an abbreviated version of the file).
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.21.0.0/16
  podSubnet: 172.22.0.0/16
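The ipPools cidr in custom-resources.yaml (172.22.0.0/16) deliberately matches podSubnet, and 172.21.0.1 (the k8s_api_root seen in the CNI config below) is the first address of serviceSubnet. The live values can be cross-checked against the kubeadm-config ConfigMap; a sketch (--kubeconfig flag omitted):
$ kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -E 'podSubnet|serviceSubnet';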
The contents of "/etc/cni/net.d/10-calico.conflist" on the control-plane and worker nodes are identical.
$ cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "mtu": 0,
      "nodename_file_optional": false,
      "log_level": "Info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "ipam": { "type": "calico-ipam", "assign_ipv4": "true", "assign_ipv6": "false" },
      "container_settings": {
        "allow_ip_forwarding": false
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "k8s_api_root": "https://172.21.0.1:443",
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "bandwidth",
      "capabilities": { "bandwidth": true }
    },
    { "type": "portmap", "snat": true, "capabilities": { "portMappings": true } }
  ]
}
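Whether this file parses as valid JSON, and what the Calico CNI plugin itself logged (the log_file_path named above), can both be checked on a worker node; a sketch:
$ python -m json.tool /etc/cni/net.d/10-calico.conflist > /dev/null && echo "valid JSON";
$ sudo tail -n 20 /var/log/calico/cni/cni.log;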
I have deployed a pod on this cluster, but it is stuck in "Pending" because the worker nodes are not in "Ready" state, as the output of the command below shows.
$ kubectl describe pod/my-nginx -n ns-test-02;
The full invocation (with the kubeconfig path spelled out) yields the output below.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf describe pod/my-nginx -n ns-test-02 ; }; };
Name: my-nginx
Namespace: ns-test-02
Priority: 0
Node: <none>
Labels: app=nginx
purpose=learning
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blxv4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-blxv4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Note the events section: scheduling fails because all three worker nodes carry the node.kubernetes.io/not-ready taint (and the control-plane node carries node-role.kubernetes.io/master).
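The taints can also be listed directly; a sketch (--kubeconfig flag omitted):
$ kubectl describe nodes | grep -E '^Name:|^Taints:';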
The namespace and the pod/my-nginx object were created using the commands below.
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
create \
namespace ns-test-02 \
;
}
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
--namespace=ns-test-02 \
run my-nginx \
--image=nginx \
--restart=Never \
--port=80 \
--expose=true \
--labels='purpose=learning,app=nginx' \
;
}
Below is a listing of the node, pod, and service objects in the kubeadm-based Kubernetes cluster.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf get nodes,pods,services -A -o wide ; }; };
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/centos7-03-05 Ready control-plane 5h33m v1.24.2 192.168.12.17 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-08 NotReady <none> 5h33m v1.24.2 192.168.12.20 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-09 NotReady <none> 5h32m v1.24.2 192.168.12.21 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-10 NotReady <none> 5h32m v1.24.2 192.168.12.22 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver pod/calico-apiserver-658d588b56-bs5j6 1/1 Running 0 4h42m 172.22.147.134 centos7-03-05 <none> <none>
calico-apiserver pod/calico-apiserver-658d588b56-zxhpg 1/1 Running 0 4h42m 172.22.147.133 centos7-03-05 <none> <none>
calico-system pod/calico-kube-controllers-5f44c7d7d7-n7lfd 1/1 Running 2 (4h43m ago) 4h45m 172.22.147.129 centos7-03-05 <none> <none>
calico-system pod/calico-node-bj9f9 1/1 Running 2 (4h42m ago) 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-node-brpjc 1/1 Running 0 4h45m 192.168.12.20 centos7-03-08 <none> <none>
calico-system pod/calico-node-ksqqn 1/1 Running 0 4h45m 192.168.12.17 centos7-03-05 <none> <none>
calico-system pod/calico-node-vpjx7 1/1 Running 3 (4h42m ago) 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/calico-typha-77c99dcb74-76rt4 1/1 Running 0 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-typha-77c99dcb74-qs5x8 1/1 Running 0 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/csi-node-driver-gdr4r 2/2 Running 0 4h44m 172.22.147.131 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-h4kxp 1/1 Running 0 5h33m 172.22.147.130 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-n9f9h 1/1 Running 0 5h33m 172.22.147.132 centos7-03-05 <none> <none>
kube-system pod/kube-apiserver-centos7-03-05 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-controller-manager-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-5qfsl 1/1 Running 0 5h32m 192.168.12.22 centos7-03-10 <none> <none>
kube-system pod/kube-proxy-r62r4 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-t7lnr 1/1 Running 0 5h32m 192.168.12.21 centos7-03-09 <none> <none>
kube-system pod/kube-proxy-v4wjs 1/1 Running 0 5h33m 192.168.12.20 centos7-03-08 <none> <none>
kube-system pod/kube-scheduler-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
ns-test-02 pod/my-nginx 0/1 Pending 0 36s <none> <none> <none> <none>
tigera-operator pod/tigera-operator-7ff575f7f7-6qhft 1/1 Running 1 (4h43m ago) 4h45m 192.168.12.20 centos7-03-08 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
calico-apiserver service/calico-api ClusterIP 172.21.182.168 <none> 443/TCP 4h42m apiserver=true
calico-system service/calico-kube-controllers-metrics ClusterIP 172.21.46.154 <none> 9094/TCP 4h42m k8s-app=calico-kube-controllers
calico-system service/calico-typha ClusterIP 172.21.208.66 <none> 5473/TCP 4h45m k8s-app=calico-typha
default service/kubernetes ClusterIP 172.21.0.1 <none> 443/TCP 5h33m <none>
kube-system service/kube-dns ClusterIP 172.21.0.10 <none> 53/UDP,53/TCP,9153/TCP 5h33m k8s-app=kube-dns
ns-test-02 service/my-nginx ClusterIP 172.21.208.139 <none> 80/TCP 36s app=nginx,purpose=learning
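For completeness, the exact condition keeping a worker node NotReady can be dumped along these lines (a sketch; --kubeconfig flag omitted):
$ kubectl get node centos7-03-08 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}';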