kubeadm-based Kubernetes (v1.24.2) worker nodes remain in "NotReady" status even after installing the Calico CNI ("dial unix /var/run/bird/bird.ctl: connect: no such file or directory")
I have deployed the Calico CNI on a kubeadm-based Kubernetes cluster, but the worker nodes are still in "NotReady" status.
TCP port 179 is open on all the nodes, and SELinux reports no denials.
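A minimal sketch of such a port check (node IPs as in the listing at the end; note that a plain TCP connect test only succeeds while a BGP speaker is actually listening):
{
  for ip in 192.168.12.17 192.168.12.20 192.168.12.21 192.168.12.22; do
    # bash's built-in /dev/tcp pseudo-device attempts a TCP connect
    timeout 2 bash -c "</dev/tcp/${ip}/179" 2>/dev/null \
      && echo "${ip}:179 reachable" \
      || echo "${ip}:179 NOT reachable";
  done;
}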
On one of the worker nodes, the kubelet service logs yield the output below.
$ journalctl -x -u kubelet.service;
Aug 26 10:32:39 centos7-03-08 kubelet[2063]: I0826 10:32:39.855197 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.007016 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Aug 26 10:32:44 centos7-03-08 kubelet[2063]: E0826 10:32:44.011224 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:47 centos7-03-08 kubelet[2063]: I0826 10:32:47.172929 2063 kubelet.go:2182] "SyncLoop (probe)" probe="readiness" status="ready" pod="calico-system/calico-node-brpjc"
Aug 26 10:32:49 centos7-03-08 kubelet[2063]: E0826 10:32:49.013157 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:54 centos7-03-08 kubelet[2063]: E0826 10:32:54.014957 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
Aug 26 10:32:59 centos7-03-08 kubelet[2063]: E0826 10:32:59.016829 2063 kubelet.go:2349] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/cni/net.d/. Has your network provider started?"
The kubelet seems to complain about "BIRD" not being ready, as shown in the lines below.
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: I0826 10:32:40.997572 2063 prober.go:121] "Probe failed" probeType="Readiness" pod="calico-system/calico-node-brpjc" podUID=d1206cc9-f573-42c0-a43b-d1d5f3dae106 containerName="calico-node" probeResult=failure output=<
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Aug 26 10:32:40 centos7-03-08 kubelet[2063]: >
Where does "BIRD" come from and how can this be resolved?
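From what I understand, BIRD is the BGP daemon that runs inside the calico-node container to distribute pod routes, and the readiness probe queries its control socket. The probe can be re-run by hand along these lines (a sketch; pod name taken from the listing at the end, --kubeconfig flag omitted for brevity):
$ kubectl -n calico-system exec calico-node-brpjc -c calico-node -- /bin/calico-node -bird-ready; echo "exit=$?";
$ kubectl -n calico-system exec calico-node-brpjc -c calico-node -- ls -l /var/run/bird/ /var/run/calico/;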
I have one control-plane VM and three worker VMs. Each VM has three network interfaces; two of them are active, each with its own static IP address.
All four nodes (1 control-plane node + 3 worker nodes) have identical contents in their /etc/cni/net.d/ and /var/lib/calico/ directories.
$ ssh somebody@192.168.12.17 "ls -tlr /etc/cni/net.d/ /var/lib/calico/";date;
/etc/cni/net.d/:
total 8
-rw-r--r--. 1 root root 805 Aug 25 20:36 10-calico.conflist
-rw-------. 1 root root 2718 Aug 25 20:37 calico-kubeconfig
/var/lib/calico/:
total 8
-rw-r--r--. 1 root root 13 Aug 25 20:37 nodename
-rw-r--r--. 1 root root 4 Aug 25 20:37 mtu
$
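Since the kubelet nevertheless reports "No CNI configuration file in /etc/cni/net.d/", it may be worth confirming which directory the container runtime (CRI-O, per the node listing at the end) actually scans, and what network status it reports; a sketch run on a worker node:
$ sudo crictl info | grep -i -B2 -A2 network;
$ sudo grep -ri network_dir /etc/crio/ 2>/dev/null;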
The kubelet service on the control-plane node shows the log output snippet below.
$ journalctl -x -u kubelet.service -f;
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546857 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546952 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:05:40 centos7-03-05 kubelet[2625]: I0826 15:05:40.546973 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.547921 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548010 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:06:40 centos7-03-05 kubelet[2625]: I0826 15:06:40.548030 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549112 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549179 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-centos7-03-05" status=Running
Aug 26 15:07:40 centos7-03-05 kubelet[2625]: I0826 15:07:40.549198 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549414 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-centos7-03-05" status=Running
Aug 26 15:08:40 centos7-03-05 kubelet[2625]: I0826 15:08:40.549501 2625 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-centos7-03-05" status=Running
I installed the Calico CNI using the commands below, as per the official documentation at https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onpremises.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.0/manifests/tigera-operator.yaml
kubectl create -f /tmp/custom-resources.yaml
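With the operator installed, its rollout can be confirmed along these lines (a sketch; the tigerastatus resource is served once the operator's CRDs exist, --kubeconfig flag omitted):
$ kubectl -n tigera-operator get pods;
$ kubectl get tigerastatus;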
The contents of "/tmp/custom-resources.yaml" are shown below.
---
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 172.22.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
---
# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/master/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
The config file I supplied to the kubeadm init --config argument contains the following section (this is an abbreviated version of the file).
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.21.0.0/16
  podSubnet: 172.22.0.0/16
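The ipPools cidr in custom-resources.yaml (172.22.0.0/16) deliberately matches podSubnet, and 172.21.0.1 (the k8s_api_root seen in the CNI config below) is the first address of serviceSubnet. The live values can be cross-checked against the kubeadm-config ConfigMap; a sketch (--kubeconfig flag omitted):
$ kubectl -n kube-system get configmap kubeadm-config -o yaml | grep -E 'podSubnet|serviceSubnet';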
The contents of "/etc/cni/net.d/10-calico.conflist" on the control-plane and worker nodes are identical.
$ cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "datastore_type": "kubernetes",
      "mtu": 0,
      "nodename_file_optional": false,
      "log_level": "Info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "ipam": { "type": "calico-ipam", "assign_ipv4": "true", "assign_ipv6": "false" },
      "container_settings": {
        "allow_ip_forwarding": false
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "k8s_api_root": "https://172.21.0.1:443",
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "bandwidth",
      "capabilities": { "bandwidth": true }
    },
    { "type": "portmap", "snat": true, "capabilities": { "portMappings": true } }
  ]
}
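Whether this file parses as valid JSON, and what the Calico CNI plugin itself logged (the log_file_path named above), can both be checked on a worker node; a sketch:
$ python -m json.tool /etc/cni/net.d/10-calico.conflist > /dev/null && echo "valid JSON";
$ sudo tail -n 20 /var/log/calico/cni/cni.log;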
I have deployed a pod on this cluster, but it is stuck in "Pending" because the worker nodes are not in "Ready" state, as the output of the command below shows.
$ kubectl describe pod/my-nginx -n ns-test-02;
The full invocation (with the kubeconfig path spelled out) yields the output below.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf describe pod/my-nginx -n ns-test-02 ; }; };
Name: my-nginx
Namespace: ns-test-02
Priority: 0
Node: <none>
Labels: app=nginx
purpose=learning
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
my-nginx:
Image: nginx
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-blxv4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-blxv4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Note the events section: scheduling fails because all three worker nodes carry the node.kubernetes.io/not-ready taint (and the control-plane node carries node-role.kubernetes.io/master).
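The taints can also be listed directly; a sketch (--kubeconfig flag omitted):
$ kubectl describe nodes | grep -E '^Name:|^Taints:';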
The namespace and the pod/my-nginx object were created using the commands below.
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
create \
namespace ns-test-02 \
;
}
{
kube_apiserver_node_01="192.168.12.17";
kubectl \
--kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf \
--namespace=ns-test-02 \
run my-nginx \
--image=nginx \
--restart=Never \
--port=80 \
--expose=true \
--labels='purpose=learning,app=nginx' \
;
}
Below is a listing of the node, pod, and service objects in the kubeadm-based Kubernetes cluster.
{ kube_apiserver_node_01="192.168.12.17"; { kubectl --kubeconfig=/home/somebody/kubernetes-via-kubeadm/kubeadm/${kube_apiserver_node_01}/admin.conf get nodes,pods,services -A -o wide ; }; };
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node/centos7-03-05 Ready control-plane 5h33m v1.24.2 192.168.12.17 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-08 NotReady <none> 5h33m v1.24.2 192.168.12.20 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-09 NotReady <none> 5h32m v1.24.2 192.168.12.21 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
node/centos7-03-10 NotReady <none> 5h32m v1.24.2 192.168.12.22 <none> CentOS Linux 7 (Core) 3.10.0-1160.76.1.el7.x86_64 cri-o://1.24.2
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-apiserver pod/calico-apiserver-658d588b56-bs5j6 1/1 Running 0 4h42m 172.22.147.134 centos7-03-05 <none> <none>
calico-apiserver pod/calico-apiserver-658d588b56-zxhpg 1/1 Running 0 4h42m 172.22.147.133 centos7-03-05 <none> <none>
calico-system pod/calico-kube-controllers-5f44c7d7d7-n7lfd 1/1 Running 2 (4h43m ago) 4h45m 172.22.147.129 centos7-03-05 <none> <none>
calico-system pod/calico-node-bj9f9 1/1 Running 2 (4h42m ago) 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-node-brpjc 1/1 Running 0 4h45m 192.168.12.20 centos7-03-08 <none> <none>
calico-system pod/calico-node-ksqqn 1/1 Running 0 4h45m 192.168.12.17 centos7-03-05 <none> <none>
calico-system pod/calico-node-vpjx7 1/1 Running 3 (4h42m ago) 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/calico-typha-77c99dcb74-76rt4 1/1 Running 0 4h45m 192.168.12.22 centos7-03-10 <none> <none>
calico-system pod/calico-typha-77c99dcb74-qs5x8 1/1 Running 0 4h45m 192.168.12.21 centos7-03-09 <none> <none>
calico-system pod/csi-node-driver-gdr4r 2/2 Running 0 4h44m 172.22.147.131 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-h4kxp 1/1 Running 0 5h33m 172.22.147.130 centos7-03-05 <none> <none>
kube-system pod/coredns-6d4b75cb6d-n9f9h 1/1 Running 0 5h33m 172.22.147.132 centos7-03-05 <none> <none>
kube-system pod/kube-apiserver-centos7-03-05 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-controller-manager-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-5qfsl 1/1 Running 0 5h32m 192.168.12.22 centos7-03-10 <none> <none>
kube-system pod/kube-proxy-r62r4 1/1 Running 0 5h33m 192.168.12.17 centos7-03-05 <none> <none>
kube-system pod/kube-proxy-t7lnr 1/1 Running 0 5h32m 192.168.12.21 centos7-03-09 <none> <none>
kube-system pod/kube-proxy-v4wjs 1/1 Running 0 5h33m 192.168.12.20 centos7-03-08 <none> <none>
kube-system pod/kube-scheduler-centos7-03-05 1/1 Running 1 (4h43m ago) 5h33m 192.168.12.17 centos7-03-05 <none> <none>
ns-test-02 pod/my-nginx 0/1 Pending 0 36s <none> <none> <none> <none>
tigera-operator pod/tigera-operator-7ff575f7f7-6qhft 1/1 Running 1 (4h43m ago) 4h45m 192.168.12.20 centos7-03-08 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
calico-apiserver service/calico-api ClusterIP 172.21.182.168 <none> 443/TCP 4h42m apiserver=true
calico-system service/calico-kube-controllers-metrics ClusterIP 172.21.46.154 <none> 9094/TCP 4h42m k8s-app=calico-kube-controllers
calico-system service/calico-typha ClusterIP 172.21.208.66 <none> 5473/TCP 4h45m k8s-app=calico-typha
default service/kubernetes ClusterIP 172.21.0.1 <none> 443/TCP 5h33m <none>
kube-system service/kube-dns ClusterIP 172.21.0.10 <none> 53/UDP,53/TCP,9153/TCP 5h33m k8s-app=kube-dns
ns-test-02 service/my-nginx ClusterIP 172.21.208.139 <none> 80/TCP 36s app=nginx,purpose=learning
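For completeness, the exact condition keeping a worker node NotReady can be dumped along these lines (a sketch; --kubeconfig flag omitted):
$ kubectl get node centos7-03-08 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}';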