
I have trouble adding a CNI to a Kubernetes master node: the CNI plugin does not have access to certain files or folders. The logs from Calico and Flannel say that certain files or folders are not accessible (in this post I only refer to Calico).

I see the same problem with kubectl, kubeadm and kubelet in versions v1.19.4 and v1.19.3. Docker is on version 19.03.13-ce and uses overlay2 on an ext4 filesystem with systemd as the cgroup driver. Swap is disabled.
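
For reference, the relevant Docker settings look roughly like this (a sketch of /etc/docker/daemon.json that matches the setup described above, not a verbatim copy of the file from the node):

{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "storage-driver": "overlay2"
}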

The only thing I found on Stack Overflow that goes in this direction is this: Kubernetes Cluster with Calico - Containers are not coming up & failing with FailedCreatePodSandBox

In the first step I set up the cluster with kubeadm (using the CIDR recommended for Calico):

# kubeadm init --apiserver-advertise-address=192.168.178.33 --pod-network-cidr=192.168.0.0/16
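
To confirm that the control plane comes up but is still waiting for a network plugin, the kubelet journal can be checked (assuming the kubelet runs under systemd; the exact wording of the message varies between versions):

# journalctl -u kubelet | grep -i 'network plugin'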

This is working correctly; the kubelet logs contain the expected message that a CNI plugin is required. After this I apply the Calico CNI:

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

After waiting some time, the pods on the master node remain in the following state:

❯ kubectl get pods --all-namespaces                 
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   calico-kube-controllers-5c6f6b67db-zdksz   0/1     ContainerCreating   0          7m47s
kube-system   calico-node-sc42z                          0/1     CrashLoopBackOff    5          7m47s
kube-system   coredns-f9fd979d6-4zrcj                    0/1     ContainerCreating   0          8m11s
kube-system   coredns-f9fd979d6-wf9r2                    0/1     ContainerCreating   0          8m11s
kube-system   etcd-hs-0                                  1/1     Running             0          8m20s
kube-system   kube-apiserver-hs-0                        1/1     Running             0          8m20s
kube-system   kube-controller-manager-hs-0               1/1     Running             0          8m20s
kube-system   kube-proxy-t6ngd                           1/1     Running             0          8m11s
kube-system   kube-scheduler-hs-0                        1/1     Running             0          8m20s

For me, the information I got from the following command:

kubectl describe pods calico-node-sc42z --namespace kube-system

seems inconsistent in itself: the calico-node pod has the volume mounted, yet the events report that the pod cannot access it (compare the Volumes section with the Events; see also the host-side check after the describe output below).

❯ kubectl describe pods calico-node-sc42z --namespace kube-system
Name:                 calico-node-sc42z
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 hs-0/192.168.178.48
Start Time:           Sat, 14 Nov 2020 00:58:36 +0100
Labels:               controller-revision-hash=5f678767
                      k8s-app=calico-node
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.178.48
IPs:
  IP:           192.168.178.48
Controlled By:  DaemonSet/calico-node
Init Containers:
  upgrade-ipam:
    Container ID:  docker://29c6cf8b73ecb98ee18169db0f6ffe8b141a8a6e10b2c839fc5bf05177f066ac
    Image:         calico/cni:v3.16.5
    Image ID:      docker-pullable://calico/cni@sha256:e05d0ee834c2004e8e7c4ee165a620166cd16e3cb8204a06eb52e5300b46650b
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 14 Nov 2020 00:58:48 +0100
      Finished:     Sat, 14 Nov 2020 00:58:48 +0100
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
  install-cni:
    Container ID:  docker://4435863e0d2f3ab4535aa6ca49ff95d889e71614861f3c7c0e4213d8c333f4db
    Image:         calico/cni:v3.16.5
    Image ID:      docker-pullable://calico/cni@sha256:e05d0ee834c2004e8e7c4ee165a620166cd16e3cb8204a06eb52e5300b46650b
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 14 Nov 2020 00:58:49 +0100
      Finished:     Sat, 14 Nov 2020 00:58:49 +0100
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
  flexvol-driver:
    Container ID:   docker://ca03f59013c1576a4a605a6d737af78ec3e859376aa11a301e56f0ffdacbc8db
    Image:          calico/pod2daemon-flexvol:v3.16.5
    Image ID:       docker-pullable://calico/pod2daemon-flexvol@sha256:7b20fd9cc36c7196dd24d56cc1e89ac573c634856ee020334b0b30cf5b8a3d3b
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 14 Nov 2020 00:58:56 +0100
      Finished:     Sat, 14 Nov 2020 00:58:56 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
Containers:
  calico-node:
    Container ID:   docker://96bbc7f4adf1d5cb9a927aedc18e16da7b5ed4b0ff1290179a8dd4a51c115ab8
    Image:          calico/node:v3.16.5
    Image ID:       docker-pullable://calico/node@sha256:43c145b2bd837611d8d41e70631a8f2cc2b97b5ca9d895d66ffddd414dab83c5
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sat, 14 Nov 2020 01:04:51 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sat, 14 Nov 2020 01:03:41 +0100
      Finished:     Sat, 14 Nov 2020 01:04:51 +0100
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGSEVERITYSCREEN:            info
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/ from sysfs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-tzhr4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:  
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sysfs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:  
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  calico-node-token-tzhr4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-tzhr4
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  6m52s                  default-scheduler  Successfully assigned kube-system/calico-node-sc42z to hs-0
  Normal   Pulling    6m51s                  kubelet            Pulling image "calico/cni:v3.16.5"
  Normal   Pulled     6m40s                  kubelet            Successfully pulled image "calico/cni:v3.16.5" in 10.618669742s
  Normal   Started    6m40s                  kubelet            Started container upgrade-ipam
  Normal   Created    6m40s                  kubelet            Created container upgrade-ipam
  Normal   Created    6m39s                  kubelet            Created container install-cni
  Normal   Pulled     6m39s                  kubelet            Container image "calico/cni:v3.16.5" already present on machine
  Normal   Started    6m39s                  kubelet            Started container install-cni
  Normal   Pulling    6m38s                  kubelet            Pulling image "calico/pod2daemon-flexvol:v3.16.5"
  Normal   Started    6m32s                  kubelet            Started container flexvol-driver
  Normal   Created    6m32s                  kubelet            Created container flexvol-driver
  Normal   Pulled     6m32s                  kubelet            Successfully pulled image "calico/pod2daemon-flexvol:v3.16.5" in 6.076268177s
  Normal   Pulling    6m31s                  kubelet            Pulling image "calico/node:v3.16.5"
  Normal   Pulled     6m19s                  kubelet            Successfully pulled image "calico/node:v3.16.5" in 12.051211859s
  Normal   Created    6m19s                  kubelet            Created container calico-node
  Normal   Started    6m19s                  kubelet            Started container calico-node
  Warning  Unhealthy  5m32s (x5 over 6m12s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
  Warning  Unhealthy  109s (x23 over 6m9s)   kubelet            Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1
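
To cross-check the readiness-probe event about /var/lib/calico/nodename, the hostPath directories can be inspected directly on the node (paths taken from the Volumes section above):

# ls -la /var/lib/calico/
# ls -la /etc/cni/net.d/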

Further, I have the logs of calico-node, but I do not understand how to benefit from this additional information. Unfortunately I don't know whether "datastore" refers to the file system (meaning it is the error I already know about) or whether it is something additional.

❯ kubectl logs calico-node-sc42z -n kube-system -f
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 376: Early log level set to info
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 392: Using NODENAME environment for node name
2020-11-14 01:42:55.536 [INFO][8] startup/startup.go 404: Determined node name: hs-0
2020-11-14 01:42:55.539 [INFO][8] startup/startup.go 436: Checking datastore connection
2020-11-14 01:43:25.539 [INFO][8] startup/startup.go 451: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
2020-11-14 01:43:56.540 [INFO][8] startup/startup.go 451: Hit error connecting to datastore - retry error=Get "https://10.96.0.1:443/api/v1/nodes/foo": dial tcp 10.96.0.1:443: i/o timeout
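
One way to check whether the address from the error is reachable at all (10.96.0.1 is the ClusterIP of the kubernetes Service in the default namespace; the curl is only a rough reachability test run from the master node):

❯ kubectl get svc kubernetes -n default
❯ curl -k --max-time 5 https://10.96.0.1:443/version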

Maybe someone can give me a hint on how to solve this problem or where to read up on this topic. Greetings, Kokos Bot.


1 Answer


It might be because Calico's default pod CIDR conflicts with your host CIDR. I got that impression from your --apiserver-advertise-address=192.168.178.33, which lies inside the 192.168.0.0/16 pod network. If that is the case, it is worth trying kubeadm init with a different pod CIDR, for example --pod-network-cidr=20.96.0.0/12.
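
A sketch of what that could look like (the CIDR value is only an illustration; if you change --pod-network-cidr, keep CALICO_IPV4POOL_CIDR in calico.yaml in sync with it):

# kubeadm init --apiserver-advertise-address=192.168.178.33 --pod-network-cidr=20.96.0.0/12

And in calico.yaml, the matching pool (commented out in the default manifest) would be set before applying it:

            - name: CALICO_IPV4POOL_CIDR
              value: "20.96.0.0/12"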

For a clean installation, it is better to do a kubeadm reset before applying the above change. Please be aware of the impact of the kubeadm reset command before executing it (read here).
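
A rough sequence could look like this (hedged; review the kubeadm reset documentation first, and note that reset does not clean up the CNI configuration under /etc/cni/net.d, which is why that directory is removed manually here):

# kubeadm reset
# rm -rf /etc/cni/net.d
# kubeadm init --apiserver-advertise-address=192.168.178.33 --pod-network-cidr=20.96.0.0/12
# kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml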

Reference - https://stackoverflow.com/questions/60742165/kubernetes-calico-replicaset

Syam Sankar