
To install a Kubernetes master (control-plane) node on CentOS 7 with containerd and Calico:

I followed the steps here: https://computingforgeeks.com/install-kubernetes-cluster-on-centos-with-kubeadm/

After running kubeadm init --pod-network-cidr=192.168.0.0/16 --upload-certs,

I installed Calico.

(The proxy did not let me apply the manifests straight from their URLs, so I first downloaded the files and then ran a kubectl create on them.)
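For reference, that download-then-create step looked roughly like this (a sketch only; the manifest URLs and Calico version are illustrative, not copied from my session):

# download the operator and custom-resources manifests through the proxy first,
# because kubectl could not fetch them from the URLs directly
curl -LO https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
curl -LO https://projectcalico.docs.tigera.io/manifests/custom-resources.yaml

# then create the resources from the local files
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml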

The install then completed, but coredns and calico-kube-controllers are stuck in ContainerCreating.

This installation uses a company DNS and proxy. I have been stuck on this for days and cannot find out why coredns stays in ContainerCreating.

[root@master-node system]# kubectl get pod -A
NAMESPACE         NAME                                       READY   STATUS              RESTARTS        AGE
calico-system     calico-kube-controllers-68884f975d-6qm5l   0/1     Terminating         0               16d
calico-system     calico-kube-controllers-68884f975d-ckr2g   0/1     ContainerCreating   0               154m
calico-system     calico-node-5n4nj                          0/1     Running             7 (165m ago)    16d
calico-system     calico-node-gp6d5                          0/1     Running             1 (15d ago)     16d
calico-system     calico-typha-77b6fb6f86-zc8jn              1/1     Running             7 (165m ago)    16d
kube-system       coredns-6d4b75cb6d-2tqk9                   0/1     ContainerCreating   0               4h46m
kube-system       coredns-6d4b75cb6d-9dn5d                   0/1     ContainerCreating   0               6h58m
kube-system       coredns-6d4b75cb6d-vfchn                   0/1     Terminating         32              15d
kube-system       etcd-master-node                           1/1     Running             14 (165m ago)   16d
kube-system       kube-apiserver-master-node                 1/1     Running             8 (165m ago)    16d
kube-system       kube-controller-manager-master-node        1/1     Running             7 (165m ago)    16d
kube-system       kube-proxy-c6l9s                           1/1     Running             7 (165m ago)    16d
kube-system       kube-proxy-pqrf8                           1/1     Running             1 (15d ago)     16d
kube-system       kube-scheduler-master-node                 1/1     Running             8 (165m ago)    16d
tigera-operator   tigera-operator-5fb55776df-955dj           1/1     Running             13 (164m ago)   16d

kubectl describe pod coredns

[root@master-node system]# kubectl describe pod coredns-6d4b75cb6d-2tqk9  -n kube-system
Name:                 coredns-6d4b75cb6d-2tqk9
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master-node/10.32.67.20
Start Time:           Wed, 08 Jun 2022 11:59:59 +0200
Labels:               k8s-app=kube-dns
                      pod-template-hash=6d4b75cb6d
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-6d4b75cb6d
Containers:
  coredns:
    Container ID:
    Image:         k8s.gcr.io/coredns/coredns:v1.8.6
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ch9xq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-ch9xq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                   From     Message
  ----     ------                  ----                  ----     -------
  Warning  FailedCreatePodSandBox  114s (x65 over 143m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "de60ae0a286ad648a9691065e68fe03589b18a26adfafff0c089d5774b46c163": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable

kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'

[root@master-node system]# kubectl get events --all-namespaces  --sort-by='.metadata.creationTimestamp'
NAMESPACE       LAST SEEN   TYPE      REASON                   OBJECT                                         MESSAGE
calico-system   5m52s       Warning   Unhealthy                pod/calico-node-gp6d5                          (combined from similar events): Readiness probe failed: 2022-06-08 14:50:45.231 [INFO][30872] confd/health.go 180: Number of node(s) with BGP peering established = 0...
calico-system   4m16s       Warning   FailedKillPod            pod/calico-kube-controllers-68884f975d-6qm5l   error killing pod: failed to "KillPodSandbox" for "c842d857-88f1-4dfa-b3e8-aad68f626c8c" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5002f084e667a7e70654136b237ae2924c268337c1faf882972982e888784bb9\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": Service Unavailable"
kube-system     87s         Warning   FailedCreatePodSandBox   pod/coredns-6d4b75cb6d-9dn5d                   (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "acd785aa916d2c97aa16ceeaa2f04e7967a1224cb437e50770f32a02b5a9ed3f": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
calico-system   13m         Warning   FailedKillPod            pod/calico-kube-controllers-68884f975d-6qm5l   error killing pod: failed to "KillPodSandbox" for "c842d857-88f1-4dfa-b3e8-aad68f626c8c" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5002f084e667a7e70654136b237ae2924c268337c1faf882972982e888784bb9\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": context deadline exceeded"
kube-system     4m6s        Warning   FailedKillPod            pod/coredns-6d4b75cb6d-vfchn                   error killing pod: failed to "KillPodSandbox" for "23c399a5-daa6-4f01-b7ee-7822b828d966" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"7621e8d64c84554d75030375b0355a67c60b62c8d240741aa78189ffabedc913\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": Service Unavailable"
calico-system   6s          Warning   Unhealthy                pod/calico-node-5n4nj                          (combined from similar events): Readiness probe failed: 2022-06-08 14:56:31.871 [INFO][17966] confd/health.go 180: Number of node(s) with BGP peering established = 0...
calico-system   45m         Warning   DNSConfigForming         pod/calico-kube-controllers-68884f975d-ckr2g   Search Line limits were exceeded, some search paths have been omitted, the applied search line is: calico-system.svc.cluster.local svc.cluster.local cluster.local XXXXXX.com cs.XXXXX.com fr.XXXXXX.com
kube-system     2m49s       Warning   FailedCreatePodSandBox   pod/coredns-6d4b75cb6d-2tqk9                   (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "529139e14dbb8c5917c72428600c5a8333aa21bf249face90048d1b344da5d9a": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
calico-system   3m42s       Warning   FailedCreatePodSandBox   pod/calico-kube-controllers-68884f975d-ckr2g   (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "45dd6ebfb53fd745b1ca41853bb7744e407b3439111a946b007752eb8f8f7abd": plugin type="calico" failed (add): error getting ClusterInformation: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": Service Unavailable
kube-system     9m6s        Warning   FailedKillPod            pod/coredns-6d4b75cb6d-vfchn                   error killing pod: failed to "KillPodSandbox" for "23c399a5-daa6-4f01-b7ee-7822b828d966" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"7621e8d64c84554d75030375b0355a67c60b62c8d240741aa78189ffabedc913\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": context deadline exceeded"

The calico-node logs are:

(resync-filter-v4,resync-raw-v4)
2022-06-08 18:26:42.665 [INFO][69] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m2.6s: avg=3ms longest=6ms (resync-nat-v4)
2022-06-08 18:27:46.076 [INFO][69] felix/summary.go 100: Summarising 7 dataplane reconciliation loops over 1m3.4s: avg=2ms longest=4ms (resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,resync-wg)

The calico-typha logs are:

2022-06-08 17:34:49.625 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/assignment/" error=too old resource version: 190422 (3180569)
2022-06-08 17:34:50.121 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2022-06-08 18:10:27.377 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 190388 (3180569)
2022-06-08 18:10:27.874 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/"

Comments:
  • your SDN is all down (coredns & calico controller can't connect the kubernetes service in default namespace). You should look into the calico-nodes and typha instead. What are their logs telling you? Do we know why calico-nodes show as unready (0/1)? – SYN Jun 08 '22 at 16:58
  • also: while setting up kubernetes "manually" is a good experience to have, maybe you could try with tools such as "kops" or "kubespray", which are part of Kubernetes ecosystem, ... might be easier to get your cluster up and running, in an easily-reproducible way, ... something that you could easily upgrade later on, ... – SYN Jun 08 '22 at 17:03
  • calico-node logs are : (resync-filter-v4,resync-raw-v4) 2022-06-08 18:26:42.665 [INFO][69] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m2.6s: avg=3ms longest=6ms (resync-nat-v4) 2022-06-08 18:27:46.076 [INFO][69] felix/summary.go 100: Summarising 7 dataplane reconciliation loops over 1m3.4s: avg=2ms longest=4ms (resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,resync-wg) – awot83 Jun 08 '22 at 18:30
  • calico typha : 2022-06-08 17:34:49.625 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/assignment/" error=too old resource version: 190422 (3180569) 2022-06-08 17:34:50.121 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/assignment/" 2022-06-08 18:10:27.377 [INFO][7] watchercache.go 125: Watch error received from Upstream ListRoot="/calico/ipam/v2/host/" error=too old resource version: 190388 (3180569) 2022-06-08 18:10:27.874 [INFO][7] watchercache.go 181: Full resync is required ListRoot="/calico/ipam/v2/host/" – awot83 Jun 08 '22 at 18:31
  • regarding the deployment tool : i used kubeadm for this installation – awot83 Jun 09 '22 at 08:05
  • Could you edit your initial message, with those logs (isn't there more of those?). Also some "kubectl describe pod" for those pods? Regarding the deployment tool: it was clear from your initial post. My point is there are easier options, if you're discovering Kubernetes. Blog posts can't beat Ansible or Terraform in terms of reproducibility. While CentOS7 sounds like an odd choice for an OS in 2022. – SYN Jun 09 '22 at 21:16
  • 1
    i agree for centos7 but our production is deployed on it, so this preprod should help us work on migration. also my point here is that i can t find any log that help me understand why this pod are in ContainerCreating state, maybe i don't look at the right logs.. – awot83 Jun 14 '22 at 13:27

1 Answer


I solved the problem this way:

I added these IP ranges to the no_proxy setting of the two files below:

  • 10.96.0.0/24 (Kubernetes API / service network)
  • 192.168.0.0/16 (Calico pod CIDR)
  • 10.x.x.0 (cluster nodes)

In /etc/environment:

HTTP_PROXY=http://myproxy-XXXXXXXX.com:8080
HTTPS_PROXY=http://myproxy-XXXXXXXX.com:8080
NO_PROXY=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27   
http_proxy=http://myproxy-XXXXXXXX.com:8080
https_proxy=http://myproxy-XXXXXXXX.com:8080
no_proxy=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27

Then:

source /etc/environment
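
A quick way to confirm the shell now sees the ranges (just a sanity check, not part of the original steps):

# the NO_PROXY/no_proxy values should include 10.96.0.0/24 and 192.168.0.0/16
env | grep -i proxy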

In /etc/systemd/system/containerd.service.d/http_proxy.conf:

[Service]
Environment="HTTP_PROXY=http://myproxy-XXXXXXXX:8080/"
Environment="HTTPS_PROXY=http://myproxy-XXXXXXXX:8080/"
Environment="NO_PROXY=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27"
Environment="http_proxy=http://myproxy-XXXXXXXX:8080/"
Environment="https_proxy=http://myproxy-XXXXXXXX:8080/"
Environment="no_proxy=localhost,127.0.0.1,10.96.0.0/24,192.168.0.0/16,10.x.x.0/27"

Then:

systemctl daemon-reload
systemctl restart containerd
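
To verify that containerd really picked up the drop-in, something like this can help (a sanity check, not part of the original procedure):

# show the unit file plus any drop-ins that systemd loaded
systemctl cat containerd

# the Environment= output should contain the NO_PROXY ranges added above
systemctl show containerd --property=Environment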

I also edited the coredns ConfigMap as follows (the relevant part of the resulting Corefile is sketched below):

kubectl -n kube-system edit cm coredns

  • removed the option: max_concurrent 1000
  • replaced "proxy" with "forward"
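
For context, the Corefile ended up looking roughly like this (a sketch of the default kubeadm Corefile with those two edits applied; the upstream resolver and other plugins may differ in your cluster):

.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    # "forward" plugin (not the old "proxy"), with the max_concurrent option removed
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}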

Then I ran a kubectl delete on the pods in error (roughly as sketched below), and after that all pods were running OK.
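
The delete step was along these lines (the label selectors are a reasonable guess based on the pod names shown above; adjust them if your labels differ):

# recreate the stuck coredns pods
kubectl -n kube-system delete pod -l k8s-app=kube-dns

# recreate the stuck calico-kube-controllers pod
kubectl -n calico-system delete pod -l k8s-app=calico-kube-controllers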

Hope it helps.
