I have a set of worker nodes that successfully join my K8s cluster but fail to schedule any pods because they cannot pull the pause (pod infra) image from the internet. Our cluster is bare metal, running Kubernetes 1.18 with CRI-O as the container runtime (along with podman).

Output from the kubelet log from one of these nodes is as follows:

```
Sep 17 18:56:02 node1.data.worker kubelet[112861]: E0917 18:56:02.188261  112861 kuberuntime_sandbox.go:69] CreatePodSandbox for pod "kube-proxy-hf5q2_kube-system(3f66419d-d676-4d17-af30-be79425a779c)" failed: rpc error: code = Unknown desc = error creating pod sandbox with name "k8s_kube-proxy-hf5q2_kube-system_3f66419d-d676-4d17-af30-be79425a779c_0": Error initializing source docker://k8s.gcr.io/pause:3.2: error pinging docker registry k8s.gcr.io: Get https://k8s.gcr.io/v2/: dial tcp [2607:f8b0:4003:c13::52]:443: connect: network is unreachable
```

This particular error is clearly because the worker nodes are not connected to the internet, so it's reasonable.

What is not reasonable is that the node is trying to pull the pause image at all, given that `PodSandboxImage` is supposed to matter only when kubelet uses Docker as its container runtime, which it is not configured to do.

I have confirmed this by looking at the kubelet command line on the worker node that generated the error above:

```
[sysmaint@node1 ~]$ systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2020-09-16 21:08:55 GMT; 21h ago
     Docs: https://kubernetes.io/docs/
 Main PID: 112861 (kubelet)
    Tasks: 58
   CGroup: /system.slice/kubelet.service
           └─112861 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --cgroup-driver=systemd
```

As one can see above, the node is configured to use CRI-O rather than Docker as its runtime (`--container-runtime=remote` pointing at the CRI-O socket).
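For completeness, CRI-O has its own sandbox image setting independent of kubelet. A rough way to check what it's set to (the config path may differ by distribution; this is just where it usually lives):

```
# Look for CRI-O's own sandbox image setting, usually in /etc/crio/crio.conf
# or a drop-in under /etc/crio/crio.conf.d/
grep -R pause_image /etc/crio/
```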

Now, I'm sure I could probably fix this by pointing the pod infra image setting at our local docker registry (e.g. CRI-O's `pause_image` in crio.conf, or kubelet's `--pod-infra-container-image` flag), but according to the Kubernetes documentation, the system shouldn't even need that image if it isn't using Docker as its runtime.
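For example, something along these lines in CRI-O's config would redirect the sandbox pull (with `registry.local` as a placeholder for our internal registry):

```
# /etc/crio/crio.conf (snippet, not the full config)
[crio.image]
# Pull the sandbox image from the internal mirror instead of k8s.gcr.io.
# "registry.local" is a placeholder for whatever the local registry is called.
pause_image = "registry.local/pause:3.2"
```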

This mismatch between the documented and observed behavior has me thinking I'm missing something.

My questions are thus:

1. Is the system (i.e. kubelet and CRI-O) supposed to still need the pause image even when the container runtime is `remote`?
2. If not, what should I look at next to determine the problem here? Specifically, I assume something is misconfigured on the worker nodes.

P.S. - As an aside, I am able to successfully schedule pods on the master node if I untaint it, but I assume that is because the master node already has the pause image and can connect to the internet.
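(For reference, untainting was just the usual kubeadm-style taint removal; `master1` is a placeholder node name:)

```
kubectl taint nodes master1 node-role.kubernetes.io/master-
```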

    *"PodSandboxImage" is supposed to only matter if kubelet is using Docker as its container runtime* what would give you that impression? My clusters run on `containerd` and certainly use the sandbox because that's the mechanism through which all containers in a Pod share the same network identity – mdaniel Sep 17 '20 at 23:38
  • The Kubernetes documentation indicates that the "PodSandboxImage" setting is skipped entirely if the container runtime is set to "remote." If it's still needed, why wouldn't k8s let you change where to find it? – stix Sep 18 '20 at 17:17
  • They do, in at least 3 different ways [`containerd.conf`](https://github.com/kubernetes-sigs/kubespray/blob/v2.14.0/roles/container-engine/containerd/templates/config.toml.j2#L21), [`crio.conf`](https://github.com/kubernetes-sigs/kubespray/blob/v2.14.0/roles/container-engine/cri-o/templates/crio.conf.j2#L319) and [via a kubelet command line flag](https://github.com/kubernetes-sigs/kubespray/blob/v2.14.0/roles/kubernetes/node/templates/kubelet.env.v1beta1.j2#L16); the kubespray template has a docker-specific `if`, so maybe that's what you mean? – mdaniel Sep 18 '20 at 19:30

0 Answers