
Starting today, k3s is failing to start with the following error: "Failed to start ContainerManager" err="failed to build map of initial containers from runtime: no PodsandBox found with Id '9f141a500138e081ae1a641d7d4c00c3029ecce87da6e2fc80f4a14bd0a965fd'". After this log line, it crashes.

I can't find anything on the internet, so does anyone here have an idea how to solve this?

I'm running k3s version v1.21.5+k3s2 (724ef700).

Let me know if I need to provide additional details.

Log:

...
I1021 12:04:55.508161   78816 kuberuntime_manager.go:222] "Container runtime initialized" containerRuntime="containerd" version="v1.4.11-k3s1" apiVersion="v1alpha2"
I1021 12:04:55.508361   78816 server.go:1191] "Started kubelet"
E1021 12:04:55.509247   78816 cri_stats_provider.go:369] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
E1021 12:04:55.509273   78816 kubelet.go:1306] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
I1021 12:04:55.509255   78816 server.go:149] "Starting to listen" address="0.0.0.0" port=10250
I1021 12:04:55.509887   78816 server.go:409] "Adding debug handlers to kubelet server"
I1021 12:04:55.510952   78816 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
I1021 12:04:55.512769   78816 scope.go:111] "RemoveContainer" containerID="e5ce1c151a24558e69f544794a15bb6d1238139439a0c6174acf720a4f531a7c"
I1021 12:04:55.512865   78816 volume_manager.go:271] "Starting Kubelet Volume Manager"
I1021 12:04:55.512923   78816 desired_state_of_world_populator.go:141] "Desired state populator starts to run"
INFO[2021-10-21T12:04:55.516702675+02:00] RemoveContainer for "e5ce1c151a24558e69f544794a15bb6d1238139439a0c6174acf720a4f531a7c" 
DEBU[2021-10-21T12:04:55.527595023+02:00] openat2 not available, falling back to securejoin 
I1021 12:04:55.538886   78816 controller.go:611] quota admission added evaluator for: leases.coordination.k8s.io
I1021 12:04:55.545188   78816 kubelet_network_linux.go:56] "Initialized protocol iptables rules." protocol=IPv4
I1021 12:04:55.561242   78816 kubelet_network_linux.go:56] "Initialized protocol iptables rules." protocol=IPv6
I1021 12:04:55.561266   78816 status_manager.go:157] "Starting to sync pod status with apiserver"
I1021 12:04:55.561282   78816 kubelet.go:1846] "Starting kubelet main sync loop"
E1021 12:04:55.561318   78816 kubelet.go:1870] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
I1021 12:04:55.571567   78816 shared_informer.go:247] Caches are synced for endpoint slice config 
I1021 12:04:55.571570   78816 shared_informer.go:247] Caches are synced for service config 
INFO[2021-10-21T12:04:55.604476442+02:00] RemoveContainer for "e5ce1c151a24558e69f544794a15bb6d1238139439a0c6174acf720a4f531a7c" returns successfully 
I1021 12:04:55.604584   78816 scope.go:111] "RemoveContainer" containerID="4d7578dd7f7574fd5deeae1ed53cf67d0a2fe64aa1d1214b1ba865622c05b4cd"
INFO[2021-10-21T12:04:55.604877204+02:00] labels have been set successfully on node: <node name>
INFO[2021-10-21T12:04:55.604936435+02:00] RemoveContainer for "4d7578dd7f7574fd5deeae1ed53cf67d0a2fe64aa1d1214b1ba865622c05b4cd" 
I1021 12:04:55.612875   78816 kuberuntime_manager.go:1044] "Updating runtime config through cri with podcidr" CIDR="10.42.0.0/24"
INFO[2021-10-21T12:04:55.612967745+02:00] No cni config template is specified, wait for other system components to drop the config. 
I1021 12:04:55.613044   78816 kubelet_network.go:76] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="10.42.0.0/24"
I1021 12:04:55.623215   78816 kubelet_node_status.go:71] "Attempting to register node" node="<node name>"
E1021 12:04:55.645403   78816 kubelet.go:1384] "Failed to start ContainerManager" err="failed to build map of initial containers from runtime: no PodsandBox found with Id '9f141a500138e081ae1a641d7d4c00c3029ecce87da6e2fc80f4a14bd0a965fd'"
  • Hi GoldElysium, welcome to S.F. It seems to be related to https://github.com/kubernetes/kubernetes/issues/98218, but I don't have enough contact with k3s to know how kubelet factors into its setup. More importantly, what is the "Starting today" part of your question? Was there an OS update? A reboot? Did an employee quit? – mdaniel Oct 21 '21 at 15:55
  • Hi @mdaniel, by "starting today" I meant that it was working fine before, but at some point it began crashing. I performed a reboot hoping that would solve it, but then it started showing this error. The issue you linked does indeed look like the same error. – GoldElysium Oct 22 '21 at 08:55

1 Answer


With the help of https://github.com/kubernetes/kubelet/issues/21 I finally figured it out. After manually starting containerd with the following command (which I found in the k3s logs): containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd, I could search for the container belonging to the missing pod sandbox using crictl: k3s crictl ps -a | grep 9f141a, which gave me a container ID. Then I removed that container with k3s crictl rm <id>, restarted k3s, and now it's working again. The full sequence is sketched below.
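
For reference, here is the recovery sequence as a rough sketch. The containerd flags and the crictl commands are taken from the steps above; the final systemctl line is an assumption that k3s is managed by systemd as the "k3s" service (adjust to however you start k3s), and <id> stands for the container ID that the grep returns.

    # Terminal 1: start k3s's bundled containerd by hand, using the paths from the k3s logs
    containerd \
      -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml \
      -a /run/k3s/containerd/containerd.sock \
      --state /run/k3s/containerd \
      --root /var/lib/rancher/k3s/agent/containerd

    # Terminal 2: list all containers and grep for the pod sandbox ID from the error message
    k3s crictl ps -a | grep 9f141a

    # Remove the stale container found above; <id> is the container ID printed by the previous command
    k3s crictl rm <id>

    # Stop the manually started containerd (Ctrl-C in terminal 1), then restart k3s
    # (assumes a systemd-managed install; adjust if you start k3s differently)
    systemctl restart k3s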