I have a problem with my Kubernetes cluster. It runs Flatcar Linux, made by Kinvolk (recently acquired by Microsoft). I set up the cluster using their Lokomotive (lokoctl) tool.

I have 4 nodes in total.

  • socrates001 (master)
  • socrates002 (node)
  • socrates003 (node)
  • socrates004 (node)

Today, around 2 PM, my master node restarted because of the auto update service provided by Lokomotive (the cluster management tool made by Kinvolk).

The master node came back up, but Kubernetes did not.

The output of docker container ls, run on socrates001, is the following:

CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS         PORTS     NAMES
e33995c69e10   quay.io/kinvolk/kubelet:v1.21.4   "/usr/local/bin/kube…"   7 minutes ago   Up 7 minutes             kubelet
b6093a1f343a   quay.io/coreos/etcd:v3.4.16       "/usr/local/bin/etcd"    7 minutes ago   Up 7 minutes             etcd

This shows that the kubelet and etcd containers are running. The kubelet, however, is logging a lot of errors, and honestly, I don't know where to start digging.

Running journalctl -u kubelet produces too much output to post here, so I've put it in a Pastebin (warning, it's a big one): https://pastebin.com/A9Lmf0tc
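Since the full log is so large, a narrower view may be easier to work with. The following is only a sketch: kubelet_errors is a hypothetical helper name, and it assumes the kubelet writes klog-style lines, where each message starts with a severity letter followed by the month and day (for example E0923).

```shell
#!/bin/sh
# Hypothetical helper (not part of Lokomotive or the kubelet):
# keep only error (E) and fatal (F) lines from klog-style output on stdin.
# Assumes each message starts with a severity letter plus MMDD, e.g. "E0923 ...".
kubelet_errors() {
    # "|| true" keeps the function from failing when nothing matches.
    grep -E '^[EF][0-9]{4} ' || true
}

# Usage on the node (assumes journald holds the kubelet unit's logs):
#   journalctl -u kubelet -b -o cat --no-pager | kubelet_errors | tail -n 50
```

Here -b limits the output to the current boot and -o cat drops the journald prefix so that the severity letter is the first character on each line.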

Things I've already tried:

  • Rebooting the master node
  • Restarting kubelet
  • Restarting etcd
  • Manually starting the Kubernetes API server; it gets terminated immediately (by the kubelet, I think)
  • Forcing swap off with sudo swapoff -a, although I'm fairly sure Lokomotive already turns it off on Flatcar Linux when provisioning the cluster
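To double-check the swap state rather than rely on my memory of the provisioning, something like the following could be used. This is a minimal sketch: active_swaps is a hypothetical helper name, and it assumes the usual /proc/swaps layout of one header line followed by one line per active swap area.

```shell
#!/bin/sh
# Hypothetical helper: print the number of active swap areas.
# Assumes the /proc/swaps format: a header line, then one line per swap area.
# An optional argument lets you point it at a different file.
active_swaps() {
    # Skip the header, count the remaining non-empty lines.
    # grep -c exits non-zero on a count of 0, hence "|| true".
    tail -n +2 "${1:-/proc/swaps}" | grep -c . || true
}

# Usage on the node:
#   active_swaps        # prints 0 when no swap is active
```

A result of 0 would suggest that swap is not what is keeping the control plane from starting.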

I have no clue why this is happening at all, so all comments and answers are welcome! I'm a student with quite a bit of time, so you should get a reply quickly.

Edit: it looks like there is a bug in the Kubelet Checkpointer. I have filed an issue with Lokomotive here: https://github.com/kinvolk/lokomotive/issues/1576

