Thank you for taking the time to read and review this problem.
I have a problem using my Kubernetes cluster.
It is running Flatcar Linux, made by Kinvolk, which was recently acquired by Microsoft. I set up the cluster using their Lokomotive (lokoctl) tool.
I have 4 nodes in total.
- socrates001 (master)
- socrates002 (node)
- socrates003 (node)
- socrates004 (node)
Today, around 2 PM, my master node restarted because of the auto-update service provided by Lokomotive.
The master node came back up, but Kubernetes did not.
The output of docker container ls, run on socrates001, is the following:
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS         PORTS   NAMES
e33995c69e10   quay.io/kinvolk/kubelet:v1.21.4   "/usr/local/bin/kube…"   7 minutes ago   Up 7 minutes           kubelet
b6093a1f343a   quay.io/coreos/etcd:v3.4.16       "/usr/local/bin/etcd"    7 minutes ago   Up 7 minutes           etcd
This indicates that kubelet and etcd are running. Kubelet, however, is giving me a lot of errors, and honestly I don't know where to start digging.
When I run journalctl -u kubelet, the output is too long to post here, so I've put it in a Pastebin. Warning, it's a big one.
https://pastebin.com/A9Lmf0tc
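For anyone who would rather not read the whole paste, this is roughly how the log could be narrowed down to error lines. It is only a sketch: kubelet_errors is a helper name I made up, and it assumes kubelet logs in the usual klog format, where error and fatal lines start with E or F followed by the month and day (for example E0915).

```shell
# Hypothetical helper (not part of Lokomotive or kubelet):
# keep only klog error (E) and fatal (F) lines read from stdin.
# Assumes the klog header format "Lmmdd hh:mm:ss.uuuuuu ...".
kubelet_errors() {
  grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}

# Usage on the master node (current boot only, last 50 error lines):
#   journalctl -u kubelet -b --no-pager | kubelet_errors | tail -n 50
```

The -b flag limits journalctl to the current boot, which should drop everything logged before the automatic restart.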
Things I've already tried:
- Rebooting the master node
- Restarting kubelet
- Restarting etcd
- Manually trying to start the Kubernetes API server, but it is terminated immediately (by kubelet, I think)
- Forcing swap off with sudo swapoff -a, although I'm fairly sure Lokomotive already disables swap on Flatcar Linux when it provisions the cluster.
I have no clue why this is happening at all, so all comments and answers are welcome! I'm a student with quite a bit of time, so you should get a reply quickly.
Thanks in advance!
Edit: it looks like there is a bug in the Kubelet Checkpointer. I have filed an issue with Lokomotive here: https://github.com/kinvolk/lokomotive/issues/1576