
I'm trying to do a clean install of Kubernetes 1.23.x on a cluster of four Raspberry Pis, each running the 64-bit (arm64) version of Raspberry Pi OS. However, I am running into a major snag as soon as I try to run kubeadm init on the master node (before even attempting to get the other nodes to join). Namely: just five minutes after calling kubeadm init on the master node, the cluster stops working. In fact, it never really works to begin with: at first the server responds saying the node is NotReady, but after 5 minutes it stops responding altogether.

So here's what I did, and what I saw: I installed containerd and kubeadm. Then I ran the following command on the master node to try to start Kubernetes:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
    --token-ttl=0 --apiserver-advertise-address=192.168.1.194

After running that command, and subsequently copying the /etc/kubernetes/admin.conf file to ~/.kube/config, I am able to run the following command:

$ kubectl get nodes

NAME           STATUS     ROLES                  AGE     VERSION
k8s-master-1   NotReady   control-plane,master   3m36s   v1.23.4
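
(In case it matters: the admin.conf copy was just the standard post-init sequence that kubeadm prints when it finishes, roughly:)

# copy the kubeconfig generated by kubeadm init into my user's home
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config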

The node continues to show a NotReady status for about 5 minutes, after which point the same command yields a very different result:

$ kubectl get nodes

The connection to the server 192.168.1.194:6443 was refused - did you specify the right host or port?

I'm not sure why this is happening, but it is very consistent. I have tried a few times now to kubeadm reset and then kubeadm init again, and the connection refusal always happens at the 5-minute mark. The last time I tried this, I decided to tail all the log files under /var/log/containers/. After the 5-minute mark, they repeatedly log some variation of a connection error to 127.0.0.1:2379. For example:

2022-03-09T19:30:29.307156643-06:00 stderr F W0310 01:30:29.306871 1 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
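
(The tailing itself was nothing fancy; from memory it was along the lines of:)

# follow every container log file on this node
sudo tail -F /var/log/containers/*.log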

From Googling, it appears that etcd is what listens on that port, but at the 5-minute mark a bunch of services (including etcd) start shutting down. I've uploaded the full logs, from the time kubeadm init runs up until just before the dreaded 5-minute mark, as a Gist.

I have already checked that all the ports are open, too. (They are.) During those first five minutes, I can telnet to local port 2379. Why won't Kubernetes start on my Pi? What am I missing?
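
(The port checks were nothing more elaborate than something like this:)

# confirm the control-plane ports are listening (6443, 2379-2380, 10250, ...)
sudo ss -tlnp | grep -E '6443|2379|2380|10250'
# and during those first five minutes this connects without issue:
telnet 127.0.0.1 2379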

UPDATE: As requested, here are a few more details. I saw a post recommending setting --apiserver-advertise-address to 0.0.0.0 instead of the direct IP, so I tried that, but it seemed to make no difference. Running systemctl status kubelet shows that the kubelet service is "active" during that initial 5-minute period.
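
(That check, along with the journalctl command suggested in the comments, was essentially:)

systemctl status kubelet
# tail the kubelet's journal for anything obvious
sudo journalctl -u kubelet --no-pager | tail -n 50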

I also ran kubectl describe node k8s-master-1, which shows four events in this sequence:

  1. KubeletHasSufficientMemory
  2. KubeletHasNoDiskPressure
  3. KubeletHasSufficientPID
  4. KubeletNotReady

That last event is accompanied by this message: "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized." So that got me thinking. I had been waiting for the Node to come up as Ready before installing Flannel (aka the CNI plugin), but this time I decided to try installing Flannel during that initial 5 minute period, using this command:

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

And to my great surprise, that worked! Well, sort of. The master node did eventually start reporting a Ready status, and all my pods came up, with the notable exception of the coredns pods. However, after a short while the kube-proxy pod (in the kube-system namespace) died and got stuck in a CrashLoopBackOff, and later still the kube-controller-manager and kube-scheduler pods entered a CrashLoopBackOff as well. Then, this time after about 15 minutes, the whole cluster died again as before (meaning I got the same 'connection to the server was refused' message). So I feel like I'm a little bit closer, but also still a long way from getting this working.
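
(For what it's worth, I was watching this unfold with plain kubectl, roughly as follows; the kube-proxy pod name suffix is just a placeholder:)

# watch the kube-system pods cycle into CrashLoopBackOff
kubectl -n kube-system get pods -w
# pull logs from the previous (crashed) container instance, e.g. for kube-proxy
kubectl -n kube-system logs --previous kube-proxy-xxxxx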

SECOND UPDATE: A couple of things: it seems that when I install the Flannel CNI plugin, coredns is either not included or doesn't work. But when I install the Weave Net CNI (install command sketched below, after the list), it at least tries to spin up coredns, although those pods get stuck in ContainerCreating and never actually start. So, as requested, I am providing a number of additional logs. They're long enough to warrant uploading them separately, so here's a link to a Gist containing four logs:

  1. Running kubectl -n kube-system logs pod/coredns-...
  2. Running kubectl -n kube-system logs pod/kube-controller-manager-k8s-master-1
  3. Running kubectl -n kube-system logs pod/kube-proxy-...
  4. Running kubectl describe node k8s-master-1
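
(The Weave Net install mentioned above used the apply command from the Weave docs; I'm reconstructing it from memory, so treat the exact URL as approximate:)

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"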

Note that, before everything dies, the kube-controller-manager-... pod starts up but soon ends up in a CrashLoopBackOff, while the coredns pods never start up successfully at all.

  • What does dmesg or syslog show? – tilleyc Mar 24 '22 at 03:22
  • Run `kubectl describe node nodename` to check why the node is NotReady. Check `systemctl status kubelet` and `journalctl -u kubelet`. Did you turn off the swap? – mozello Mar 24 '22 at 11:04
  • @mozello I updated my original post with more details per your recommendations. – soapergem Mar 26 '22 at 04:30
  • Are coredns pods in 'Pending' state? Check if there is any 'taint' when you run 'kubectl describe node'. – mozello Mar 28 '22 at 17:14
  • Also, please add the logs from the kube-proxy pod `kubectl -n kube-system logs kube-proxy-...` – mozello Mar 28 '22 at 17:37
  • @mozello thanks for following up with me and apologies on the long delay. I've edited my original post to add a bunch of additional logs in a Gist link at the end. I really appreciate you looking at this! – soapergem Apr 23 '22 at 14:49

0 Answers