I'm trying to set up a small cluster with 4 worker nodes. I just installed k3s on my Raspberry Pi 4s (8 GB) and the nodes report a NotReady status. I'm new to Kubernetes/k3s, but I believe that with a totally fresh install, things should 'just work'. Each Pi has a freshly wiped install of Ubuntu 22.04 Server for 64-bit ARM. Since the terminal output is so long, I have a pastebin here.

It looks like the pods on the master are failing to mount volumes and failing to create a sandbox. I'm also having apiserver issues, which I think are related to the mount and sandbox errors, since the apiserver eventually responds after several tries. So what is going on? Can anyone help make sense of this? Why is my master node struggling to mount volumes? How do I even begin to fix this?

zeus@atlas00:~$ kubectl get nodes
NAME      STATUS     ROLES                  AGE     VERSION
atlas04   NotReady   <none>                 7h32m   v1.23.6+k3s1
atlas08   NotReady   <none>                 7h36m   v1.23.6+k3s1
atlas06   NotReady   <none>                 7h36m   v1.23.6+k3s1
atlas02   Ready      <none>                 7h32m   v1.23.6+k3s1
atlas00   NotReady   control-plane,master   8h      v1.23.6+k3s1
zeus@atlas00:~$ kubectl get pods -n kube-system -o wide
NAME                                      READY   STATUS              RESTARTS   AGE   IP       NODE      NOMINATED NODE   READINESS GATES
helm-install-traefik-qzxlm                0/1     ContainerCreating   0          8h    <none>   atlas00   <none>           <none>
local-path-provisioner-6c79684f77-bb9bn   0/1     Pending             0          8h    <none>   <none>    <none>           <none>
helm-install-traefik-crd-tg52k            0/1     ContainerCreating   0          8h    <none>   atlas00   <none>           <none>
metrics-server-7cd5fcb6b7-qz88k           0/1     Pending             0          8h    <none>   <none>    <none>           <none>
coredns-d76bd69b-9dzpc                    0/1     ContainerCreating   0          8h    <none>   atlas00   <none>           <none>
zeus@atlas00:~$ kubectl describe pod helm-install-traefik-qzxlm -n kube-system
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
zeus@atlas00:~$ kubectl describe pod helm-install-traefik-qzxlm -n kube-system
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
zeus@atlas00:~$ kubectl describe pod helm-install-traefik-qzxlm -n kube-system
Error from server (InternalError): an error on the server ("apiserver not ready") has prevented the request from succeeding (get pods helm-install-traefik-qzxlm)
zeus@atlas00:~$ kubectl describe pod helm-install-traefik-qzxlm -n kube-system
Name:           helm-install-traefik-qzxlm
Namespace:      kube-system
Priority:       0
Node:           atlas00/192.168.1.50
Start Time:     Tue, 24 May 2022 08:07:56 +0000
Labels:         controller-uid=1f431fba-cb3a-45cc-880a-5be734db988e
                helmcharts.helm.cattle.io/chart=traefik
                job-name=helm-install-traefik
Annotations:    helmcharts.helm.cattle.io/configHash: SHA256=8BE6F0CEB108C2A3A1EC5A8F7591596C00670380ACEA294775E4769C94AEE7A2
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  Job/helm-install-traefik
Containers:
  helm:
    Container ID:  
    Image:         rancher/klipper-helm:v0.7.1-build20220407
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Args:
      install
      --set-string
      global.systemDefaultRegistry=
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      NAME:              traefik
      VERSION:           
      REPO:              
      HELM_DRIVER:       secret
      CHART_NAMESPACE:   kube-system
      CHART:             https://%{KUBERNETES_API}%/static/charts/traefik-10.19.300.tgz
      HELM_VERSION:      
      TARGET_NAMESPACE:  kube-system
      NO_PROXY:          .svc,.cluster.local,10.42.0.0/16,10.43.0.0/16
      FAILURE_POLICY:    reinstall
    Mounts:
      /chart from content (rw)
      /config from values (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f9qlx (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  values:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-values-traefik
    Optional:  false
  content:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik
    Optional:  false
  kube-api-access-f9qlx:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               8h                     default-scheduler  Successfully assigned kube-system/helm-install-traefik-qzxlm to atlas00
  Warning  FailedMount             8h                     kubelet            MountVolume.SetUp failed for volume "content" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             8h                     kubelet            MountVolume.SetUp failed for volume "kube-api-access-f9qlx" : failed to fetch token: serviceaccounts "helm-traefik" is forbidden: User "system:node:atlas00" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system": no relationship found between node 'atlas00' and this object
  Warning  FailedMount             8h (x2 over 8h)        kubelet            MountVolume.SetUp failed for volume "values" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h53m                  kubelet            MountVolume.SetUp failed for volume "values" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h53m                  kubelet            MountVolume.SetUp failed for volume "kube-api-access-f9qlx" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h53m (x2 over 7h53m)  kubelet            MountVolume.SetUp failed for volume "content" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h52m                  kubelet            MountVolume.SetUp failed for volume "values" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h52m (x2 over 7h52m)  kubelet            MountVolume.SetUp failed for volume "content" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h52m                  kubelet            MountVolume.SetUp failed for volume "kube-api-access-f9qlx" : failed to fetch token: serviceaccounts "helm-traefik" is forbidden: User "system:node:atlas00" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system": no relationship found between node 'atlas00' and this object
  Warning  FailedMount             7h41m                  kubelet            MountVolume.SetUp failed for volume "content" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h41m                  kubelet            MountVolume.SetUp failed for volume "values" : failed to sync configmap cache: timed out waiting for the condition
  Warning  FailedMount             7h41m                  kubelet            MountVolume.SetUp failed for volume "kube-api-access-f9qlx" : failed to fetch token: serviceaccounts "helm-traefik" is forbidden: User "system:node:atlas00" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system": no relationship found between node 'atlas00' and this object
  Warning  FailedCreatePodSandBox  7h40m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to prepare extraction snapshot "extract-476722526-09RL sha256:c640e628658788773e4478ae837822c9bc7db5b512442f54286a98ad50f88fd4": failed to rename: rename /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/new-2732139020 /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/4: file exists
  Warning  FailedCreatePodSandBox  6h54m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b9f0346aa924105c7c3498ecb6315c32e13d4237eaa062cea2926401ba1c0ab6": plugin type="flannel" failed (add): open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  6h42m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "41b66aa473ffaee3ae32567c0ff2fe233f35569ea15b3301cfab127e92efce69": plugin type="flannel" failed (add): open /run/flannel/subnet.env: no such file or directory
neogeek23

1 Answer

Are you running the cluster behind a proxy or in an air-gapped environment? If so, the "FailedCreatePodSandBox" event with the log "failed to pull image..." can be because you didn't set up a registry mirror correctly.
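Before changing any registry settings, it may help to see which kind of failure dominates the event list. The following is a minimal sketch, not part of k3s or kubectl: `summarize_warnings` is a hypothetical helper name, and it assumes the standard `kubectl describe pod` event layout shown in the question (type in the first column, reason in the second). It counts event lines, not the "(x2 over 8h)" repeat counts.

```shell
#!/bin/sh
# Hypothetical helper: count Warning events by reason.
# Reads `kubectl describe pod` output on stdin.
summarize_warnings() {
  # $1 is the event type, $2 is the reason (awk ignores the leading indent).
  awk '$1 == "Warning" { count[$2]++ }
       END { for (r in count) print count[r], r }' |
    sort -k1,1nr -k2,2   # most frequent reason first
}

# Example usage (pod name taken from the question):
#   kubectl describe pod helm-install-traefik-qzxlm -n kube-system | summarize_warnings
```

If most lines are FailedMount rather than image pull failures, a registry mirror is unlikely to be the root cause.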

If you run with Docker, add this to your /etc/docker/daemon.json: ... "registry-mirrors": ["https://"] ...

If you use containerd directly, add this to your registries.yaml:

    ....
    mirrors:
      mycustomreg.com:
        endpoint:
          - "https://mycustomreg.com:5000"
    ....
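As a rough sketch of the containerd case on k3s: k3s reads its mirror configuration from /etc/rancher/k3s/registries.yaml at startup. The helper below, `write_registries_yaml`, is a hypothetical name; it only writes a mirrors section for one registry. Mapping docker.io to the mirror is an assumption for illustration, and the endpoint URL is the placeholder from the answer above.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal k3s-style registries.yaml.
#   $1 = target file, $2 = registry to mirror, $3 = mirror endpoint URL
write_registries_yaml() {
  cat > "$1" <<EOF
mirrors:
  "$2":
    endpoint:
      - "$3"
EOF
}

# Example usage (run as root; values are placeholders, not a real mirror):
#   write_registries_yaml /etc/rancher/k3s/registries.yaml \
#     docker.io "https://mycustomreg.com:5000"
```

k3s only picks up the file on startup, so restart the service afterwards (for example `sudo systemctl restart k3s` on the server, `k3s-agent` on workers).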

  • Thanks for your consideration. I didn't intentionally set up my cluster behind a proxy; it is just attached to my router and has direct access to the internet. Thanks though. I'm thinking I'm just going to nuke it and start over :( – neogeek23 Jun 16 '22 at 03:18