
When adding a new node to a Kubernetes cluster I end up with this error:

+ docker start kubelet
Error response from daemon: {"message":"No such container: kubelet"}
Error: failed to start containers: kubelet
+ sleep 2

This error occurs on a cluster that is already damaged: only one node of the original three remains. That remaining node apparently has a problem with certificate recovery and distribution, and SSL is no longer functional on it. For context, the Kubernetes cluster was deployed through Rancher. The etcd container regularly restarts on node 3, and etcd refuses to deploy to the other nodes that I am trying to re-integrate into the cluster.

Kubelet runs in a Docker container, itself launched by Rancher when it created the Kubernetes cluster. As for the tests I have carried out: I launched a new Docker container with etcd, and I tried to restore from a snapshot, but nothing brings the cluster back up. Adding a new node does not work either. From what I have seen, there is also an issue with the SSL certificates created by Rancher, which it can no longer find.

TheoV

1 Answer


Try the following steps:

  1. Clean the node by running:
docker system prune
docker volume prune

This will delete all the Docker volumes, so be careful if you have
important data in your volumes.

  2. Clean the Rancher/Kubernetes runtime data on the node:
rm -rf /etc/cni/ /etc/kubernetes/ /opt/cni/ /var/lib/calico/ /var/lib/cni/ /var/lib/rancher/ /var/run/calico/

The official docs on node cleanup also recommend removing /opt/rke and
/var/lib/etcd. Do not remove them here, because they contain the cluster's
etcd snapshots and data. This is especially important when there is only one
node left in the cluster.
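Since those two directories hold the only surviving etcd state, it may be worth archiving them before touching the rest of the node. A minimal sketch (the /tmp/rke-backup destination is just an example, not a Rancher convention):

```shell
# Archive the etcd data and RKE state that the cleanup step above keeps.
BACKUP_DIR=/tmp/rke-backup
mkdir -p "$BACKUP_DIR"
for d in /opt/rke /var/lib/etcd; do
  # Only archive directories that actually exist on this node.
  if [ -d "$d" ]; then
    tar -czf "$BACKUP_DIR/$(basename "$d").tar.gz" "$d"
  fi
done
```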

  3. Exec into the rancher container and fix the cluster status by hand (thanks
    @ibrokethecloud for the hint):
docker exec -it rancher bash

Inside the container:

apt-get update && apt-get -y install vim
kubectl edit cluster c-XXXX  # replace the cluster-id with an actual cluster ID

In the editor, find the key apiEndpoint (it should be directly under
the status key) and remove it. Save, then exit the editor and the container.
Make sure kubectl reports that it updated the cluster.
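If you prefer not to edit the object interactively, the same change can be expressed as a one-shot JSON Patch. This is a sketch under two assumptions: that the cluster object is the Rancher CRD clusters.management.cattle.io, and that apiEndpoint sits directly under .status:

```shell
# JSON Patch that removes status.apiEndpoint in one command instead of vim.
PATCH='[{"op":"remove","path":"/status/apiEndpoint"}]'
# Replace c-XXXX with your actual cluster ID, as in the kubectl edit above.
kubectl patch clusters.management.cattle.io c-XXXX --type=json -p="$PATCH"
```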

  4. From the Rancher UI, get the command for registering a new node.
    Set a different name for the node than it had before by adding
    --node-name to the docker run command (there is an edit box for this
    under Advanced Settings). It should look like this:
docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.2.6 \
  --server https://rancher.example.com --token XXXXXXXXXXXXXXX --node-name mynode2 \
  --etcd --controlplane --worker

  5. Run the above command on the cleaned node. It should register
    successfully, and RKE will start up all the kube-* and kubelet containers.
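To confirm the node actually rejoined, a quick check (run on the new node, or anywhere kubectl points at the cluster):

```shell
# Trim docker ps output to the container name and its status.
FMT='table {{.Names}}\t{{.Status}}'
# The kubelet container should now exist and be running on the node.
docker ps --filter name=kubelet --format "$FMT"
# The new node (mynode2 in the example above) should appear as Ready.
kubectl get nodes -o wide
```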

Take a look: rancher-kubelet, rancher-2-getting-started.

Malgorzata