
When I have a Kubernetes cluster with multiple control plane nodes and delete one of them, the API server stops being available entirely.

In this setup I want to scale down from two control plane nodes to one, but I end up rendering the cluster unusable:

$ kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master1   Ready    master   5d20h   v1.18.6
worker1   Ready    <none>   5d19h   v1.18.6
master2   Ready    master   19h     v1.18.6
$ kubectl drain master2 --ignore-daemonsets
node/master2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-hns7p, kube-system/kube-proxy-vk6t7
node/master2 drained
$ kubectl get nodes
NAME      STATUS                     ROLES    AGE     VERSION
master1   Ready                      master   5d20h   v1.18.6
worker1   Ready                      <none>   5d20h   v1.18.6
master2   Ready,SchedulingDisabled   master   19h     v1.18.6
$ kubectl delete node master2
node "master2" deleted
$ kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master1   Ready    master   5d20h   v1.18.6
worker1   Ready    <none>   5d20h   v1.18.6
$ ssh master2
$ sudo kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W0811 10:24:49.750898    7159 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "master2" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0811 10:24:51.487912    7159 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
$ exit
$ kubectl get nodes
Error from server: etcdserver: request timed out
$ kubectl cluster-info

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The connection to the server master1:6443 was refused - did you specify the right host or port?

What's missing here? Or how else is removing a control plane node different from removing a worker node? Pointers are appreciated.

Windowlicker
  • I am not 100% sure but I don't think scaling below 2 masters is supported when the cluster was originally set up with 2 or more masters. – Michael Hampton Aug 11 '20 at 17:12
  • It works fine, but **etcd** is another matter entirely (if they're running in a stacked configuration) -- and even then, it's possible to draw down the number of etcd members but one must do that _explicitly_ -- etcd gets **realllllllllllyyyyyyy mad** if a member just up and disappears – mdaniel Aug 11 '20 at 20:56
  • Thanks to both of you! That's good to know. Especially as this isn't totally obvious for the ordinary K8s user. (Most documentation only refers to worker nodes when it comes to draining and deleting nodes.) – Windowlicker Aug 11 '20 at 21:03

1 Answer


You have two master nodes, and since this is a stacked setup that also means you have two etcd members.
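
As a quick check (a sketch, assuming a default kubeadm install, where component=etcd is the label kubeadm puts on its static etcd pods), you can confirm there is one etcd member per control plane node:

$ kubectl -n kube-system get pods -l component=etcd -o wide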

In the etcd documentation you can read:

It is recommended to have an odd number of members in a cluster. An odd-size cluster tolerates the same number of failures as an even-size cluster but with fewer nodes. The difference can be seen by comparing even and odd sized clusters:

Cluster Size    Majority    Failure Tolerance
1               1           0
2               2           0
3               2           1

So as you can see, an etcd cluster of size 2 requires both members to be healthy and tolerates no failures at all. This is why it's highly recommended to use an odd number of etcd members.
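
To spell out the arithmetic: quorum is floor(n/2) + 1, so with n = 2 every write needs both members. The moment kubeadm reset wiped the etcd state on master2, the surviving member could no longer reach a majority, which is exactly the "etcdserver: request timed out" error you saw.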

So I believe that now you understand why your cluster went down.

Also check the Kubernetes documentation about kubeadm: high availability topology.
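
If you want to retry the scale-down, the safe order is to remove the etcd member explicitly before deleting the node. This is only a sketch: etcd-master1 and <MEMBER_ID> are placeholders to substitute for your environment, the certificate paths are the kubeadm defaults, and the etcd 3.4 image used by v1.18 defaults etcdctl to the v3 API:

$ kubectl -n kube-system exec etcd-master1 -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list
$ kubectl -n kube-system exec etcd-master1 -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member remove <MEMBER_ID>
$ kubectl drain master2 --ignore-daemonsets
$ kubectl delete node master2
$ ssh master2 sudo kubeadm reset

Removing the member first shrinks the etcd cluster to size 1 (quorum 1), so the remaining member keeps serving. Note that kubeadm reset tries to remove the local etcd member itself, but it needs to look the node up in the cluster to do so. Since master2 had already been removed with kubectl delete node, that lookup failed (the nodes "master2" not found warning in your output) and the member removal was skipped, after which wiping /var/lib/etcd broke quorum.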

Matt
  • Thanks. I am aware of the odd-number recommendation. As this is not an HA cluster, the second master node was only meant as a temporary addition to verify that adding new control nodes would potentially work. Not being "allowed" to temporarily add control nodes was a lesson to learn. – Windowlicker Aug 13 '20 at 13:34