


I'm trying to learn k8s, and since I happen to have access to an OpenStack cloud, I figured I'd try to install k8s on it, following this wiki.
So far I've been able to initialize the cluster, install the Weave CNI, connect an external worker, and install the OpenStack cloud controller manager. According to the above wiki, I should now wait for all pods in the kube-system namespace to be running. I'm stuck with the coredns pods though... they won't move out of the Pending state.
From the pod's describe output I can see that the problem is that the master node still has the taint below:
node-role.kubernetes.io/master:NoSchedule
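The same taint also shows up directly on the node object; it can be confirmed with something like:

kubectl describe node master-node-01 | grep Taints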
When I check the status of the node, it seems fine:

ubuntu@master-node-01:~$ kubectl get nodes -o wide
NAME             STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
master-node-01   Ready    master   10h   v1.17.0   10.99.53.6    <none>        Ubuntu 18.04.5 LTS   4.15.0-143-generic   docker://20.10.2
worker-node-01   Ready    <none>   10h   v1.17.0   10.99.53.5    <none>        Ubuntu 18.04.5 LTS   4.15.0-143-generic   docker://20.10.2

All the pods (except for coredns ones) are running fine:

ubuntu@master-node-01:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
kube-system   coredns-6955765f44-g2jnm                   0/1     Pending   0          10h   <none>       <none>           <none>           <none>
kube-system   coredns-6955765f44-wj7xb                   0/1     Pending   0          10h   <none>       <none>           <none>           <none>
kube-system   etcd-master-node-01                        1/1     Running   0          11h   10.99.53.6   master-node-01   <none>           <none>
kube-system   kube-apiserver-master-node-01              1/1     Running   0          11h   10.99.53.6   master-node-01   <none>           <none>
kube-system   kube-controller-manager-master-node-01     1/1     Running   0          11h   10.99.53.6   master-node-01   <none>           <none>
kube-system   kube-proxy-8s8r9                           1/1     Running   0          10h   10.99.53.5   worker-node-01   <none>           <none>
kube-system   kube-proxy-vtgnz                           1/1     Running   0          10h   10.99.53.6   master-node-01   <none>           <none>
kube-system   kube-scheduler-master-node-01              1/1     Running   0          11h   10.99.53.6   master-node-01   <none>           <none>
kube-system   openstack-cloud-controller-manager-dtczj   1/1     Running   0          10h   10.99.53.6   master-node-01   <none>           <none>
kube-system   weave-net-2z5n7                            2/2     Running   2          10h   10.99.53.5   worker-node-01   <none>           <none>
kube-system   weave-net-tm9p4                            2/2     Running   1          10h   10.99.53.6   master-node-01   <none>           <none>

I can't find anything suspicious in the pods' logs.
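The coredns pods have no container logs while they're Pending; the scheduler's FailedScheduling reason is recorded as events on the pod, which can also be listed with something like this (pod name taken from the listing above):

kubectl get events -n kube-system --field-selector involvedObject.name=coredns-6955765f44-g2jnm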

The OpenStack cloud I'm using doesn't have Octavia installed (the wiki says it's needed for setting up the LB, but my problem doesn't seem to be related to that).

If anyone here is able to help me find a way to investigate (and eventually solve) this problem, it would be greatly appreciated. Thanks.

Bartek Gmerek

1 Answer


It looks like a problem with taints. You can try to solve it in several ways:

  • remove the taint:
kubectl taint nodes $(hostname) node-role.kubernetes.io/master:NoSchedule-
  • edit the node configuration and remove the taint entry (see the snippet after these examples for what it looks like):
kubectl edit node <node_name>

The change is applied to the node as soon as you save and close the editor.

  • schedule on the master node without removing the taint, by adding a toleration to the Deployment:
apiVersion: apps/v1
kind: Deployment
...
  spec:
...
    spec:
...
      tolerations:
        - key: "node-role.kubernetes.io/master"
          effect: "NoSchedule"
          operator: "Exists"
  • alternatively, remove the taint by selecting the master node via its label instead of relying on $(hostname):
kubectl taint nodes $(kubectl get nodes --selector=node-role.kubernetes.io/master | awk 'FNR==2{print $1}') node-role.kubernetes.io/master-
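For reference, this is roughly what the taint looks like inside the node object on a kubeadm-created master (an illustrative sketch; surrounding fields omitted):

spec:
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master

Once the taint has been removed, kubectl describe node <node_name> should no longer list node-role.kubernetes.io/master:NoSchedule under Taints, and the Pending coredns pods should get scheduled shortly afterwards.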
  • You mean I should remove the taint manually? I thought taints were removed automatically by the relevant components... Isn't that the case? – Bartek Gmerek May 28 '21 at 12:58
  • Yes, this error means that the taints still haven't been removed. – Mikołaj Głodziak May 28 '21 at 13:05
  • I get that, but doesn't it mean there's some problem with the cluster? I'd assume the taints would be removed as the cluster setup progresses. For instance, right after initializing the cluster my master node had the node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint, but it was removed after I installed the openstack-cloud-controller-manager. – Bartek Gmerek May 28 '21 at 15:37
  • **UPDATE** After removing all taints, the **coredns** pods went to the Running state. Thanks a lot @Mikołaj Głodziak. I still can't figure out why these taints weren't removed after installing the openstack-cloud-controller-manager. – Bartek Gmerek May 28 '21 at 16:38