3

I have a GKE cluster with two node pools. One of them is a tainted node pool for use by specific pods.
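For context, such a taint is typically set when the pool is created. A minimal sketch, with hypothetical pool, cluster, and taint names:

    gcloud container node-pools create special-pool \
        --cluster=my-cluster \
        --node-taints=dedicated=special:NoSchedule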

After adding the tainted node pool, I realised that Kubernetes was trying to schedule a kube-dns pod on the nodes of the pool, but couldn't.

From what I understood, all nodes should have kube-dns deployed if I want DNS resolution to work. Maybe this is an incorrect assumption?

Since kube-dns (and other things in kube-system) are managed by GKE and not by me, I have no idea how to either:

  • if it's needed, tell kube-dns to tolerate my node pool, or
  • if it's not needed, tell kube-dns not to be scheduled on it.
Victor Noël
  • I'm facing the same problem. Were you able to identify the root cause? – Jagadish G May 24 '19 at 08:09
  • Nope, I never found anything clear about that, but then I stopped using GKE, so maybe there is something doable that I don't know about – Victor Noël May 25 '19 at 12:33
  • Apparently custom tolerations are not supported on some system pods such as kube-dns, heapster, and kube-dns-autoscaler, because they are managed by GKE, which periodically reasserts the pods, discarding any changes made by anyone else. This issue is still open and being tracked here: https://github.com/kubernetes/kubernetes/issues/57659 – Jagadish G May 28 '19 at 08:46

3 Answers

1

Currently, using 1.15.12-gke you should have at least the following deployed:

  • kube-dns deployment
  • kube-dns-autoscaler deployment

According to the docs on kube-dns-autoscaler:

kube-dns scales to serve the DNS demands of the cluster. This scaling is controlled by the kube-dns-autoscaler which is deployed by default in all GKE clusters. kube-dns-autoscaler adjusts the number of replicas in the kube-dns deployment based on the number of nodes and cores in the cluster.

The preferred way of tuning kube-dns in the cluster is:

  • By configuring the kube-dns-autoscaler ConfigMap (see the sketch after this list)


    linear: '{"coresPerReplica":256,"min":1,"nodesPerReplica":16, "preventSinglePointFailure": true}'

where:

"preventSinglePointFailure": true controller ensures at least 2 replicas if there are more than one node.

Using these parameters, the number of replicas is calculated as:

    replicas = max( ceil( cores × 1/coresPerReplica ) , ceil( nodes × 1/nodesPerReplica ) )

For example, a cluster with 4 nodes and 16 cores gives max( ceil(16/256), ceil(4/16) ) = 1, which "preventSinglePointFailure": true then raises to 2.

  • Manually:
    kubectl scale --replicas=0 deployment/kube-dns-autoscaler --namespace=kube-system
    kubectl scale --replicas=1 deployment/kube-dns --namespace=kube-system
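To change the autoscaler parameters, you can edit its ConfigMap directly (kube-dns-autoscaler in kube-system is the GKE default; the linear values below just restate the defaults shown above):

    kubectl edit configmap kube-dns-autoscaler --namespace=kube-system

or apply an equivalent manifest:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kube-dns-autoscaler
      namespace: kube-system
    data:
      linear: '{"coresPerReplica":256,"min":1,"nodesPerReplica":16,"preventSinglePointFailure":true}'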

The problem you experienced arises from the default kube-dns deployment configuration:

    tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
    - key: components.gke.io/gke-managed-components
      operator: Exists

This configuration prevents the pods from being scheduled on nodes with your custom taints, since those taints are not tolerated.
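You can check what the managed deployment currently tolerates with, for example:

    kubectl get deployment kube-dns --namespace=kube-system \
        -o jsonpath='{.spec.template.spec.tolerations}'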

I would suggest verifying why your pods can't be scheduled on the default-pool (probably a lack of resources there), and I would consider resizing that pool.
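Resizing can be done with a single command; a sketch, assuming the cluster is named my-cluster:

    gcloud container clusters resize my-cluster \
        --node-pool=default-pool \
        --num-nodes=3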

Another solution is to deploy a custom kube-dns or CoreDNS configuration.
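A custom deployment could then tolerate your pool's taint. As an illustration only, assuming a hypothetical dedicated=special:NoSchedule taint (GKE reverts such edits on the managed kube-dns itself, as noted in the comments):

    tolerations:
    - key: dedicated
      operator: Equal
      value: special
      effect: NoSchedule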

Mark
  • So basically, the deployment would prefer to run replicas on any nodes but my tainted ones, but since it can't and needs a minimum number of replicas, it still tried to schedule them there. Increasing the number of nodes in a non-tainted node pool (the default one, for example) should thus solve the problem. Is that it? – Victor Noël Aug 15 '20 at 16:22
  • It should. In fact, you should see this information directly from the pending pods by running `kubectl describe pod <pod-name>`. – Mark Aug 17 '20 at 07:11
0

By the way, kube-dns lives in the kube-system namespace. Configure your taints so that pods in this namespace can still be scheduled, because they are needed for the normal operation of your cluster.
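To see which taints currently apply to your nodes, you can run, for example:

    kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints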

Nick Rak
  • Could you expand on this? I'm not exactly clear what you mean. How can I express that the nodes are available for the kube-system namespace? – Victor Noël Feb 05 '19 at 16:04
  • For the record, there are some of the kube-system pods running on the tainted nodes, but just not kube-dns. For example there is fluentd and kube-proxy. – Victor Noël Feb 05 '19 at 16:06
  • Kube-dns deployment doesn't have the appropriate tolerations by default and therefore can't be scheduled on the master node or any other tainted worker node. So to get it scheduled, it's required either to have at least one node without taints or to add tolerations to the kube-dns deployment. – VAS Feb 13 '19 at 13:22
0

Apparently custom tolerations are not supported on some system pods such as kube-dns, heapster, and kube-dns-autoscaler, because they are managed by GKE, which periodically reasserts the pods, discarding any changes made by anyone else.

This issue is still open and being tracked here: https://github.com/kubernetes/kubernetes/issues/57659

Jagadish G