
Ok, so at work we were planning to scale down the number of nodes in our Azure Kubernetes Service (AKS) cluster. Before doing this, I wanted to see what would happen if I overloaded the nodes on a test cluster.

On a 3-node test cluster I wrote an overload.yaml which spawned 200 WordPress pods.
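Roughly this kind of manifest (the name, labels, and image below are placeholders rather than the exact file):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overload
spec:
  replicas: 200
  selector:
    matchLabels:
      app: overload
  template:
    metadata:
      labels:
        app: overload
    spec:
      containers:
      - name: wordpress
        image: wordpress
        # note: no resources.requests/limits set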

kubectl apply -f overload.yaml
kubectl get deployments --all-namespaces=true


This showed everything looking good, and Azure's web portal showed only 30% CPU and RAM usage. (It said 200 WordPress pods desired, 200 WordPress pods available, and it showed 8 pods from the kube-system namespace, all of them available.)

All good, so I bumped it up to 300 WordPress replicas.

Now kubectl get deployments --all-namespaces=true showed 300 WordPress pods desired but only 105 available. It showed 0 of 8 kube-system deployments available; later only 2 of 8 had restarted, which seems like a really bad thing. Azure's web portal showed 2 nodes as unavailable, az aks browse stopped working, and kubectl get pods --namespace=kube-system showed statuses of NodeLost, Unknown, and Pending, with only 2 pods Running that had successfully auto-healed.

About an hour later the Azure nodes were replaced (going by the uptime listed in the Azure web portal). I think they only went down because the kube-system pods went down, which I'm guessing caused them to fail a health check and triggered some auto-recovery mechanism.

Anyway, is there a way to guarantee/reserve resources for deployments in the kube-system namespace? (Or is this a bug in Kubernetes or Azure? It seems like giving preference to deployments in the kube-system namespace should be the default behavior.)

Side note:

I did tell the overload.yaml deployment to scale from 300 instances down to 1 instance, but the availability of the kube-system deployments wasn't restored.
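(The scale-down itself was just something like the following; the deployment name overload is a placeholder:)

kubectl scale deployment overload --replicas=1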
I tried kubectl delete pods --all --namespace=kube-system to force the kube-system deployments to redeploy the system pods, but that doesn't help either.

Waiting an hour for Azure to detect that the nodes are failing health checks and then reprovisioning them is a terrible solution. I'd rather prevent it from happening in the first place with a method to guarantee/reserve resources for kube-system. But I'd also be curious to know if anyone has an alternate way to force pods to redeploy, beyond deleting a deployment's pods.

alexander.polomodov
neokyle

2 Answers


You can stipulate resource requests and limits (memory and CPU) in the YAML/manifest file for a deployment, so I wonder whether you couldn't do that for the kube-system pods. When you set these values, scale operations like the one you did will be prevented/will fail if you don't have enough capacity available.
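For example, inside spec.template.spec of a Deployment it looks something like this (the container name, image, and values below are just placeholders):

containers:
- name: wordpress
  image: wordpress
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "512Mi"
      cpu: "500m"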

https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-requests-and-limits-of-pod-and-container

KWilson
  • I think this is what the problem was: my overload ReplicaSet didn't have resource requests or limits. If I'd set limits in the first place, it would never have been scheduled and so would never have tried to push out the kube-system processes. Also, as a side note, we're realizing that AKS (and Azure in general) sucks in several ways, and we're switching to running Rancher on Azure VMs. – neokyle Jul 23 '18 at 02:21

It depends on how you set up your cluster, but if you used kubeadm or kops, the kube-system namespace contains the Kubernetes system pods. Many of these pods run on the master, and by default the master doesn't schedule regular workload pods.
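(On a kubeadm cluster you can see this with something like the following; the node name is a placeholder, and the output typically shows the node-role.kubernetes.io/master:NoSchedule taint:)

kubectl describe node my-master-node | grep Taints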

Don't touch the stuff in the kube-system namespace; if you need to deploy an application, create a new namespace for it.

c4f4t0r
  • AKS is Azure's Kubernetes-as-a-service, so you're only given worker nodes (Microsoft manages the master and etcd nodes for you); these were kube-system processes on the worker nodes. I never touched them, but when I overloaded the worker nodes, they went down. – neokyle Jul 23 '18 at 02:24