OK, so at work we were planning to scale down the number of nodes in Azure Kubernetes Service. Before doing this, I wanted to see what would happen if I overloaded the nodes on a test cluster.
On a 3-node test cluster, I wrote an overload.yaml which spawned 200 wordpress pods:
kubectl apply -f overload.yaml
kubectl get deployments --all-namespaces=true
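For reference, overload.yaml was a plain wordpress Deployment with no resource requests or limits. I'm reconstructing it from memory, so treat the names, image tag, and exact fields below as a sketch rather than the literal file:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
spec:
  replicas: 200
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
      - name: wordpress
        image: wordpress:latest
        # Note: no resources.requests/limits, so the scheduler has no
        # information to stop it from packing far more pods onto the
        # nodes than they can actually handle.
```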
Everything looked good: Azure's web portal showed only 30% CPU and RAM usage, and kubectl reported 200 wordpress pods desired, 200 wordpress pods available, plus 8 deployments in the kube-system namespace, all available.
All good, so I bumped it up to 300 wordpress replicas.
Now kubectl get deployments --all-namespaces=true
shows 300 wordpress pods desired but only 105 available. It showed 0 of 8 kube-system deployments available; later only 2 of 8 came back, which seems like a really bad thing. Azure's web portal showed 2 nodes as unavailable, and az aks browse stopped working. kubectl get pods --namespace=kube-system
shows statuses of NodeLost, Unknown, and Pending, with only 2 pods Running that successfully self-healed.
About an hour later, the Azure nodes were replaced (judging by the uptime listed in the Azure web portal). I think they went down only because the kube-system pods went down, which I'm guessing caused them to fail a health check and triggered some auto-recovery mechanism.
Anyway, is there a way to guarantee/reserve resources for deployments in the kube-system namespace? (Or is this a bug in Kubernetes or Azure? It seems like giving preference to kube-system deployments should be the default behavior.)
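For context on what I mean by reserving resources: as far as I understand there are two mechanisms, though I haven't verified either on AKS. One is setting resources.requests on my own workloads so the scheduler stops placing pods once the nodes' requested capacity is full (sketch below, numbers made up); the other is the kubelet-level --kube-reserved / --system-reserved flags, which AKS may or may not expose.

```yaml
# Sketch: container spec for the overload deployment with requests/limits
# added, so the scheduler refuses to overcommit the nodes. The values
# here are made up for illustration.
    spec:
      containers:
      - name: wordpress
        image: wordpress:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 250m
            memory: 256Mi
```

With requests set, the 300th replica would just sit in Pending instead of starving the kube-system pods, which is the behavior I was expecting by default.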
Side note: I did tell the overload.yaml deployment to scale from 300 instances down to 1, but the availability of the kube-system deployments wasn't restored.
I tried kubectl delete pods --all --namespace=kube-system
to force the kube-system deployments to recreate the system pods, but that doesn't help either.
Waiting an hour for Azure to detect the nodes failing health checks and then reprovision them is a terrible solution. I'd rather prevent it from happening in the first place with a way to guarantee/reserve resources for kube-system. But I'd also be curious whether anyone knows an alternate way to force-redeploy pods, beyond deleting a deployment's pods.