I am new to the Azure AKS world, and while experimenting with a test cluster I deleted all of its nodes with kubectl delete node xxxx, thinking that the cluster would heal itself. Boy, was I wrong.

To explain the issue: when I run kubectl get nodes, I get No resources found. Under "Node Pools" in the portal I can see that there are 3 nodes. I have scaled the pool up and down, but kubectl still shows no nodes (No resources found). When I run kubectl get pods, all the pods are shown in Pending state.

Extra Info:

  • The AKS Cluster was created manually, no ARM template or script was saved.
  • The AKS Cluster uses an Availability Set (not a Scale Set) for the pool, so I cannot add a new pool and move the pods there.

My questions are:

  1. How do I get the nodes to show up in kubectl again? (The pool still has 3 nodes sitting there.)
  2. Can I somehow restore the cluster to a working state? Move the pods somehow, somewhere?
  3. What would you do in this case?

EDIT:

  • After some time of "kubectl get nodes" showing "No resources found", 2 nodes have now come back online, but one is still missing. The pool has a count of 3. The 2 nodes that are shown are in Ready state, but all the pods are still in Pending state. There are no errors in Events.

New Question:

  • Is there a way to start populating the 2 Ready Nodes with the Pending Pods?

Thanks again folks.

  • can you execute "kubectl describe pod " with any one of your pods and share the event logs? – Chayan Bansal Jun 03 '21 at 13:53
  • Hi @ChayanBansal, there are no events on any of the pods: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: – bumbo-jumbo Jun 03 '21 at 13:59
  • Then can you try and describe all the nodes one by one? Check the information provided there – Chayan Bansal Jun 03 '21 at 14:15
  • Hi Chayan, thanks for your input. Describing the nodes and pods told us nothing: all the Events fields were empty. The connection to the AKS API server had been lost, and the AKS upgrade solved the problem. I am not sure why or how. The cluster was down, the pods were in Pending state, and the nodes were in Ready state, so we just ran az aks upgrade, and after that everything came back and started working. But thank you for your input. You helped me a lot. Best regards – bumbo-jumbo Jun 04 '21 at 09:36

2 Answers

If you have run kubectl delete node, then the node is no longer registered with Kubernetes. If you were using scale sets, the best option would be to scale down and then back up again, to get new nodes and have them re-register. In your scenario, with availability sets, you don't have that option. You could look at running a node update, which may re-register it, or you can delete the VM and have AKS recreate it.

All of that said, availability sets are not the way to do AKS nowadays. Given that this is a test cluster, if I were you I would just delete it and recreate it using VMSS.
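
For readers on a scale-set-backed pool, the "scale down and back up" idea above could look roughly like the sketch below. It is only an illustration, not something taken from this thread: the resource group, cluster, and pool names are hypothetical placeholders, and the node counts are passed in as arguments.

```shell
# Hypothetical names: replace with your own resource group, cluster, and pool.
RG="my-resource-group"
CLUSTER="my-aks-cluster"
POOL="nodepool1"

# Scale the pool down and then back up so that newly created nodes
# register with the API server. Arguments: lower count, original count.
rescale_pool() {
  az aks scale --resource-group "$RG" --name "$CLUSTER" \
    --nodepool-name "$POOL" --node-count "$1" &&
  az aks scale --resource-group "$RG" --name "$CLUSTER" \
    --nodepool-name "$POOL" --node-count "$2"
}

# Compare what Kubernetes sees with what the node pool reports.
check_nodes() {
  kubectl get nodes -o wide
  az aks nodepool list --resource-group "$RG" --cluster-name "$CLUSTER" --output table
}
```

For a pool of 3 you might call `rescale_pool 2 3` and then `check_nodes` to see whether the node list in kubectl matches the pool count again.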

Sam Cogan
  • Hi Sam, you are right. We will consider recreating the cluster according to the latest best practices. In my case, the AKS upgrade solved the issue. All the nodes came back online, the connectivity to the cluster was restored, and the pods went from Pending to Running. Thanks again for your input. Best regards – bumbo-jumbo Jun 04 '21 at 09:33

Thank you all for helping here. We had a support session with the MS Support Team, and as always the recommendation was to first upgrade the cluster to a supported AKS version and then see what to do next. I ran az aks upgrade to the next supported version, and all the nodes redeployed themselves correctly, and the connectivity to the API server came back. The pods started working fine, and the cluster was back online. So, to be precise: the solution was to upgrade the cluster to a supported AKS version using the CLI.
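
For anyone landing here later, the upgrade could be sketched roughly as follows. The resource group and cluster names are hypothetical placeholders, and the target version is left as an argument because it depends on what az aks get-upgrades reports for your cluster.

```shell
# Hypothetical names: replace with your own resource group and cluster.
RG="my-resource-group"
CLUSTER="my-aks-cluster"

# Print the current Kubernetes version and the available upgrade targets.
show_upgrades() {
  az aks show --resource-group "$RG" --name "$CLUSTER" \
    --query kubernetesVersion --output tsv
  az aks get-upgrades --resource-group "$RG" --name "$CLUSTER" --output table
}

# Upgrade to the version given as the first argument, then check that the
# nodes are registered and the pods have left the Pending state.
upgrade_cluster() {
  az aks upgrade --resource-group "$RG" --name "$CLUSTER" \
    --kubernetes-version "$1" &&
  kubectl get nodes &&
  kubectl get pods --all-namespaces
}
```

Run `show_upgrades` first, pick one of the listed versions, and pass it to `upgrade_cluster`.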

Thanks again, folks.