I noticed that one of our AKS services was in a failed state. The diagnostics page showed that its current version is no longer supported, so I tried to follow the instructions here: https://docs.microsoft.com/en-us/azure/aks/upgrade-cluster

I first ran this command:

az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table

and then:

az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version new_version

which produced this error:

Operation failed with status: 'Conflict'. Details: Upgrades are disallowed while cluster is in a failed state. For resolution steps visit https://aka.ms/aks-cluster-failed to troubleshoot why the cluster state may have failed and steps to fix cluster state.

So the state was failed because of the old version, and the version could not be upgraded because of the failed state... I checked https://stackoverflow.com/questions/54631309/this-container-service-is-in-a-failed-state but that was not our problem: we had plenty of resources to go around (which we verified with az aks show --resource-group myResourceGroup --name myAKSCluster --query agentPoolProfiles).
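For anyone wanting to confirm the same symptoms, here is a minimal sketch of checking the reported state and version in one call. It assumes the provisioningState and kubernetesVersion fields of the az aks show output; cluster_status is a hypothetical helper name, not part of the Azure CLI.

```shell
# Hypothetical helper: show the cluster's provisioning state and Kubernetes version.
# Arguments: $1 = resource group, $2 = cluster name.
cluster_status() {
  az aks show --resource-group "$1" --name "$2" \
    --query "{state:provisioningState, version:kubernetesVersion}" \
    --output table
}
```

It would be called as cluster_status myResourceGroup myAKSCluster.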

Deleting and recreating AKS is not an option.

1 Answer

So after hours of trying different solutions and failing, I found the fix for this among the answers here: https://github.com/Azure/AKS/issues/542

To fix a failed state caused by an outdated version, I simply had to do the following:

Upgrade AKS to the version it is already running. My version was 1.14.8, so I simply ran:

az aks upgrade  --resource-group myResourceGroup  --name myAKSCluster --kubernetes-version 1.14.8

which fixed the failed state of the cluster!

After this I just ran the upgrade to the correct next version (1.18.19 in my case):

az aks upgrade  --resource-group myResourceGroup  --name myAKSCluster --kubernetes-version 1.18.19
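The two steps can also be wrapped in a small script. This is only a sketch under a few assumptions: reconcile_then_upgrade is a hypothetical helper name, the current version is read from the kubernetesVersion field of az aks show, --yes is used to skip the confirmation prompt, and the target version should be one that az aks get-upgrades actually lists for your cluster.

```shell
# Hypothetical helper: re-apply the current version to clear the failed state,
# then upgrade to the target version.
# Arguments: $1 = resource group, $2 = cluster name, $3 = target version.
reconcile_then_upgrade() {
  rg="$1"; name="$2"; target="$3"

  # Read the version the cluster is currently running.
  current=$(az aks show --resource-group "$rg" --name "$name" \
    --query kubernetesVersion --output tsv) || return 1

  # Step 1: "upgrade" to the same version, which cleared the failed state for me.
  az aks upgrade --resource-group "$rg" --name "$name" \
    --kubernetes-version "$current" --yes || return 1

  # Step 2: upgrade to the version you actually want.
  az aks upgrade --resource-group "$rg" --name "$name" \
    --kubernetes-version "$target" --yes
}
```

It would be called as reconcile_then_upgrade myResourceGroup myAKSCluster new_version. The function stops after the first step if that step fails, so the real upgrade is never attempted against a cluster that is still in a failed state.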

I hope this saves someone hours of frustration :)