I noticed that one of our AKS services is in a failed state. When I opened diagnostics, I found that its current version is no longer supported. So I tried to follow the instructions here: https://docs.microsoft.com/en-us/azure/aks/upgrade-cluster
I first ran this command:
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table
and then:
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version new_version
and that produced this error:
Operation failed with status: 'Conflict'. Details: Upgrades are disallowed while cluster is in a failed state. For resolution steps visit https://aka.ms/aks-cluster-failed to troubleshoot why the cluster state may have failed and steps to fix cluster state.
So the state is failed because of the old version, and the version cannot be upgraded because of the failed state...
I checked this question: https://stackoverflow.com/questions/54631309/this-container-service-is-in-a-failed-state but that was not our problem. We have plenty of resources to go around, which we checked with az aks show --resource-group myResourceGroup --name myAKSCluster --query agentPoolProfiles
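For reference, here is a minimal sketch of how the failed state and the current version can be confirmed from the CLI. It assumes the same placeholder resource group and cluster names used above. The `aks_field` helper is a hypothetical convenience wrapper, not part of the Azure CLI; it only calls `az aks show` with a `--query` filter.

```shell
# Placeholder names, same as in the commands above.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"

# Hypothetical helper: print a single top-level field of the cluster as plain text.
aks_field() {
  az aks show \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --query "$1" \
    --output tsv
}

# provisioningState reports the cluster state; kubernetesVersion is the version it runs.
echo "provisioningState: $(aks_field provisioningState)"
echo "kubernetesVersion: $(aks_field kubernetesVersion)"
```

Comparing the reported version with the output of az aks get-upgrades shows whether the cluster is still on a version outside the supported range.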
Deleting and recreating AKS is not an option.