
The issue started out of nowhere, without any change on our side. All of a sudden, deployments to our AKS clusters (Kubernetes version 1.23.8) would randomly fail with errors like the one below. To my understanding this means that the images to be deployed can't be found in our Azure Container Registry, even though they do exist there.

failed to do request: Head "https://XXX.azurecr.io/v2/XXX/manifests/5510": 
dial tcp: lookup XXX.azurecr.io on [::1]:53: 
read udp [::1]:40545->[::1]:53: read: connection refused, rpc error: 
code = Unknown desc = failed to pull and unpack image "XXX.azurecr.io/XXX:5510": 
failed to resolve reference "XXX.azurecr.io/XXX:5510": failed to do request: 
Head "https://XXX.azurecr.io/v2/XXX/manifests/5510": dial tcp: 
lookup XXX.azurecr.io on [::1]:53: read udp [::1]:52124->[::1]:53: read: connection refused]

It's certainly not a firewall issue, and I am 100% confident that the image exists. Even funnier, there's a roughly 20% chance that any given image will actually work and deploy. However, when deploying with two replicas it can happen that one of them deploys while the other one produces that error. Deploying an image that has already been deployed successfully before is no guarantee of a successful deployment either.
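
In case it helps with the diagnosis: the error shows lookups going to [::1]:53 (localhost) and being refused, which suggests name resolution is broken on the nodes themselves, not just inside pods. Something like the following can check both; the registry host and node name are placeholders:

# Can a pod resolve the registry at all?
# (busybox:1.28 is used because its nslookup behaves sanely)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 \
  -- nslookup XXX.azurecr.io

# Image pulls are done by containerd on the node, not inside a pod,
# so also check the node's own resolver configuration:
kubectl get nodes
kubectl debug node/<node-name> -it --image=ubuntu:20.04 \
  -- cat /host/etc/resolv.conf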

At this point I am really stuck; I haven't found any good help online for this particular problem. Has anyone else faced this issue before?

Best, Matthias

1 Answer


This was caused by a service disruption in Azure: K8s nodes with automatic node updates enabled received an Ubuntu version with a broken DNS resolver, so the DNS name of the container registry could not be resolved. That matches the error above, where the lookup falls back to localhost ([::1]:53) and is refused, i.e. no working resolver was answering on the node.
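
You can check which node image your pools are running to see whether an automatic update is what changed underneath you; resource group and cluster names below are placeholders:

# OS image and kernel version per node:
kubectl get nodes -o wide

# Node image version per AKS node pool:
az aks nodepool list \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --query "[].{pool:name, nodeImage:nodeImageVersion}" -o table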

The solution was to manually roll the nodes back to the previous Ubuntu version.
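
For reference, AKS has no single "downgrade node image" command, so this is only a rough sketch of the knobs involved (names are placeholders): stop automatic updates while the bad image is in circulation, then move the pool onto a fixed image once one is published.

# Stop the cluster from auto-upgrading:
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --auto-upgrade-channel none

# Once a fixed node image is available, roll the pool onto it
# without changing the Kubernetes version:
az aks nodepool upgrade \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name nodepool1 \
  --node-image-only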