
I have an EKS cluster (AWS) named cluster-main running on:

  • Kubernetes version: 1.16
  • Platform version: eks.4
  • CNI version: v1.6.1

There are two node groups in the cluster:

Node Group Name      Instance Type   AMI Type
generic-node-group   t3a.medium      AL2_x86_64
memory-node-group    r5a.large       AL2_x86_64

The nodes in these groups work fine.

I am trying to add a new node group that consists of ARM instances:

Node Group Name      Instance Type   AMI Type
cpu-node-group       c6g.xlarge      AL2_ARM_64
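
For reference, the equivalent AWS CLI call for creating this managed node group would look roughly like the sketch below (the role ARN, subnet IDs, and scaling values are placeholders):

aws eks create-nodegroup \
  --cluster-name cluster-main \
  --nodegroup-name cpu-node-group \
  --instance-types c6g.xlarge \
  --ami-type AL2_ARM_64 \
  --node-role arn:aws:iam::<account-id>:role/<node-instance-role> \
  --subnets <subnet-id-1> <subnet-id-2> \
  --scaling-config minSize=1,maxSize=3,desiredSize=1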

However, the nodes of this group are stuck in NotReady status and the node group fails to be created, with the condition below:

Conditions:

Type    Status   LastHeartbeatTime                 LastTransitionTime                Reason            Message
Ready   False    Mon, 31 May 2021 08:40:22 -0400   Mon, 31 May 2021 08:38:21 -0400   KubeletNotReady   runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
  • All node groups have a Node IAM Role ARN attached.
  • All node groups are AWS-managed node groups.
  • All node groups are deployed in two specific (private) subnets.

When I SSH into the EC2 instance, I see the following in /var/log/messages:

1430 cni.go:237] Unable to update cni config: no networks found in /etc/cni/net.d
1430 kubelet.go:2193] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

I've confirmed that the /etc/cni/net.d directory is indeed empty
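
For reference, this is roughly how I checked it, both on the node and from kubectl (the aws-node pods carry the k8s-app=aws-node label):

# on the NotReady node, over SSH
ls -la /etc/cni/net.d

# from a machine with cluster access: is an aws-node (VPC CNI) pod scheduled on that node?
kubectl get pods -n kube-system -l k8s-app=aws-node -o wide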

I have another EKS cluster with similar characteristics where the ARM node group initializes without any issue. However, I have found two differences. The test cluster uses:

  • Platform version: eks.5
  • CNI version: v1.7.5
    • amazon-k8s-cni-init:v1.7.5-eksbuild.1
    • amazon-k8s-cni:v1.7.5-eksbuild.1
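
(The CNI versions listed above come from the image tags of the aws-node DaemonSet in each cluster, e.g.:)

kubectl describe daemonset aws-node --namespace kube-system | grep Image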

Any ideas?

  • Seems like a lot of people have encountered this issue. Can you try the solution described [here](https://github.com/aws/amazon-vpc-cni-k8s/issues/284#issuecomment-601987503)? – acid_fuji Jun 01 '21 at 07:10
  • Hi @thomas, thanks for the reply. It seems that the add-ons were indeed never updated. For context, the cluster was initially created at version 1.14 and was later upgraded to 1.16. I went ahead and upgraded the listed add-ons, but the issue remains. One thing I noticed was that the CNI version remained at v1.6.1. The other cluster that I have in place - which was initialized much later but with the same Kubernetes version (1.16) - uses CNI v1.7.5. Any thoughts? Any chance I missed something? – argyrodagdileli Jun 01 '21 at 09:32
  • Hi @thomas, it looks like the issue was related to what you pointed me at. It seems the cluster was never properly upgraded. I had to follow some additional steps - more specifically, manually upgrading the CNI add-on to 1.7 and then modifying the `kube-proxy` DaemonSet configuration to support the `arm64` architecture. Thank you very much for your help. Much appreciated! – argyrodagdileli Jun 01 '21 at 11:02

1 Answer


OK, as @thomas suggested, the issue was related to the EKS add-ons.

For context, and as I said in my comment, the cluster was initially created at version 1.14 and was later upgraded to 1.16.

However, the aws-node, kube-proxy, and coredns add-ons were never upgraded. I followed the instructions here, but the issue remained.
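
For anyone following along, the add-on upgrade described there boils down to pointing kube-proxy and coredns at newer image tags, roughly like this (the account ID, region, and exact tags are placeholders; take the real values from the EKS documentation for your cluster version):

# kube-proxy runs as a DaemonSet in kube-system
kubectl set image daemonset.apps/kube-proxy -n kube-system \
  kube-proxy=<account>.dkr.ecr.<region>.amazonaws.com/eks/kube-proxy:<tag>

# coredns runs as a Deployment in kube-system
kubectl set image deployment.apps/coredns -n kube-system \
  coredns=<account>.dkr.ecr.<region>.amazonaws.com/eks/coredns:<tag>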

What I did notice, though, was that aws-node was still using the same CNI image (v1.6.3):

kubectl describe daemonset aws-node --namespace kube-system | grep Image | cut -d "/" -f 2

After further investigation, I had to manually upgrade the CNI version, following the instructions here.
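
In case it helps someone else, the manual CNI upgrade essentially means applying the v1.7 manifest from the amazon-vpc-cni-k8s repository, e.g. something along these lines (double-check the exact path and any region-specific variant against the current docs, as it may have moved since):

kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/release-1.7/config/v1.7/aws-k8s-cni.yaml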

Lastly, I noticed that an aws-node pod was now created for my arm64 node, which previously wasn't happening. However, the pod's liveness probe was failing and the node was still stuck in NotReady status, so I had to edit the kube-proxy DaemonSet configuration as described in step (3) of this guide.
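
Roughly speaking, that kube-proxy change amounts to allowing the DaemonSet to schedule on arm64 nodes; something along these lines (the exact details are in the guide):

kubectl edit daemonset kube-proxy -n kube-system
# in the pod template's nodeAffinity, make sure arm64 is listed alongside amd64:
#   - key: kubernetes.io/arch        # may appear as beta.kubernetes.io/arch on older manifests
#     operator: In
#     values:
#       - amd64
#       - arm64
# and make sure the kube-proxy image tag is a multi-arch (eksbuild) one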