
I have a running cluster with the version below: GitVersion:"v1.24.2"

What is the solution for the CNI changes in this version for running clusters that are throwing errors for the --network-plugin flag? I can't seem to find any way to fix the running clusters. 2 of my clusters are apparently down due to this and I can't seem to figure out how to fix them.

I have tried changing the file /var/lib/kubelet/kubeadm-flags.env to the below, and it still doesn't help:

KUBELET_KUBEADM_ARGS="--pod-infra-container-image=k8s.gcr.io/pause:3.5"

KUBELET_NETWORK_ARGS=''
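For reference, the change amounts to stripping the removed flag from the KUBELET_KUBEADM_ARGS line. A minimal sketch of that substitution — the sed pattern and sample values are illustrative, and it is shown against a sample string rather than the real file (back up /var/lib/kubelet/kubeadm-flags.env before editing it):

```shell
# Illustrative: strip the removed --network-plugin flag from a
# kubeadm-flags.env-style line (sample line, not the real file).
line='KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.5"'
cleaned=$(printf '%s' "$line" | sed -E 's/--network-plugin=[^" ]+ ?//')
printf '%s\n' "$cleaned"
```

After editing the real file, kubelet only picks up the change on restart (systemctl restart kubelet).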

Related to this issue: https://github.com/kubernetes/website/issues/33640

The documentation has been updated and merged, but what about running clusters? What can be done?

Additional Details

OK, maybe I was not clear with my question / explanation earlier.

We had an earlier version of Kubernetes which got upgraded to 1.24.2, and we see the same behaviour across 2 clusters - when I say clusters, these are for now 2 virtual machines, each behaving as a cluster of its own, hosted on-premises. We deploy containers that connect to Azure - as self-hosted App Gateways on-premises.

Issue - after upgrading to the current version I see the below errors in the kubelet logs, and kubelet doesn't seem to be running / active.

# kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Errors in Kubelet logs

kubelet[18280]: Error: failed to parse kubelet flag: unknown flag: --network-plugin
systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE

I tried to adjust the parameters in /var/lib/kubelet/kubeadm-flags.env by removing the --network-plugin flag, but no luck. I now see the below error as well in the kubelet logs after the kubelet restart:

{ 0 }. Err: connection error: desc = "transport: Error while dialing dial unix: missing address". Reconnecting...
Jul 01 15:14:49 kubelet[10297]: Error: failed to run Kubelet: unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix: missing address"

kubelet[10252]: --runtime-request-timeout duration Timeout of all runtime requests except long running request - pull, logs, exec and attach. When timeout exceeded, kubelet will cancel the request, throw out an error and retry later. (default 2m0s) (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

The GitHub link talks about the same issue, but more about the documentation change for the flags which are no longer used. Is there any fix available for running clusters that have already upgraded to the latest version?

ramakrpr
  • Hi ramakrpr welcome to S.F. It "doesn't help" but instead ... **does what**? We're not at your computer to see what you see, so if you want help the burden lies with the question to provide enough context to know what you've already tried and what the outcome is. Good luck – mdaniel Jul 01 '22 at 17:02
  • Hi mdaniel - i have updated Additional Comments to the initial question above. Pls let me know if more details are required. – ramakrpr Jul 04 '22 at 18:08
  • `systemctl cat kubelet.service`? `ps auwwwx | grep kubelet`? You seem to be asking where that command line flag is coming from but we're not at your computer to know – mdaniel Jul 04 '22 at 21:49
  • I am sorry, I am not asking you where the command-line flag is coming from. I am asking you / anyone whether they have experienced the same issue with v1.24.2 - as I see in the GitHub thread that others experience the same - and how this can be fixed. I hope there is someone sensible to answer me... I will be more than happy if you ask me what you need me to provide as inputs... instead of commenting that "we're not at your computer to know". – ramakrpr Jul 05 '22 at 10:23

1 Answer


I just encountered exactly the same issue while upgrading a control-plane/master node from 1.23.9-00 to 1.24.3-00. The control plane was created using kubeadm.

Kubelet wouldn't start; the first error was:

Error: failed to parse kubelet flag: unknown flag: --network-plugin

Some suggested removing the --network-plugin flag from /var/lib/kubelet/kubeadm-flags.env, but this just caused a different error:

grpc: addrConn.createTransport failed to connect to {  <nil> 0 <nil>}.
Err: connection error: desc = "transport: Error while dialing dial unix:
missing address". Reconnecting...
Error: failed to run Kubelet: unable to determine runtime API version:
rpc error: code = Unavailable desc = connection error: desc =
"transport: Error while dialing dial unix: missing address"

Through an old GitHub issue I found there is a proper way to regenerate the kubeadm-flags.env file:

The workaround that we implemented for now is to run kubeadm init phase kubelet-start prior to running the kubeadm upgrade command

So the first part of the solution was to run this to regenerate the file:

# kubeadm init phase kubelet-start
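
For comparison, after regeneration on a 1.24 node that uses containerd, the file typically looks something like the line below. This is a sketch only - the endpoint path and pause image tag are assumptions, so check the actual output on your own node:

```
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --pod-infra-container-image=k8s.gcr.io/pause:3.7"
```

The key difference from the pre-upgrade file is that the removed dockershim flags are replaced by the container-runtime-endpoint pointing at your CRI socket.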

This might already fix it for you, but for me kubelet would still not start:

Error: failed to run Kubelet: failed to create kubelet: get remote runtime typed
version failed: rpc error: code = Unimplemented desc = unknown 
service runtime.v1alpha2.RuntimeService

I found that this issue was caused by an invalid CRI/containerd configuration. Running crictl ps showed similar errors.

Someone else posted a solution to that problem in this thread:

# containerd config default > /etc/containerd/config.toml
# echo """
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
""" > /etc/crictl.yaml
# systemctl restart containerd
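
As an aside, the triple-quoted echo works in bash but leaves stray blank lines in the file. A heredoc produces the same two keys cleanly - shown here writing to a temp path for illustration, while the real target is /etc/crictl.yaml:

```shell
# Write the crictl endpoints via a heredoc (temp file for illustration;
# point the redirect at /etc/crictl.yaml on a real node).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
cat "$cfg"
```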

This solved the issue for me. Unfortunately, I don't really know what caused this problem to begin with; I followed the upgrade instructions.

mattzq
  • I see many facing the same issue. However, I just downgraded the version for now, as I couldn't make it work with the latest and I wanted my clusters to be up ASAP. It is weird that the community is not taking this seriously, and when an issue is raised people ask freaking questions (like the one above in my initial thread) rather than giving a prompt response. – ramakrpr Jul 19 '22 at 04:57