
I'm using kubeadm to try to set up a dev master. I'm running into an issue where the health check for the kubelet is failing, and I'm looking for direction on how to debug that. Running the command that's suggested for debugging (systemctl status kubelet), I don't see the cause of the error:

kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2017-10-05 15:04:23 CDT; 4s ago
     Docs: http://kubernetes.io/docs/
  Process: 4786 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_CERTIFICATE_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 4786 (code=exited, status=1/FAILURE)

Oct 05 15:04:23 master.domain.com systemd[1]: Unit kubelet.service entered failed state.
Oct 05 15:04:23 master.domain.com systemd[1]: kubelet.service failed.

Where can I find a specific error message to indicate why this isn't running?

After running swapoff -a to disable swap, I'm still not able to provision Kubernetes.

Here's the full output from kubeadm init:

$ kubeadm init
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.2
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] Starting the kubelet service
[kubeadm] WARNING: starting in 1.8, tokens expire after 24 hours by default (if you require a non-expiring token use --token-ttl 0)
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [master.my-domain.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.xx.xx.xx 10.xx.xx.xx]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp 127.0.0.1:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp 127.0.0.1:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by that:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    - There is no internet connection; so the kubelet can't pull the following control plane images:
        - gcr.io/google_containers/kube-apiserver-amd64:v1.8.2
        - gcr.io/google_containers/kube-controller-manager-amd64:v1.8.2
        - gcr.io/google_containers/kube-scheduler-amd64:v1.8.2

You can troubleshoot this for example with the following commands if you're on a systemd-powered system:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster

I've also tried removing the Docker repository and installing Docker 1.12, which won't run: Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux ...

Ben
  • see if there are logs for it in /var/log or /var/log/kubernetes; not sure where kubeadm puts things – Mike Oct 05 '17 at 21:03
  • @Mike I can't find logs anywhere, I think it's outputting everything rather than logging. – Ben Oct 06 '17 at 14:34
  • best thing to do is look at the systemd service file, source the env vars, and run the command to see the output – Mike Oct 06 '17 at 15:29
  • `/etc/systemd/system/kubelet.service` looks like just a yml file – Ben Oct 24 '17 at 21:00

8 Answers


This was solved by setting --fail-swap-on=false in the kubelet's systemd drop-in. Just make the modification in the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:

Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

Then reload systemd and restart the kubelet:
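systemctl daemon-reload
systemctl restart kubelet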

dirtbag

Found an issue regarding this: https://github.com/kubernetes/kubernetes/issues/53333

Following the previous answer worked for me, but not the resolution offered in the linked issue.

So perhaps following their suggestion of editing 90-kubeadm.conf (in place of 10-kubeadm.conf) would work; a sketch follows.
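A minimal sketch of that variant, assuming the drop-in on your system is /etc/systemd/system/kubelet.service.d/90-kubeadm.conf and uses the same KUBELET_SYSTEM_PODS_ARGS variable as 10-kubeadm.conf:

# add --fail-swap-on=false in /etc/systemd/system/kubelet.service.d/90-kubeadm.conf
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false"

# then pick up the change
systemctl daemon-reload
systemctl restart kubelet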

Atom

I had the same issue, but on Fedora 30, kubelet 1.15.3, docker-ce 19.03.1. The output of systemctl status kubelet contained the same as in your case:

Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

The steps to solve it were:

  1. Check whether you have the files kubelet.service and 10-kubeadm.conf at the following paths:

ls /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
ls /usr/lib/systemd/system/kubelet.service

10-kubeadm.conf:

more /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf 

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

kubelet.service:

more /usr/lib/systemd/system/kubelet.service

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
EnvironmentFile=-/etc/kubernetes/config
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/bin/kubelet \
        $KUBE_LOGTOSTDERR \
        $KUBE_LOG_LEVEL \
        $KUBELET_API_SERVER \
        $KUBELET_ADDRESS \
        $KUBELET_PORT \
        $KUBELET_HOSTNAME \
        $KUBE_ALLOW_PRIV \
        $KUBELET_ARGS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target

  2. Delete the systemd units for kubelet in /etc/systemd/system/:

    rm -R /etc/systemd/system/kubelet.service.d (confirm "y" for each file)
    rm /etc/systemd/system/kubelet.service
    
  3. Reload all systemd unit files and recreate the entire dependency tree:

    systemctl daemon-reload

  4. Restart kubelet:

    systemctl restart kubelet

The output of systemctl status kubelet should then contain:

  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf

  5. Initialize the Kubernetes control-plane node:

    kubeadm reset
    systemctl daemon-reload
    kubeadm init --pod-network-cidr=10.244.0.0/16
    

Note: you may have one or more further issues. Your kubeadm init output mentions:

`- There is no internet connection; so the kubelet can't pull the following control plane images:`

Try to pull them manually:

kubeadm config images pull
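You can also list which images your kubeadm version expects before pulling (the config images subcommands are assumed to be available, i.e. kubeadm v1.11 or newer):

kubeadm config images list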

You may also need to upgrade kubeadm, kubelet, and kubectl.
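On Fedora, for example, the upgrade would look something like this (a sketch, assuming the packages come from the upstream Kubernetes dnf repo):

dnf upgrade -y kubeadm kubelet kubectl
systemctl daemon-reload
systemctl restart kubelet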

Alexred

This is all already covered in the issue that Atom posted, so I don't feel like I'm contributing an awful lot, but I can replicate your issue if swap is turned on. So for me the solution is to disable swap and retry the init:

sudo -i
swapoff -a
kubeadm reset
kubeadm init

The answer posted by dirtbag worked for me as well, but just to be safe after systemctl daemon-reload, I did a full kubeadm reset and kubeadm init, not just a systemctl restart kubelet.

If this doesn't work for you, can you please paste the new output of kubeadm init after disabling swap?

Pawilon
  • That does allow me to remove the flag from `kubeadm.conf` and makes the warning `[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03` appear. I also tried running `kubeadm init --pod-network-cidr=10.244.0.0/16` as described in the flannel instructions with no luck. – Ben Oct 25 '17 at 20:56
  • New init output has been posted above. – Ben Oct 25 '17 at 21:24
  • Looking at https://kubernetes.io/docs/setup/independent/install-kubeadm/#installing-docker they do say: `On each of your machines, install Docker. Version v1.12 is recommended, but v1.11, v1.13 and 17.03 are known to work as well. Versions 17.06+ might work, but have not yet been tested and verified by the Kubernetes node team.` Perhaps it's worth retrying with a supported Docker version? I used `1.12.6` for my test. This was the default version I got on a fresh Centos7 after doing `yum install epel-release` and `yum install docker`. – Pawilon Oct 25 '17 at 21:31
  • I just did that, docker 1.12 won't run `Error starting daemon: SELinux is not supported with the overlay graph driver on this kernel. Either boot into a newer kernel or disable selinux`... – Ben Oct 26 '17 at 13:16
  • Disabling SELinux was also encouraged by the kubeadm documentation. To disable selinux until reboot run `setenforce 0`. To disable permanently, edit `/etc/selinux/config` and change `SELINUX=enforcing` to `SELINUX=permissive`. I assume this is a development environment and disabling SELinux isn't too big of an issue! – Pawilon Oct 26 '17 at 13:22
  • It's currently set to permissive. I ran `setenforce 0` as well. I'm still in the same spot. – Ben Oct 26 '17 at 17:20
  • Any more suggestions? – Ben Oct 30 '17 at 17:41

Check the error with: journalctl -xeu kubelet

Note: make sure that the cgroup driver used by the kubelet is the same as the one used by Docker. To ensure compatibility, you can configure Docker to use the systemd cgroup driver, like so:

cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
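Docker only picks up changes to daemon.json on restart, so restart it and verify which cgroup driver it now reports:

systemctl restart docker
docker info | grep -i cgroup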
srikanth
  1. sudo vim /etc/sysctl.d/k8s.conf (these are sysctl settings, so they belong under /etc/sysctl.d/, not /etc/modules-load.d/, which only lists kernel modules)

  2. Make sure to have these settings there (see the sketch after this list):

  • net.bridge.bridge-nf-call-ip6tables = 1
  • net.bridge.bridge-nf-call-iptables = 1
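A minimal sketch of both files as the kubeadm install docs lay them out: the br_netfilter module in /etc/modules-load.d/ and the bridge sysctls in /etc/sysctl.d/:

cat << EOF > /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat << EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

# load the module and apply the sysctls without rebooting
modprobe br_netfilter
sysctl --system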
ant_dev

First, you need to create the kubelet drop-in file for systemd: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd

Then enable the kubelet.

Finally, run kubeadm init on the master or kubeadm join on worker nodes to generate the conf files. A sketch of those steps follows.
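A minimal sketch of those steps (assuming kubeadm and the kubelet are already installed; the join arguments are the placeholders printed by kubeadm init):

systemctl enable --now kubelet
kubeadm init   # on the master; prints the matching 'kubeadm join ...' command
# run the printed 'kubeadm join ...' command on each worker node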


Make sure swap space is off:

# swapoff -a

The manual way:

Comment out any line containing the word 'swap' in /etc/fstab; a sketch of a one-liner follows.
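For example, this one-liner comments out fstab entries with a whitespace-delimited swap field (an assumption about your fstab layout; sed -i.bak leaves a backup at /etc/fstab.bak):

sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab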

After doing that, run the swapoff -a command once again.

Then restart the kubelet service:

# systemctl restart kubelet