
I've got something wrong with my Kubernetes node, but it's hard to debug because I get pages and pages of stack trace like

Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*Reflector).Run(0xc000040f
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: created by k8s.io/kubernetes/pkg/kubelet/config.newSourceApiserverFromLW
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: goroutine 244 [select]:
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.contextForChannel.func
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: created by k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.contextForC
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: goroutine 205 [sync.Cond.Wait]:
Nov 13 10:55:51 corona kubelet[29656]: runtime.goparkunlock(...)
Nov 13 10:55:51 corona kubelet[29656]:         /usr/local/go/src/runtime/proc.go:312
Nov 13 10:55:51 corona kubelet[29656]: sync.runtime_notifyListWait(0xc000b140c8, 0x1)
Nov 13 10:55:51 corona kubelet[29656]:         /usr/local/go/src/runtime/sema.go:513 +0xf8
Nov 13 10:55:51 corona kubelet[29656]: sync.(*Cond).Wait(0xc000b140b8)
Nov 13 10:55:51 corona kubelet[29656]:         /usr/local/go/src/sync/cond.go:56 +0x9d
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc000b140
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache.(*controller).processLoop(0
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000def
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000defe
Nov 13 10:55:51 corona kubelet[29656]:         /workspace/anago-v1.19.4-rc.0.51+5f1e5cafd33a88/src/k8s.io/kubernetes/_ou
Nov 13 10:55:51 corona kubelet[29656]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)

I tried adding -v=0 to the kubelet command in the systemd unit file, but it's still spewing. I looked at /var/lib/kubelet/config.yaml and it says logging: {}. I'm not sure if there's something I could put in there to make it quieter.

Is there any way to tell kubelet to skip the stack trace? The error messages before the trace are helpful but hard to find in the noise.

  • That's actually not _logging_ it's `panic` output, AFAIK, and is very serious and should not be ignored as "normal logging, ho hum". Unfortunately, you appear to have elided the output so much we can't see what it is panicking about. So perhaps edit your question to include the whole output and we'll try to see why it's angry – mdaniel Nov 13 '20 at 16:53
  • It's 20+ pages. The very first line is a clear message like `failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"`. My problem is that I can't see the useful message because the panic output is so voluminous. – Nathaniel Waisbrot Nov 13 '20 at 17:37
  • Then have you considered fixing that error so the next most obvious error will be visible? – mdaniel Nov 15 '20 at 18:59

2 Answers


First, check whether there are any config files or environment variables that affect how kubelet is run.

The best way to check:

1. Check the kubelet process (e.g. ps -ef | grep kubelet) to see all the arguments it was started with.

2. If kubelet is using dynamic config, check the ConfigMap kubelet-config-1.18 in kube-system to see whether cgroupDriver is set the way you want.

3. If you run kubelet under systemd, you can read its logs with:

# journalctl -u kubelet

See: system-component-logs.

See an example of how to narrow the scope of the logs: managing-logs-kubernetes.
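As a rough illustration (the flags and time window below are my own choices, not taken from the linked pages, so adjust them to your setup), you can limit how much journalctl shows you:

# only messages from the current boot
journalctl -u kubelet -b --no-pager

# only the last few minutes, following new lines as they arrive
journalctl -u kubelet --since "10 minutes ago" -f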

The error `failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"` means that the kubelet and Docker are configured with different cgroup drivers. Both must use the same driver, either systemd or cgroupfs; systemd is the recommended one.

On the worker nodes, check the file /var/lib/kubelet/kubeadm-flags.env and see whether KUBELET_KUBEADM_ARGS contains the --cgroup-driver=cgroupfs flag. Change it to systemd and kubelet will start working again, for example as sketched below.
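A minimal sketch of that change, assuming a kubeadm-provisioned node where Docker already uses the systemd driver (back the file up first, or edit it by hand instead of using sed):

# see which driver kubeadm passed to the kubelet
grep cgroup-driver /var/lib/kubelet/kubeadm-flags.env

# switch the kubelet flag from cgroupfs to systemd and restart the service
sudo sed -i 's/--cgroup-driver=cgroupfs/--cgroup-driver=systemd/' /var/lib/kubelet/kubeadm-flags.env
sudo systemctl restart kubelet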

Take a look: kubelete-cgroupfs-driver, kubelet-cgroupfs.

See also: kubelet-runtime-panic.

Malgorzata
  • Thanks for the nice explanation of how to fix that problem. I was asking about how to remove the stack-trace because I can fix the problems as soon as I can read them but it's extremely frustrating to search for the single line that tells me the problem. – Nathaniel Waisbrot Nov 16 '20 at 19:26
  • I showed you how to make the logs clearer and how to search for the errors using specific keywords; that cuts the output down to just the information you want. I also showed you a possible solution to your problem. – Malgorzata Nov 17 '20 at 11:10
  • I appreciate that you're trying to help, but you and mdaniel are not answering what I'm asking and I don't know how to be more clear. I understand that what I'm asking will not "solve" my problem (I still won't have a working k8s). But you keep suggesting that I fix the errors and I keep trying to say that I can't _find_ the errors because of the size of the stack-trace. The only things that would help me would be no stack-trace (seems impossible) or maybe a good `grep` that would filter it out. I didn't want to ask about the cgroups problem because I already know how to fix it. – Nathaniel Waisbrot Nov 17 '20 at 14:19

The stack trace comes from code in the kubelet calling `panic`, a Go built-in that prints goroutine stack traces and terminates the program with a non-zero exit code.

There is no way to avoid the stack trace, other than to not call panic in the code. Normally, a panic would indicate that the program has gotten into an unexpected state and the stack-trace would help the programmers debug the problem.

In this case, at least some of the panics come from known errors: the program was started with an incorrect configuration. There is no way to shorten or suppress the stack-trace output; arguably, the kubelet code should be changed to exit cleanly with an error message instead of panicking in cases where the problem is clear.

From the operator's perspective, the only thing that can be done is to find the very first line of the panic, which typically contains a readable error message, and see if it's something that can be resolved.
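One way to fish that line out, assuming the kubelet is logging in the default klog text format (where error and fatal messages start with E or F followed by the date), is a positive grep that keeps only those lines and drops the goroutine dumps; the pattern is illustrative and may need tuning for your output:

# keep only klog error/fatal lines, skipping the stack-trace spam
journalctl -u kubelet -b --no-pager | grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4}'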