
Today one worker node of my Kubernetes cluster froze at around 05:30-05:40, and I am trying to find out why it got stuck. To do that, I looked in /var/log/syslog. There are tons of log entries, but around the time in question I see some strange messages. Can you explain what they mean? Maybe you can also suggest how I can find out the reason for the system freeze.

Sep  3 05:25:53 worker-03 kubelet[1833]: I0903 05:25:53.551376    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istio-ingressgateway-559c766855-zkbs7" podUID=306e640f-015a-4785-8aa7-11f67c663d76 containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.246:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:53 worker-03 kubelet[1833]: I0903 05:25:53.573309    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="argocd/argocd-repo-server-756868b744-q9dcr" podUID=4610633c-2b97-4004-8d3d-83c3b6ee7b16 containerName="argocd-repo-server" probeResult=failure output="Get \"http://10.233.107.114:8084/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:53 worker-03 kubelet[1833]: I0903 05:25:53.651938    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="kube-system/nodelocaldns-njhc8" podUID=48be74df-a476-49fc-8284-5831e52da498 containerName="node-cache" probeResult=failure output="Get \"http://169.254.25.10:9254/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:54 worker-03 kubelet[1833]: I0903 05:25:54.215897    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="loki/loki-loki-distributed-querier-0" podUID=6dc0b6af-86ee-4095-b2ae-2da85350a42a containerName="querier" probeResult=failure output="Get \"http://10.233.107.16:3100/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:54 worker-03 kubelet[1833]: I0903 05:25:54.409249    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="sks-cloud-provider-api/sks-cloud-provider-api-b848cbc54-dx767" podUID=b7264b92-fa60-49d5-891b-308dcee0e73c containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.187:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:54 worker-03 kubelet[1833]: I0903 05:25:54.897313    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istio-ingressgateway-559c766855-zkbs7" podUID=306e640f-015a-4785-8aa7-11f67c663d76 containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.246:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:54 worker-03 kubelet[1833]: I0903 05:25:54.948430    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="loki/loki-loki-distributed-distributor-56b4887584-q9v9r" podUID=2b7004e5-f1d0-4e56-b3da-66d67289bec8 containerName="distributor" probeResult=failure output="Get \"http://10.233.107.136:3100/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:55 worker-03 kubelet[1833]: I0903 05:25:55.208913    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="monitoring/prometheus-monitoring-prometheus-0" podUID=324b41cb-3fe7-4cf7-8538-5204c5796113 containerName="prometheus" probeResult=failure output="Get \"http://10.233.107.196:9090/-/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:55 worker-03 kubelet[1833]: I0903 05:25:55.411728    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="monitoring/prometheus-kube-state-metrics-54fb5f89bc-z4k6n" podUID=96e86d2b-8f5c-43d5-911a-4cce7432321c containerName="kube-state-metrics" probeResult=failure output="Get \"http://10.233.107.139:8080/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:55 worker-03 kubelet[1833]: I0903 05:25:55.412000    1833 prober.go:116] "Probe failed" probeType="Liveness" pod="monitoring/prometheus-kube-state-metrics-54fb5f89bc-z4k6n" podUID=96e86d2b-8f5c-43d5-911a-4cce7432321c containerName="kube-state-metrics" probeResult=failure output="Get \"http://10.233.107.139:8080/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:55 worker-03 kubelet[1833]: I0903 05:25:55.427840    1833 prober.go:116] "Probe failed" probeType="Liveness" pod="monitoring/prometheus-prometheus-node-exporter-bnkxh" podUID=06a527f3-da1b-41b7-8aa8-996d73742c7e containerName="node-exporter" probeResult=failure output="Get \"http://192.168.17.93:9100/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:55 worker-03 kubelet[1833]: I0903 05:25:55.701902    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istiod-7cf55d45b-6fgk5" podUID=8e2bf0dd-cbe3-4595-b470-1d621f2b9281 containerName="discovery" probeResult=failure output="Get \"http://10.233.107.160:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:56 worker-03 kubelet[1833]: I0903 05:25:56.078757    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="cert-manager/cert-manager-webhook-7c9588c76-kf8th" podUID=2769d532-0618-46ec-b5b1-2336b6757ae0 containerName="cert-manager" probeResult=failure output="Get \"http://10.233.107.142:6080/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:56 worker-03 kubelet[1833]: I0903 05:25:56.252254    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="logging/fluent-bit-frrdn" podUID=2a2b69e3-ff48-4546-9441-736e5f7a758b containerName="fluent-bit" probeResult=failure output="Get \"http://10.233.107.11:2020/api/v1/health\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:56 worker-03 kubelet[1833]: I0903 05:25:56.380214    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="monitoring/prometheus-prometheus-node-exporter-bnkxh" podUID=06a527f3-da1b-41b7-8aa8-996d73742c7e containerName="node-exporter" probeResult=failure output="Get \"http://192.168.17.93:9100/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:56 worker-03 kubelet[1833]: I0903 05:25:56.618220    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istio-ingressgateway-559c766855-zkbs7" podUID=306e640f-015a-4785-8aa7-11f67c663d76 containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.246:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:57 worker-03 kubelet[1833]: I0903 05:25:57.420388    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="sks-cloud-provider-api/sks-cloud-provider-api-b848cbc54-dx767" podUID=b7264b92-fa60-49d5-891b-308dcee0e73c containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.187:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:57 worker-03 kubelet[1833]: I0903 05:25:57.612861    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="monitoring/alertmanager-0" podUID=0891f505-c61a-4fd5-be86-40ddee255aa5 containerName="alertmanager" probeResult=failure output="Get \"http://10.233.107.206:9093/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:58 worker-03 kubelet[1833]: I0903 05:25:57.819849    1833 prober.go:116] "Probe failed" probeType="Liveness" pod="argocd/argocd-application-controller-0" podUID=4902436f-bf8e-45b3-a19b-27becf7a9a0b containerName="argocd-application-controller" probeResult=failure output="Get \"http://10.233.107.239:8082/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:25:58 worker-03 kubelet[1833]: I0903 05:25:58.634776    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istio-ingressgateway-559c766855-zkbs7" podUID=306e640f-015a-4785-8aa7-11f67c663d76 containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.246:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:00 worker-03 kubelet[1833]: I0903 05:26:00.222531    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="monitoring/prometheus-monitoring-prometheus-0" podUID=324b41cb-3fe7-4cf7-8538-5204c5796113 containerName="prometheus" probeResult=failure output="Get \"http://10.233.107.196:9090/-/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:00 worker-03 kubelet[1833]: I0903 05:26:00.509926    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="sks-cloud-provider-api/sks-cloud-provider-api-b848cbc54-dx767" podUID=b7264b92-fa60-49d5-891b-308dcee0e73c containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.187:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:00 worker-03 kubelet[1833]: I0903 05:26:00.702291    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istiod-7cf55d45b-6fgk5" podUID=8e2bf0dd-cbe3-4595-b470-1d621f2b9281 containerName="discovery" probeResult=failure output="Get \"http://10.233.107.160:8080/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:01 worker-03 kubelet[1833]: I0903 05:26:00.704209    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="istio-system/istio-ingressgateway-559c766855-zkbs7" podUID=306e640f-015a-4785-8aa7-11f67c663d76 containerName="istio-proxy" probeResult=failure output="Get \"http://10.233.107.246:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:01 worker-03 kubelet[1833]: I0903 05:26:00.710682    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="argocd/argocd-repo-server-756868b744-q9dcr" podUID=4610633c-2b97-4004-8d3d-83c3b6ee7b16 containerName="argocd-repo-server" probeResult=failure output="Get \"http://10.233.107.114:8084/healthz\": dial tcp 10.233.107.114:8084: connect: connection refused"
Sep  3 05:26:01 worker-03 kubelet[1833]: I0903 05:26:00.729173    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="loki/loki-loki-distributed-compactor-74999bd956-267x5" podUID=02b883c6-2b1d-4e96-8881-6803fec489a8 containerName="compactor" probeResult=failure output="Get \"http://10.233.107.91:3100/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:01 worker-03 kubelet[1833]: I0903 05:26:00.749504    1833 prober.go:116] "Probe failed" probeType="Readiness" pod="cert-manager/cert-manager-webhook-7c9588c76-kf8th" podUID=2769d532-0618-46ec-b5b1-2336b6757ae0 containerName="cert-manager" probeResult=failure output="Get \"http://10.233.107.142:6080/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:01 worker-03 kubelet[1833]: I0903 05:26:00.749542    1833 prober.go:116] "Probe failed" probeType="Liveness" pod="cert-manager/cert-manager-webhook-7c9588c76-kf8th" podUID=2769d532-0618-46ec-b5b1-2336b6757ae0 containerName="cert-manager" probeResult=failure output="Get \"http://10.233.107.142:6080/livez\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.283215753Z" level=error msg="collecting metrics for 390b4c1282346dac6e1903e53441d2ab470bf108096fc892f1bd302e3df1d435" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.283752136Z" level=error msg="collecting metrics for 914e315e0de49aa7dd261da226b48b6a4cf385f08facc436c1fbfeaa1ef863a2" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.283905042Z" level=error msg="collecting metrics for 9f81489623f1af8a81bd860bf78b70930e7e0f3633d306e53ed6b837c3ddcf22" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.284500855Z" level=error msg="collecting metrics for ddb7db5c372c09d0b9e6d6b27a0ce8ff7382cb1d72c05d6ed36e697ca7b12b9b" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.323673160Z" level=error msg="collecting metrics for 74b7de4061434af1a6fa82aaa45d7dc75d42b81b149ed3e2d4f839017742c6bc" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.349118249Z" level=error msg="collecting metrics for 2c24baa4aee760a4b2ff363411a920f1d950094677ae78276f4bf8299916d26e" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 systemd[1]: cri-containerd-315a461fced4861d35cc77016f5eef459606103ec44c9438abfc71bdab647c03.scope: Consumed 9min 29.713s CPU time
Sep  3 05:26:04 worker-03 kubelet[1833]: W0903 05:26:04.378893    1833 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
Sep  3 05:26:04 worker-03 kubelet[1833]: W0903 05:26:04.379072    1833 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
Sep  3 05:26:04 worker-03 kubelet[1833]: W0903 05:26:04.379123    1833 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
Sep  3 05:26:04 worker-03 kubelet[1833]: W0903 05:26:04.379142    1833 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.391573423Z" level=error msg="copy shim log" error="read /proc/self/fd/223: file already closed"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.611114947Z" level=error msg="collecting metrics for 9f81489623f1af8a81bd860bf78b70930e7e0f3633d306e53ed6b837c3ddcf22" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.611304612Z" level=error msg="collecting metrics for ddb7db5c372c09d0b9e6d6b27a0ce8ff7382cb1d72c05d6ed36e697ca7b12b9b" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.612981181Z" level=error msg="collecting metrics for 315a461fced4861d35cc77016f5eef459606103ec44c9438abfc71bdab647c03" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.614901276Z" level=error msg="collecting metrics for d94789f7c9bb98a7990a9a01cdae70bd95cddb4acd36a2e191b95ea84110108f" error="cgroups: cgroup deleted: unknown"
Sep  3 05:26:04 worker-03 containerd[1101]: time="2022-09-03T05:26:04.623184952Z" level=error msg="collecting metrics for 396816a2695d1c1e59fb144db0ece211e249e8d43628bea4b317af65bb286b92" error="cgroups: cgroup deleted: unknown"
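
For reference, a quick way to pull just this window back out of the logs is something like the following (the grep pattern and time range are approximations for the excerpt above; journalctl will only cover the window if the journal is persistent):

$ grep 'Sep  3 05:2[0-9]' /var/log/syslog | less   # filter syslog to 05:20-05:29
$ journalctl --since "2022-09-03 05:20:00" --until "2022-09-03 05:45:00" --no-pager   # same window from the journal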
It just means every probe is failing, cgroups are not found, etc., but I guess you can check even before these log entries to get the exact reason. Also, I think this is not a kubectl-related issue; it is a system-level issue. – asktyagi Sep 03 '22 at 10:46
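
Following that suggestion, a sketch of what to look for in the kernel log just before the window (assuming the journal or kern.log survived the freeze; kern.log may not exist on every distribution):

$ journalctl -k --since "2022-09-03 05:00:00" --until "2022-09-03 05:45:00" --no-pager   # kernel messages leading up to the freeze
$ grep -Ei 'oom-killer|out of memory|hung_task|blocked for more than|i/o error' /var/log/syslog   # common freeze culprits
$ last -x shutdown reboot | head   # any unexpected reboots or shutdowns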

1 Answer


This error is for the Istio ingress gateway. Please check whether your microservices are responding to requests.

Debug with these commands:

$ istioctl version

$ kubectl version --short

$ helm version --short

$ while true; do curl --write-out %{http_code} http://localhost:80/actuator/health; echo "- $(date)"; sleep 1; done

$ while true; do curl --write-out %{http_code} http://15.0.0.55:15020/app-health/openlr/livez; echo "- $(date)"; sleep 1; done

$ kubectl get events --sort-by='.metadata.creationTimestamp'
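
In addition, it may help to check what the node itself reported around the freeze; a sketch using the node name from the question (kubectl top requires metrics-server):

$ kubectl describe node worker-03 | grep -A 10 'Conditions:'   # MemoryPressure, DiskPressure, PIDPressure, Ready
$ kubectl get pods -A -o wide --field-selector spec.nodeName=worker-03   # pods scheduled on the frozen node
$ kubectl top node worker-03   # current CPU/memory usage (needs metrics-server)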