
We have a CronJob that runs every 'x' minutes.

On each scheduled run, a Pod is created, runs, and does its job.

After 2-3 days, however, the Pod gets stuck in the Pending state: it never reaches the Running state and no longer does its job.

The Pod also mounts a few NFS paths when it is created.

We investigated and found that the node on which this Pod is scheduled is running out of inotify watches. The limit (fs.inotify.max_user_watches) is 8192.
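One way to see how close the node is to that limit is to count the inotify watches currently held. The sketch below assumes the kernel lists each watch as one line starting with `inotify` in /proc/<pid>/fdinfo/<fd>; the function name `count_inotify_watches` is made up for illustration. The limit is enforced per user, while this sums watches across every process it can read, so it should be run as root and the total treated as an upper bound for any single user.

```shell
# Sum the inotify watches held by all processes under a proc root.
# $1: proc root to scan (defaults to /proc).
count_inotify_watches() {
    proc_root="${1:-/proc}"
    # Each watch is one "inotify ..." line in an fdinfo file.
    # Unreadable or vanished files are skipped silently.
    find "$proc_root"/[0-9]*/fdinfo -type f -exec cat {} + 2>/dev/null |
        grep -c '^inotify'
}
```

On the node, calling `count_inotify_watches` with no argument scans /proc, and the result can be compared with the output of `cat /proc/sys/fs/inotify/max_user_watches`.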

I also tried to increase the inotify limit using:

echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

But the same issue persists.

I captured the kubelet logs on the node where the CronJob's Pod is scheduled.

I found the log entry below repeated many times. From it, I understand that the kubelet is not able to unmount the NFS subpath when a Pod that has completed its job is cleaned up.

Note: The log entry below is a single line; I have broken it into multiple lines for readability.
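Because `tee -a` appends a new line on every run, it may be worth confirming which value the file ends up requesting and whether the running kernel picked it up. This is a minimal sketch with a hypothetical helper name, `last_sysctl_setting`; it assumes simple `key = value` lines and relies on the file being applied top to bottom, so the last assignment of a key is the one that takes effect.

```shell
# Print the last value assigned to a key in a sysctl.conf-style file.
# $1: sysctl key; $2: file to read (defaults to /etc/sysctl.conf).
last_sysctl_setting() {
    key="$1"
    file="${2:-/etc/sysctl.conf}"
    awk -F= -v key="$key" '
        /^[[:space:]]*[#;]/ { next }    # skip comment lines
        {
            k = $1
            gsub(/[[:space:]]/, "", k)  # tolerate "key = value" spacing
            if (k == key) {
                v = $2
                gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
                val = v                 # later assignments overwrite earlier ones
            }
        }
        END { if (val != "") print val }
    ' "$file"
}
```

The value printed by `last_sysctl_setting fs.inotify.max_user_watches` can then be compared with the live value in /proc/sys/fs/inotify/max_user_watches.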

Worker Node: RHEL 7.6 (Also Tried 7.9 with Same Issue)

Kernel Version: 3.10

Kubelet Version: 1.23.6

I am struggling to fix this. Any recommendations would be helpful.

 kubelet[1784]: E0630 03:42:39.927719    1784 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/nfs/8da4fd7a-dbff-429a-a470-3c09e19a1670-infra-pv-xxxxx
podName:8da4fd7a-dbff-429a-a470-3c09e19a1670 nodeName:}" failed. 
No retries permitted until 2022-06-30 03:44:41.927675124 -0400 EDT m=+37538.561944740 (durationBeforeRetry 2m2s). 
Error: error cleaning subPath mounts for volume "infra" (UniqueName: "kubernetes.io/nfs/8da4fd7a-dbff-429a-a470-3c09e19a1670-infra-pv-xxxx") 
pod "8da4fd7a-dbff-429a-a470-3c09e19a1670" (UID: "8da4fd7a-dbff-429a-a470-3c09e19a1670") : 
error processing /var/lib/kubelet/pods/8da4fd7a-dbff-429a-a470-3c09e19a1670/volume-subpaths/infra-pv-xxxx/cronjob: 
error cleaning subpath mount /var/lib/kubelet/pods/8da4fd7a-dbff-429a-a470-3c09e19a1670/volume-subpaths/infra-pv-xxxxx/cronjob/6: 

Failed to unmount path /var/lib/kubelet/pods/8da4fd7a-dbff-429a-a470-3c09e19a1670/volume-subpaths/infra-pv-xxxxx/cronjob/6
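To check whether the subpath mounts named in the log are still present on the node, the mount table can be filtered by pod UID. This is a sketch only: `stale_subpath_mounts` is a hypothetical helper, and it assumes the kubelet root directory is /var/lib/kubelet, as in the paths in the log above.

```shell
# Print every mount point under a pod's volume-subpaths directory.
# $1: pod UID; $2: mount table to read (defaults to /proc/mounts).
stale_subpath_mounts() {
    pod_uid="$1"
    mounts_file="${2:-/proc/mounts}"
    prefix="/var/lib/kubelet/pods/$pod_uid/volume-subpaths/"
    # Field 2 of each mount table line is the mount point.
    awk -v prefix="$prefix" 'index($2, prefix) == 1 { print $2 }' "$mounts_file"
}
```

For example, `stale_subpath_mounts 8da4fd7a-dbff-429a-a470-3c09e19a1670` lists what is still mounted for the pod in the log. Any path printed for a Pod that has already finished would be a leftover mount of the kind the kubelet is failing to clean up.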

1 Answer


This is a kubelet bug.

PR: https://github.com/kubernetes/kubernetes/pull/110973, which will be released in the 1.25 release.
