
I have a question regarding our local Kubernetes installation (Kubelet version == 1.24.4).

The cluster was installed using Kubespray.

I'm aware of a few related questions/answers on Stack Overflow about fixing the KubeletHasDiskPressure flag in Kubernetes, such as [1], [2], [3], and [4].

However, in our case, we are purposefully using a master node with very limited disk space, so the default DiskPressure thresholds in Kubernetes need to be adjusted.


I've tried several paths:

1- Tried adding the following to the end of /etc/kubernetes/kubeadm-config.yaml:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 10.233.0.10
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "500Mi"
  imagefs.available: "1Gi"
  nodefs.inodesFree: "500Mi"
evictionMinimumReclaim:
  memory.available: "0Mi"
  nodefs.available: "500Mi"
  imagefs.available: "1Gi"
  nodefs.inodesFree: "500Mi"

and then restarted the master node, but it didn't solve the problem.
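As a side check on the values above, here is a minimal Python sketch that expands the threshold strings into plain numbers. It assumes the kubelet reads each value either as a percentage or as a standard Kubernetes quantity; the helper name parse_threshold is hypothetical and only a subset of suffixes is handled. The point it illustrates: nodefs.inodesFree counts inodes rather than bytes, so "500Mi" there would mean roughly 524 million free inodes, which may be far more than intended. A percentage is the more common form for that signal.

```python
# Binary and decimal suffixes used by Kubernetes resource quantities
# (a subset; exponent and milli forms are deliberately left out).
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_threshold(value: str):
    """Return ("percent", fraction) or ("quantity", integer) for a threshold string."""
    value = value.strip()
    if value.endswith("%"):
        return ("percent", float(value[:-1]) / 100.0)
    # Check two-character suffixes before one-character ones.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return ("quantity", int(value[: -len(suffix)]) * SUFFIXES[suffix])
    return ("quantity", int(value))


# The evictionHard values from the configuration in the question.
eviction_hard = {
    "memory.available": "500Mi",
    "nodefs.available": "500Mi",
    "imagefs.available": "1Gi",
    "nodefs.inodesFree": "500Mi",
}

for signal, raw in eviction_hard.items():
    kind, amount = parse_threshold(raw)
    # inodesFree is a count of inodes; the other signals are byte amounts.
    unit = "inodes" if signal.endswith("inodesFree") else "bytes"
    print(f"{signal}: {raw} -> {kind} {amount} {unit}")
```

This only shows how the strings expand numerically; it does not explain why the file had no effect on the running kubelet.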

2- Tried un-tainting the node with the following command: kubectl taint nodes node1 node.kubernetes.io/disk-pressure-

Here is the list of all pods that I have:

(screenshot of the pod list, not reproduced)

And here is the result of kubectl describe node node1

(base) m@node1:~$ kubectl describe node node1
Name:               node1
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.0.0.47/24
                    projectcalico.org/IPv4VXLANTunnelAddr: 10.233.102.128
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 31 Aug 2022 00:55:59 +0200
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:     <unset>
  RenewTime:       Tue, 06 Sep 2022 22:52:14 +0200
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 06 Sep 2022 13:15:51 +0200   Tue, 06 Sep 2022 13:15:51 +0200   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:01:58 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         True    Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:16:29 +0200   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure          False   Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:01:58 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 06 Sep 2022 22:52:14 +0200   Tue, 06 Sep 2022 13:11:18 +0200   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.0.0.47
  Hostname:    node1
Capacity:
  cpu:                8
  ephemeral-storage:  102101944Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             24523172Ki
  pods:               110
Allocatable:
  cpu:                7800m
  ephemeral-storage:  94097151435
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             23896484Ki
  pods:               110
System Info:
  Machine ID:                 9e0b0071a62e4393a09de330b66c7062
  System UUID:                cb61b600-9ee4-11e7-88a4-c6d8d9353300
  Boot ID:                    9021e1e7-2a1d-4b4b-a7c0-d1f195626660
  Kernel Version:             5.15.0-47-generic
  OS Image:                   Ubuntu 22.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.17
  Kubelet Version:            v1.24.4
  Kube-Proxy Version:         v1.24.4
PodCIDR:                      10.233.64.0/24
PodCIDRs:                     10.233.64.0/24
Non-terminated Pods:          (6 in total)
  Namespace                   Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                             ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-node-qfbms                150m (1%)     300m (3%)   64M (0%)         500M (2%)      6d21h
  kube-system                 kube-apiserver-node1             250m (3%)     0 (0%)      0 (0%)           0 (0%)         6d21h
  kube-system                 kube-controller-manager-node1    200m (2%)     0 (0%)      0 (0%)           0 (0%)         6d21h
  kube-system                 kube-proxy-h8wzs                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         4d
  kube-system                 kube-scheduler-node1             100m (1%)     0 (0%)      0 (0%)           0 (0%)         6d21h
  kube-system                 nodelocaldns-gjgwp               100m (1%)     0 (0%)      70Mi (0%)        200Mi (0%)     6d21h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests        Limits
  --------           --------        ------
  cpu                800m (10%)      300m (3%)
  memory             137400320 (0%)  709715200 (2%)
  ephemeral-storage  0 (0%)          0 (0%)
  hugepages-1Gi      0 (0%)          0 (0%)
  hugepages-2Mi      0 (0%)          0 (0%)
Events:
  Type     Reason                Age                  From     Message
  ----     ------                ----                 ----     -------
  Warning  ImageGCFailed         25m (x101 over 8h)   kubelet  (combined from similar events): wanted to free 11413992243 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "registry.k8s.io/pause:3.6" (must force) - container b419b95ee297 is using its referenced image 6270bb605e12
  Warning  EvictionThresholdMet  63s (x2423 over 9h)  kubelet  Attempting to reclaim ephemeral-storage
(base) m@node1:~$ 
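The numbers in this output can be cross-checked with a little arithmetic. The sketch below is a rough estimate under assumptions that are not confirmed for this cluster: that the kubelet uses its default image garbage collection low threshold of 80%, that it computes the amount to free as capacity * (100 - low threshold) / 100 - available, that images and pods share one filesystem, and that the default hard eviction threshold nodefs.available<10% is in effect. The function names are hypothetical.

```python
CAPACITY_KI = 102101944        # Capacity.ephemeral-storage from the output above
WANTED_TO_FREE = 11413992243   # bytes, from the ImageGCFailed event

# Assumed kubelet defaults, not confirmed for this cluster.
IMAGE_GC_LOW_THRESHOLD_PERCENT = 80
DEFAULT_NODEFS_AVAILABLE = 0.10


def implied_available(capacity: int, wanted: int, low_pct: int) -> int:
    """Invert wanted = capacity * (100 - low_pct) / 100 - available."""
    return capacity * (100 - low_pct) // 100 - wanted


def under_pressure(available: int, capacity: int, fraction: float) -> bool:
    """True when free space is below the given fraction of capacity."""
    return available < capacity * fraction


capacity_bytes = CAPACITY_KI * 1024
available = implied_available(
    capacity_bytes, WANTED_TO_FREE, IMAGE_GC_LOW_THRESHOLD_PERCENT
)
percent_free = 100.0 * available / capacity_bytes

print(f"capacity:  {capacity_bytes} bytes")
print(f"available: {available} bytes ({percent_free:.2f}% free)")
print("pressure at 10%:", under_pressure(available, capacity_bytes, DEFAULT_NODEFS_AVAILABLE))
print("pressure at 1%: ", under_pressure(available, capacity_bytes, 0.01))
```

Under these assumptions the disk had a little over 9% free when the event was recorded, which is below a 10% threshold but well above a 1% threshold. The event is a combined one, so the figure is only indicative.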

Do you have any idea how I can remove this flag from node1? I don't know if this helps, but it all started after I applied the nginx controller deployments/daemonsets.

Any ideas on how to fix this would be appreciated.

1 Answer


Adding the following to /etc/kubernetes/kubelet.env fixed the issue:

--eviction-hard=nodefs.available<1%,imagefs.available<1%,nodefs.inodesFree<1%

I believe the KubeletConfiguration approach should still work, but I'm not sure why it had no effect. I have reported this in the Kubernetes Git repository to follow up.
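For comparison, the flag value can be translated mechanically into the evictionHard map used by a KubeletConfiguration file. The Python sketch below does only that string conversion; the function names are hypothetical. Whether the result takes effect depends on placing it in the file the kubelet actually loads through its --config flag, which is an assumption worth checking on the node rather than something this sketch verifies.

```python
def eviction_flag_to_map(flag_value: str) -> dict:
    """Split 'signal<value,signal<value' into a {signal: value} mapping."""
    thresholds = {}
    for item in flag_value.split(","):
        signal, sep, value = item.strip().partition("<")
        if not sep or not signal or not value:
            raise ValueError(f"expected signal<value, got {item!r}")
        thresholds[signal] = value
    return thresholds


def to_yaml_fragment(thresholds: dict) -> str:
    """Render the mapping as an evictionHard block with quoted values."""
    lines = ["evictionHard:"]
    lines += [f'  {signal}: "{value}"' for signal, value in thresholds.items()]
    return "\n".join(lines)


# The value passed to --eviction-hard in the answer.
flag = "nodefs.available<1%,imagefs.available<1%,nodefs.inodesFree<1%"
print(to_yaml_fragment(eviction_flag_to_map(flag)))
```

The printed fragment has the same shape as the evictionHard section in the question, with percentages in place of the byte quantities.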