
Recently I have faced a few problems with my local docker-desktop (Windows) Kubernetes cluster.

Every now and then, the cluster just randomly seems to run into DiskPressure, and can't schedule any Pods anymore (all end up in Pending state).

So, I checked what's wrong on the node, and it is constantly under DiskPressure.

One thing I was able to find with kubectl describe nodes was the following log (ImageGCFailed):

kubelet, docker-desktop     wanted to free 5180592947 bytes, but freed 0 bytes space with errors in image deletion: [rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedprofilepictureservice:dev" (must force) - container d8ef807bb674 is using its referenced image e2a36258ddf3, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "rancher/rancher:latest" (must force) - container 06af804517fc is using its referenced image 4251f6ed7d4e, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "prom/prometheus:latest" (must force) - container b08daf935e5d is using its referenced image 6fa696e177e3, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedmonitoring:dev" (must force) - container bcda6e3e0d79 is using its referenced image 63c070c7b160, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "grafana/grafana:latest" (must force) - container 141c6909f9c3 is using its referenced image 651ff2dc930f, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedsavingservice:dev" (must force) - container 13350d549f44 is using its referenced image 4649805f5c2f, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedfrontend:dev" (must force) - container e917511c30db is using its referenced image 0dc1d2af3433, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "rabbitmq:3.8.6-management" (must force) - container 7252761ee146 is using its referenced image 64a1f920fb0d, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedaccountvalidationservice:dev" 
(must force) - container 09ea0357c333 is using its referenced image 0329c6ba62a1, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedsessionservice:dev" (must force) - container d2b33cb31611 is using its referenced image 21d801ad9175, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedaccountservice:dev" (must force) - container 23c16e0a05ff is using its referenced image 6b3ba9041cca, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedsearchservice:dev" (must force) - container b5f55d1e7246 is using its referenced image e4d40671cbc6, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "redis:latest" (must force) - container 960762cb6661 is using its referenced image 74d107221092, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedchatservice:dev" (must force) - container ea893d0a4bc7 is using its referenced image cabc2a451580, rpc error: code = Unknown desc = Error response from daemon: conflict: unable to remove repository reference "moomedfinanceservice:dev" (must force) - container effa172e3f0a is using its referenced image f092e21dbab3]

So, in essence, there's an attempt to garbage collect images to free up some space, however I now have a few questions I'm really wondering about:

  1. Most of the image references here are not even used by Kubernetes. All my Kubernetes images are tagged as :testing, while :dev is only used by my local docker-compose setup (which does run at the same time). They are the same images, just with different tags, so why is my cluster trying to clean up things it shouldn't even control?
  2. Why is my cluster under constant DiskPressure? I have rechecked: I gave my docker-desktop instance a whopping 88GB of storage, which is nowhere near full yet. Here is the capacity of my node:

[Image: node capacity (not reproduced)]

So, I'm a bit lost as to what to do now. The problem seems to self-heal when I scale my docker-desktop file system size up and down, and I don't see why the node is in this state to begin with, but it keeps reappearing, so there has to be a cause.

What do I do?

Sossenbinder

1 Answer


The kubelet has a garbage collector whose purpose is to remove unused containers and images in order to free up resources on the node.

If an object does not belong to any owner, it is orphaned; this pattern is known as ownership in Kubernetes. Whenever a node experiences disk pressure, the kubelet will try hard to reclaim disk space by deleting (supposedly) unused images. Reading the source code shows that the kubelet sorts the images to remove by the time since they were last used for creating a Pod. The errors in your event come from the Docker daemon and name your :dev images, so the kubelet presumably sees every image on the shared docker-desktop daemon, including the ones only docker-compose uses.

The error you are getting, "Error response from daemon: conflict: unable to remove repository reference", means that a container is using the referenced image. Check the containers and images on the machine.
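To see at a glance which containers are blocking the cleanup, the event message can be reduced to one line per conflict. This is only a minimal sketch: list_blocking_containers is a hypothetical helper name, and it assumes the message arrives on standard input with each "unable to remove repository reference" entry on a single line, in the same wording as the event quoted in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker or kubectl).
# Reads an ImageGCFailed message on stdin and prints one line per conflict:
#   <container_ID> <image reference> <image_ID>
list_blocking_containers() {
    # Keep only the fragments that describe a conflict ...
    grep -o 'reference "[^"]*" (must force) - container [0-9a-f]* is using its referenced image [0-9a-f]*' |
    # ... and reorder the three captured fields.
    sed 's/reference "\([^"]*\)" (must force) - container \([0-9a-f]*\) is using its referenced image \([0-9a-f]*\)/\2 \1 \3/'
}
```

It could be fed from the node description, for example: kubectl describe nodes | list_blocking_containers. An entry that is wrapped across two lines in the output would be skipped by this sketch.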

List the containers you are using:

$ docker ps -a

List the images you are using:

$ docker images

Then stop the container using:

$ docker stop <container_ID>

Later remove the container using:

$ docker rm <container_ID>

Finally, remove the image using:

$ docker rmi <image_ID>

Or forcefully using:

$ docker rmi -f <image_ID>
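The stop, rm and rmi steps above can be combined per image. The following is a rough sketch rather than a recommended procedure: remove_image_and_containers is a hypothetical helper name, and it relies on the ancestor filter of docker ps, which also matches containers created from images derived from the given one. Note that it stops running containers, so anything started by docker-compose from that image has to be recreated afterwards.

```shell
#!/bin/sh
# Hypothetical helper: remove every container created from an image,
# then remove the image itself.
remove_image_and_containers() {
    image="$1"
    # -a includes stopped containers, -q prints only the container IDs,
    # ancestor=... keeps containers that were created from this image.
    ids=$(docker ps -a -q --filter "ancestor=$image")
    for id in $ids; do
        docker stop "$id"   # stop the container if it is running
        docker rm "$id"     # then remove it
    done
    # With no containers left that reference it, the image can be removed.
    docker rmi "$image"
}
```

For example: remove_image_and_containers moomedfrontend:dev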

Also run:

$ docker system prune

This removes unused data: stopped containers, unused networks, dangling images and build cache. It should free up several GB, after which the DiskPressure taint will be removed. You can then recreate the containers.
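Whether the taint is actually gone can be checked from the node's taint keys. A small sketch, assuming the taint key is node.kubernetes.io/disk-pressure and that the keys are piped in as text; has_disk_pressure_taint is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the text on stdin
# contains the disk-pressure taint key, fails otherwise.
has_disk_pressure_taint() {
    # -F matches the key literally, -q suppresses output.
    grep -qF 'node.kubernetes.io/disk-pressure'
}
```

One way to use it: kubectl get nodes -o jsonpath='{.items[*].spec.taints[*].key}' | has_disk_pressure_taint && echo "node is still tainted"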

Take a look: repository-conflict-reference, docker-prune.

Malgorzata
  • Thanks, that was very helpful already. I'm just wondering why I even run into DiskPressure at all - How can I check the DiskPressure taint threshold? – Sossenbinder Nov 30 '20 at 22:06
  • The DaemonSet controller automatically adds the `node.kubernetes.io/disk-pressure` `NoSchedule` toleration to all DaemonSet Pods, to prevent DaemonSets from breaking: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-nodes-by-condition . The config file with the thresholds is located at `/var/lib/kubelet/config.yaml`; see https://stackoverflow.com/questions/64625569/why-kubernetes-sets-disk-pressure-taint-to-my-node and https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-thresholds – Malgorzata Dec 02 '20 at 14:25
  • @Malgorzata, could you please provide a resource to read more about kubelet gc? – mostafa8026 Oct 10 '21 at 07:49