
I have a server running Ubuntu 18.04 that is also a worker node for K8s. Sometimes K8s kills pods on this machine because of disk pressure, and when I run `df -h --total` I can see that 85% (1.5T) of the disk is in use at /:

~$ df -h --total
Filesystem      Size  Used Avail Use% Mounted on
udev            126G     0  126G   0% /dev
tmpfs            26G  5.3M   26G   1% /run
/dev/sda2       1.8T  1.5T  276G  85% /
tmpfs           126G     0  126G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           126G     0  126G   0% /sys/fs/cgroup
/dev/loop0       90M   90M     0 100% /snap/core/7917
/dev/loop1       90M   90M     0 100% /snap/core/8039
/dev/sdb1       9.8G  203M  9.1G   3% /boot
/dev/sdb2       511M  6.1M  505M   2% /boot/efi
/dev/sdb3       1.8T  100M  1.7T   1% /home
/dev/loop2      128K  128K     0 100% /snap/austin/42
/dev/loop3      3.0M  3.0M     0 100% /snap/micro/648
tmpfs            26G     0   26G   0% /run/user/1001
total           4.0T  1.5T  2.4T  38% -

The problem is that when I go to / and run `sudo du -BG -s *`, I can only find 313G of that used data and nothing more:

/$ sudo du -BG -s *
1G  bin
1G  boot
0G  dev
1G  etc
1G  home
0G  initrd.img
0G  initrd.img.old
1G  lib
1G  lib64
1G  lost+found
1G  media
1G  mnt
1G  opt
du: cannot access 'proc/22512/task/22580/fdinfo/20': No such file or directory
du: cannot access 'proc/45752/task/45752/fd/4': No such file or directory
du: cannot access 'proc/45752/task/45752/fdinfo/4': No such file or directory
du: cannot access 'proc/45752/fd/3': No such file or directory
du: cannot access 'proc/45752/fdinfo/3': No such file or directory
0G  proc
1G  root
1G  run
1G  sbin
1G  snap
1G  srv
9G  swap.img
0G  sys
1G  tmp
3G  usr
313G    var
0G  vmlinuz
0G  vmlinuz.old

How can I find the rest of the data and solve the disk pressure problem?

Update

My problem was different from the suggested solution. In that case the problem was deleted files, but in my case it was Docker. I posted an answer so I can close this question.


2 Answers


I found a way at https://unix.stackexchange.com/a/382696/380398 to use lsof to list the open files and sort them by size:

sudo lsof \
| grep REG \
| grep -v "stat: No such file or directory" \
| grep -v DEL \
| awk '{if ($NF=="(deleted)") {x=3;y=1} else {x=2;y=0}; {print $(NF-x) "  " $(NF-y) } }'  \
| sort -n -u  \
| numfmt  --field=1 --to=iec

When I ran it, I got:

118M  /usr/bin/kubelet
168M  /var/lib/docker/containers/ce98aeb3e061c31e81d232933fa21f055169924cd0411ec276d51ae008dbb993/ce98aeb3e061c31e81d232933fa21f055169924cd0411ec276d51ae008dbb993-json.log
185M  /var/lib/docker/containers/933c29608da9d954dc941fc741ffe0b012e6ec55a8befa95b8487f2367596577/933c29608da9d954dc941fc741ffe0b012e6ec55a8befa95b8487f2367596577-json.log
207M  /var/lib/docker/containers/2d4c2967fe22b1eb79b234e465f36ad062c8f390659c2f2f42ad31636be8a1be/2d4c2967fe22b1eb79b234e465f36ad062c8f390659c2f2f42ad31636be8a1be-json.log
272M  /var/lib/docker/containers/4b8daa87cda051a3b2bfd1b89c70763dca990b65b0eb211260f0e6d92b972da9/4b8daa87cda051a3b2bfd1b89c70763dca990b65b0eb211260f0e6d92b972da9-json.log
343M  /var/lib/docker/containers/52cb2d7fceb6bef7a01f7e5c666cb05e0eb62537d54a9b8da8865eba9e51c728/52cb2d7fceb6bef7a01f7e5c666cb05e0eb62537d54a9b8da8865eba9e51c728-json.log
1.1G  /var/lib/docker/containers/fe2c73fd47b37a7a5e70bd1f07508bec7dad024c75b859d933b6fa5bba649f18/fe2c73fd47b37a7a5e70bd1f07508bec7dad024c75b859d933b6fa5bba649f18-json.log
1.1G  /var/lib/docker/containers/8887ea0b31603e0a5b21c934ce06bb4a35133df2367eccb5ad9e2a07eb884bd3/8887ea0b31603e0a5b21c934ce06bb4a35133df2367eccb5ad9e2a07eb884bd3-json.log
42G  /var/lib/docker/containers/1f7180db9e41b66f3646bdf021644b23c1a954830191807532af813f5aa5cde6/1f7180db9e41b66f3646bdf021644b23c1a954830191807532af813f5aa5cde6-json.log
83G  /var/lib/docker/containers/a456e37303998844207c79fc3cdb63878765d7a3151c35051cb071545c75cec7/a456e37303998844207c79fc3cdb63878765d7a3151c35051cb071545c75cec7-json.log
220G  /var/lib/docker/containers/60aad026e90035790ff5f6f1ad714e6187bec5dfeb5b1d3156b7cda1d00cc251/60aad026e90035790ff5f6f1ad714e6187bec5dfeb5b1d3156b7cda1d00cc251-json.log
260G  /var/lib/docker/containers/52c866da942a3228ba56265210ef4f13fbc96ebc1c0214501df189901a829414/52c866da942a3228ba56265210ef4f13fbc96ebc1c0214501df189901a829414-json.log
560G  /var/lib/docker/containers/f56a9853ef993ce3843a2d6acf5c9603a283e64fb4b81d6523342c6ad03243ad/f56a9853ef993ce3843a2d6acf5c9603a283e64fb4b81d6523342c6ad03243ad-json.log

Which, together with the other data I could already see, correctly sums up to 1.5T.
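Since the culprit here is Docker's json-file logs, a common follow-up (not part of the original answer; the container ID below is a placeholder) is to truncate the runaway log in place and then cap log sizes in the daemon config:

```shell
# Truncate a runaway container log in place. Don't rm it: the daemon keeps
# the file open, so unlinking would leave the space allocated until the
# container or daemon is restarted. truncate frees the space immediately.
sudo truncate -s 0 \
  /var/lib/docker/containers/<container-id>/<container-id>-json.log

# To stop it recurring, cap the json-file driver in /etc/docker/daemon.json
# (applies to containers created after the daemon restart):
#   {
#     "log-driver": "json-file",
#     "log-opts": { "max-size": "100m", "max-file": "3" }
#   }
sudo systemctl restart docker
```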

  • Strange... I used the above command and saw a bunch of files, but when I tried to delete them they were not there. – SHM Dec 23 '20 at 07:08

As you can see from the result, the most heavily used directory is /var.

So you can continue this way:

du -sh /var/*

then check the fullest directory, and continue to dig further.

One quick and dirty way is to stop the applications which run on the server, rotate or compress their logs, start the applications again, and check the disk space once more.
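A minimal sketch of that quick-and-dirty cycle (the service name and log path are hypothetical examples, not from the question):

```shell
sudo systemctl stop myapp.service    # hypothetical service name
sudo gzip /var/log/myapp/*.log       # compress its logs in place
sudo systemctl start myapp.service
df -h /                              # re-check usage on the root filesystem
```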

    `/var/` only contains 313 GB of 1.5TB. It's hardly the first place to look. – Gerald Schneider Dec 05 '19 at 09:12
  • I already did that, and `/var` is using just 313G, as shown in my question. – AVarf Dec 05 '19 at 09:13
  • @AVarf, did you check the logs? Maybe some app creates sparse files. – Romeo Ninov Dec 05 '19 at 09:15
  • @GeraldSchneider, do you see any directory which has a bigger size? – Romeo Ninov Dec 05 '19 at 09:16
  • In this output, no, but the question I linked to contains answers on how to check for data that is hidden below mountpoints. That's the first thing to check, along with files that have been deleted but are still held open by processes. – Gerald Schneider Dec 05 '19 at 09:18
  • @GeraldSchneider, it's good to check this also. But in 99% of the cases it's about log files and the way programs write to them – Romeo Ninov Dec 05 '19 at 09:19
  • That is true, but if 1.5TB are in use, and the largest folder contains only 313GB of data, there is just no way you will find the missing data that way. – Gerald Schneider Dec 05 '19 at 09:21
  • @GeraldSchneider, this is my way, and in my practice it does the work. Users quite rarely remount partitions... – Romeo Ninov Dec 05 '19 at 09:23
  • @RomeoNinov was right about "in 99% of the cases it's about log files and the way programs write in them", but that command cannot show them, so I posted my answer. – AVarf Dec 05 '19 at 09:43