165

I have a SCSI disk in a server (hardware RAID 1), 32G, ext3 filesystem. df tells me that the disk is 100% full. If I delete 1G, this is correctly shown.

However, if I run du -h -x /, then du tells me that only 12G are used (I use -x because of some Samba mounts).

So my question is not about subtle differences between the du and df commands but about how I can find out what causes this huge difference?

I rebooted the machine for an fsck that went without errors. Should I run badblocks? lsof shows me no open deleted files, lost+found is empty and there is no obvious warn/err/fail statement in the messages file.

Feel free to ask for further details of the setup.

initall
  • 2,205
  • 3
  • 18
  • 19
  • 3
    This is very close to the question: linux - du vs. df difference (http://serverfault.com/questions/57098/du-vs-df-difference). The solution was files under a mount point, as OldTroll answered. – Chris Ting May 30 '11 at 16:45

18 Answers

157

Just stumbled on this page when trying to track down an issue on a local server.

In my case the df -h and du -sh mismatched by about 50% of the hard disk size.

This was caused by apache (httpd) keeping large log files open that had already been deleted from disk.

This was tracked down by running lsof | grep "/var" | grep deleted where /var was the partition I needed to clean up.

The output showed lines like this:
httpd 32617 nobody 106w REG 9,4 1835222944 688166 /var/log/apache/awstats_log (deleted)

The situation was then resolved by restarting apache (service httpd restart), which released the handles on the deleted files and freed up 2 GB of disk space.
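
A minimal sketch of that workflow (the /var path and the httpd service name are examples; substitute the partition and service from your own lsof output):

lsof -a +L1 /var                    # open files under /var with link count 0, i.e. deleted but still held open
lsof | grep "/var" | grep deleted   # equivalent grep-based approach
service httpd restart               # restarting the offending service releases the handles and frees the space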

DDS
  • 145
  • 8
KHobbits
  • 1,571
  • 1
  • 9
  • 2
  • 1
    For me, the locks were not released even after I stopped the program (zombies?). I had to `kill -9 'pid'` to release the locks. E.g. for your httpd it would have been `kill -9 32617`. – Micka Jun 17 '15 at 08:57
  • 8
    Minor note: You may have to run `lsof` as `sudo` or not all open file descriptors will show up – ChrisWue Aug 08 '16 at 01:34
  • I ran into this with H2, which was adding several gigs to a logfile every day. Instead of restarting H2 (slow), I used `sudo truncate -s0 /proc/(h2 PID)/(descriptor number obtained from ls /proc/h2pid/fd)`. – Desty Sep 26 '16 at 14:57
  • In my case, even after restarting `httpd` the space wasn't released. It was only when I ran `/etc/init.d/rsyslog restart` that it worked :D – Thanh Nguyen Van Sep 29 '16 at 01:11
  • Thank you! This was the issue for me too. I had a huge log file and even after I deleted it the space did not become available. With `lsof | grep deleted` I found what was keeping it open and restarting the service made the space available again. – Nemo Apr 12 '18 at 15:56
  • 6
    You can skip the greps and just do `lsof -a +L1 /var`, where `-a` means AND all conditions (default is OR), `+L1` means list only files with link counts less than 1 (i.e., deleted files with open file descriptors), and `/var` constrains to files under that mount point – kbolino May 13 '18 at 02:50
  • Same here, but I had to use sudo and restart `jenkins`. – Jonathan Oct 10 '18 at 17:32
  • That's exactly what I was seeing. Removing the Apache log and restarting got me back more than 10 GB! – Anh Tran Dec 28 '18 at 07:04
  • This helped me find the process that was using a crazy amount of memory. Thanks! – Pablo Martinez Apr 30 '20 at 08:23
  • @KHobbits' answer helped: listing `lsof | grep deleted` showed that the MySQL service had lots of files marked as deleted (meaning they were still present on disk due to open file descriptors). `systemctl stop mysql` followed by `systemctl start mysql` did the trick. The machine was a UAT box; do the above carefully on production.
    – ras Jan 05 '21 at 18:33
  • Had the same issue with nginx; after running `sudo service nginx restart` the numbers matched again. – Shlomi Jan 06 '21 at 14:09
124

Check for files located under mount points. Frequently, if you mount a directory (say a sambafs) onto a filesystem that already had files or directories under it, you lose the ability to see those files, but they're still consuming space on the underlying disk. I've had file copies made while in single-user mode dump files into directories that I couldn't see except in single-user mode (due to other filesystems being mounted on top of them).
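
A quick way to check for this (a sketch; /mnt/share is a hypothetical mount point for the Samba share) is to unmount the share and look at what is left underneath it:

umount /mnt/share   # make sure nothing is using the share first
du -sh /mnt/share   # now shows only the files hidden underneath the mount point
mount /mnt/share    # remount when you are done

If you cannot unmount, the bind-mount approach in Marcel G's answer below finds the same files without downtime.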

OldTroll
  • 1,636
  • 2
  • 12
  • 18
  • 7
    You can find these hidden files without needing to unmount directories. Take a look at Marcel G's answer below which explains how. – mhsekhavat Jul 23 '17 at 07:39
  • 1
    You should show the CLI commands to do this in your answer – Jonathan Oct 10 '18 at 17:26
  • 1
    DO CHECK even if you think that it does not make sense for you! – Chris Oct 26 '18 at 14:32
  • 2
    Note: this answer is talking about files located *underneath* mount points (i.e. hidden on the original filesystem), not *within* mount points. (Don't be an idiot like me.) – mwfearnley Nov 27 '18 at 15:18
86

I agree with OldTroll's answer as the most probable cause for your "missing" space.

On Linux you can easily remount the whole root partition (or any other partition, for that matter) to another place in your filesystem, say /mnt for example; just issue a

mount -o bind / /mnt

then you can do a

du -h /mnt

and see what is using up your space.
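
As a follow-up (a sketch assuming GNU du and sort), you can list the biggest top-level directories on the underlying root and then drop the bind mount when finished:

du -xh --max-depth=1 /mnt | sort -rh | head -20   # largest directories on the real root filesystem
umount /mnt                                       # remove the bind mount when done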

David Buck
  • 133
  • 11
Marcel G
  • 2,149
  • 14
  • 24
  • 5
    Thanks so much for this tip. Allowed me to find and delete my large, "hidden" files without downtime! – choover Feb 28 '13 at 13:47
  • Thanks - this showed that docker was filling up my hard drive with diffs in `/var/lib/docker/aufs/diff/` – naught101 Aug 05 '15 at 03:29
  • `mount -o bind / /mnt` gave me the additional information that I was looking for. Thanks! – Slavik Meltser Oct 20 '19 at 14:10
  • 3
    Thanks! With these commands I managed to find what was causing 10% usage on my disk and to free it. To list only the biggest folders and files I used `du /mnt | sort -n -r | head -20` – refex Apr 21 '20 at 10:51
  • Wanted to add this helped where a folder hadn't mounted in time but another process had written to it. I initially hid OMV's `/sharedfolders` from my `du` command so it was easier to interrogate, and it turned out that a folder within there was causing my problem, thanks. – Dan Clarke Aug 20 '20 at 09:39
  • 1
    Just to clarify, we can use any mount point (for e.g. if `/mnt` is already under use) - `mount -o bind / /xxyyzz`. – akki Dec 23 '21 at 02:52
32

In my case this had to do with large deleted files. It was fairly painful to solve before I found this page, which set me on the correct path.

I finally solved the problem by using lsof | grep deleted, which showed me which program was holding two very large log files (totalling 5GB of my available 8GB root partition).

user
  • 4,267
  • 4
  • 32
  • 70
Adrian
  • 321
  • 3
  • 2
  • 1
    This answer makes me wonder why you are storing log files on the root partition, especially one that small... but to each their own, I suppose... – user Nov 14 '14 at 18:47
  • I had a similar issue; I restarted all the applications that were using the deleted file. I guess there was a zombie process still holding on to a large deleted file. – user1965449 Dec 15 '15 at 02:37
  • 1
    This was the case for us, a log processing linux app known as filebeat kept files open. – Pykler Dec 07 '16 at 20:53
  • @Pykler For us it was filebeat as well. Thanks for the tip! – Martijn Heemels Jan 29 '19 at 09:30
  • This answer helped us solve the storage full problem, thanks! – digz6666 Feb 24 '22 at 03:12
30

See what df -i says. It could be that you are out of inodes, which can happen if there is a large number of small files in that filesystem, using up all the available inodes without consuming all the available space.
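
A minimal check (a sketch; the second command assumes GNU find and awk, and simply counts files per top-level directory to show where the inodes went):

df -i    # compare IUse% here with Use% from df -h
find / -xdev -type f | awk -F/ '{print $2}' | sort | uniq -c | sort -rn | head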

HBruijn
  • 72,524
  • 21
  • 127
  • 192
eirescot
  • 554
  • 4
  • 8
  • 1
    The size of a file and the amount of space it takes on a filesystem are two separate things. The smaller the files tend to be, the bigger the discrepancy between them. If you write a script that sums up the sizes of files and compares it to the `du -s` of the same subtree, you're going to get a good idea if that's the case here. – Marcin May 30 '11 at 15:00
9

Files that are open by a program do not actually go away (stop consuming disk space) when you delete them; they go away when the program closes them. A program might have a huge temporary file that you (and du) can't see. If it's a zombie program, you might need to reboot to clear those files.
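
A small sketch that reproduces the effect in a scratch directory (file name and size are just examples):

dd if=/dev/zero of=big.tmp bs=1M count=1024   # create a 1 GB file
tail -f big.tmp > /dev/null &                 # keep an open file descriptor on it
rm big.tmp                                    # df still counts the 1 GB, but du no longer sees it
kill %1                                       # closing the descriptor finally frees the space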

Paul Tomblin
  • 5,217
  • 1
  • 27
  • 39
  • OP said he'd rebooted the system and the problem persisted. – OldTroll May 30 '11 at 12:58
  • 1
    I had zombies that wouldn't release the locks on the files; I had to `kill -9 'pid'` them to release the locks and get the disk space back. – Micka Jun 17 '15 at 08:58
9

For me, I needed to run sudo du, as there was a large number of Docker files under /var/lib/docker that a non-sudo user doesn't have permission to read.
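
A minimal sketch of the same check (/var/lib/docker is Docker's default data directory; docker system df is Docker's own disk-usage summary):

sudo du -sh /var/lib/docker   # rerun du as root so these directories are actually counted
docker system df              # Docker's breakdown of image, container and volume usage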

jobevers
  • 211
  • 2
  • 4
  • 1
    This was my problem. I forgot I switched storage systems in docker and the old volumes were still hanging around. – Richard Nienaber Jan 09 '19 at 09:21
  • 1
    I had the same problem, thanks. This helped me: `docker system prune -a -f; docker volume rm $(docker volume ls -qf dangling=true)` – mnicky Dec 08 '19 at 15:48
  • This together with @mnicky's answer just saved me... I had over 27 GB of disk space trapped by Docker, to the point that I couldn't boot into Ubuntu... – mdev Jan 14 '21 at 16:26
5

Try this to see whether a dead/hung process is still holding files open while writing to the disk: lsof | grep "/mnt"

Then try killing off any PIDs which are stuck (in particular, look for lines ending in "(deleted)").

Phirsk
  • 51
  • 1
5

This is the easiest method I have found to date to find large files!

Here is an example for when your root mount point / is full:

cd / (so you are in root)

ls | xargs du -hs

Example Output:

 9.4M   bin
 63M    boot
 4.0K   cgroup
 680K   dev
 31M    etc
 6.3G   home
 313M   lib
 32M    lib64
 16K    lost+found
 61G    media
 4.0K   mnt
 113M   opt
 du: cannot access `proc/6102/task/6102/fd/4': No such file or directory
 0  proc
 19M    root
 840K   run
 19M    sbin
 4.0K   selinux
 4.0K   srv
 25G    store
 26M    tmp

You would then notice that store is large, so do a cd /store

and run again

ls | xargs du -hs

Example output: 
 109M   backup
 358M   fnb
 4.0G   iso
 8.0K   ks
 16K    lost+found
 47M    root
 11M    scripts
 79M    tmp
 21G    vms

in this case the vms directory is the space hog.
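
A slightly more robust variant of the same idea (a sketch assuming GNU du and sort; it stays on one filesystem and sorts by size):

du -xhs /* 2>/dev/null | sort -rh | head -20

Repeat it inside whichever directory turns out to be the hog, just as above.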

Riaan
  • 411
  • 5
  • 13
2

One more possibility to consider: you are almost guaranteed to see a big discrepancy if you are using Docker and you run df/du inside a container that uses volume mounts. For a directory mounted to a volume on the Docker host, df will report the HOST's totals. This is obvious if you think about it, but when you get a report of a "runaway container filling the disk!", make sure you verify the container's filespace consumption with something like du -hs <dir>.
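
For illustration, inside such a container (a sketch; /data is a hypothetical volume mount path):

df -h /data    # reports the host filesystem's size and usage for the mounted volume
du -sh /data   # reports only what is actually stored under the mounted directory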

Troy Folger
  • 141
  • 2
2

I had this problem on CentOS 7 as well, and found a solution after trying a bunch of things like bleachbit and cleaning /usr and /var, even though they only showed about 7G each. df was still showing 50G of 50G used on the root partition, but du only showed 9G of file usage. I ran a live Ubuntu CD, unmounted the offending 50G partition, opened a terminal and ran xfs_check and xfs_repair on the partition. I then remounted the partition, and my lost+found directory had expanded to 40G. I sorted lost+found by size and found a 38G text log file for Steam that eventually just repeated an mp3 error. After removing the large file I have space again, and my disk usage agrees with my root partition size. I would still like to know how to keep the Steam log from growing so big again.

1

A similar thing happened to us in production: disk usage went to 98%. We did the following investigation:

a) df -i to check the inode usage; inode usage was only 6%, so it was not a case of many small files

b) Mounting root and checking for hidden files. Could not find any extra files; du results were the same as before the mount.

c) Finally, we checked the nginx logs. nginx was configured to write to disk, but a developer had deleted the log file directly with rm, which left nginx writing to the now-unlinked file. Because /var/log/nginx/access.log had been deleted from disk it was no longer visible to du, but nginx still held it open, so it kept consuming space.
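
If restarting the service is not an option, the space can usually be reclaimed by truncating the deleted file through /proc (a sketch; the PID 1234 and fd number 5 are hypothetical and come from the lsof output):

lsof +L1 | grep nginx                # find the unlinked-but-open log; note the PID and FD columns
sudo truncate -s 0 /proc/1234/fd/5   # truncate it in place; nginx keeps logging to the now-empty file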

darxtrix
  • 113
  • 5
0

If the mounted disk is a shared folder on a Windows machine, it seems that df will show the size and disk use of the entire Windows disk, but du will only show the part of the disk that you have access to (and that is mounted). So in this case the problem must be fixed on the Windows machine.

Sverre
  • 723
  • 2
  • 12
  • 23
0

I had the same problem that is mentioned in this topic, but on a VPS. I tested everything described in this topic, but without success. The solution was to contact our VPS provider's support, who performed a quota recalculation and corrected the discrepancy between df -h and du -sh /.

ldxd
  • 3
  • 2
0

I ran into this problem on a FreeBSD box today. The issue was an artifact of vi (not vim; I'm not sure whether vim would create this problem). The file was consuming space but hadn't been fully written to disk.

You can check that with:

$ fstat -f /path/to/mount/point |sort -nk8 |tail

This looks at all open files and sorts (numerically via -n) by the 8th column (key, -k8), showing the last ten items.

In my case, the final (largest) entry looked like this:

bob      vi         12345    4 /var      97267 -rwx------  1569454080 rw

This meant process (PID) 12345 was consuming 1.46G (the eighth column divided by 1024³) of disk, despite du not noticing it. vi is horrible at viewing extremely large files; even 100MB is large for it. 1.5G (or however large that file actually was) is ridiculous.

The solution was to sudo kill -HUP 12345 (if that didn't work, I'd sudo kill 12345 and if that also fails, the dreaded kill -9 would come into play).

Avoid text editors on large files. Sample workarounds for quick skimming:

Assuming reasonable line lengths:

  • { head -n1000 big.log; tail -n1000 big.log; } |vim -R -
  • wc -l big.log |awk -v n=2000 'NR==FNR{L=$1;next}FNR%int(L/n)==1' - big.log |vim -R -

Assuming unreasonably large line(s):

  • { head -c8000 big.log; tail -c8000 big.log; } |vim -R -

These use vim -R in place of view because vim is nearly always better ... when it's installed. Feel free to pipe them into view or vi -R instead.

If you're opening such a large file to actually edit it, consider sed or awk or some other programmatic approach.

Adam Katz
  • 869
  • 8
  • 16
0

Check whether your server has the OSSEC agent installed, or whether some process is still using the deleted log files. In my case, a while ago, it was the OSSEC agent.

0

In my case lsof did not help. I was able to track this down because I had mounted disk images using losetup as loop devices. Even after unmounting these devices and deleting the corresponding images there were processes that maintained some sort of indirect reference to the disk images.

So in short: sudo ps -ef|grep loop, then sudo losetup -d /dev/loopX. This is not a direct answer to why du and df disagree, but it has come up often enough for me that I finally figured out the reason, which was different from any answer I could find.
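
A minimal sketch of that cleanup (losetup -a lists active loop devices; the device name below is just an example):

sudo losetup -a              # list loop devices and the backing files they still reference
sudo losetup -d /dev/loop0   # detach a stale loop device so its backing image can actually be freed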

ekeyser
  • 165
  • 4
-4

Check /lost+found. I had a system (CentOS 7) where some files in /lost+found ate up all the space.

Michael Hampton
  • 237,123
  • 42
  • 477
  • 940
  • How would this account for the difference in reported disk usage _as described in the question_? – roaima Nov 30 '16 at 23:36