
I have a remote server here running Ubuntu (Server Edition).

Yesterday I noticed that 100% of my hard disk space was occupied. There was a log file that had grown bigger and bigger, so I deleted it with `rm file.foo`.

Then I ran `df -h`, but the partition where the file was stored still showed 100% used.

So I thought a reboot might help and ran `sudo shutdown -r now`.

After waiting a few minutes I still couldn't connect to the server via SSH, so I asked the guys at the data center to restart it manually.

That worked and the server booted.

So I ran `df -h` again, and now 80% of the partition is occupied (at least that's something).

Next, I wanted to check what was using that much disk space and ran `sudo du -h --max-depth 1 /`. The result was:

16K /lost+found
942M    /home
52K /tmp
4.0K    /mnt
236K    /dev
du: cannot access `/proc/17189/task/17189/fd/4': No such file or directory
du: cannot access `/proc/17189/task/17189/fdinfo/4': No such file or directory
du: cannot access `/proc/17189/fd/4': No such file or directory
du: cannot access `/proc/17189/fdinfo/4': No such file or directory
0   /proc
4.0K    /media
4.0K    /opt
4.0K    /srv
32K /root
3.0G    /var
393M    /lib
37M /boot
6.9M    /etc
681M    /usr
4.0K    /selinux
8.0M    /bin
9.0M    /sbin
4.0K    /cdrom
0   /sys
5.0G    /

As you can see in the last line, only 5 GB are occupied, so the file can't be sitting in a trash folder or in lost+found. (It shouldn't be there anyway, since I used the `rm` command.)

So, what's wrong?

My personal guess is that while the server was restarting, it was still cleaning up that huge 500 GB file I had removed. Forcing the manual restart probably interrupted the cleanup, so it only managed to free about 20% of the space.

If my guess is right, what could I do to repair this?

If my guess is wrong, what's up with my system then?

Timo Ernst

1 Answer


My first guess would be that whatever program was writing to file.foo is still alive and holding the file handle open: the disk space only becomes "free" in the eyes of the kernel when the last reference to the inode (file) is cleared, and a program that has the file open counts as a reference. For the future: when you move or delete a log file, remember to let the program using it know - and if you want to be really safe, restart the program in question.
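
For future reference, here is a minimal sketch of how to confirm that situation before rebooting (assuming `lsof` is installed; the PID and FD values are whatever the listing shows):

    # List open files whose on-disk link count is zero, i.e. files that have
    # been deleted but are still held open by some process
    sudo lsof +L1

    # Alternatively, grep the full listing for the "(deleted)" marker
    sudo lsof -nP | grep '(deleted)'

    # Restarting the offending process releases the space; if you can't
    # restart it, truncating the file through its /proc descriptor also works:
    #   : > /proc/<PID>/fd/<FD>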

Since you rebooted, though, that's theoretically impossible -- all programs should have been killed off, so any references they held would have gone away too. That leaves two possibilities I can think of:

  1. You have a hard link to the file that you don't know about.
    If this is the case, `du` and `df` should agree about the amount of space you're using on the system (a quick way to check is sketched after this list).

  2. Your filesystem is corrupted, probably in the sense that an inode has a positive reference count but isn't actually pointed to by any directory entry.
    This is relatively easy (though time-consuming) to check: on most Linux systems you can force a filesystem check on reboot by creating a file called /forcefsck (`touch /forcefsck` as root will do the trick) -- then just reboot and wait (a while!) while your system scans its filesystems looking for things like "lost" inodes with screwy reference counts.
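
A rough way to check for possibility 1 (a sketch; the path in the second command is just a placeholder, and this assumes the old log lived on the root filesystem):

    # Look for large files on the root filesystem only (-xdev stays on one
    # filesystem); a surviving hard link to the old log would show up here
    sudo find / -xdev -type f -size +1G -exec ls -lh {} +

    # For any suspicious file, print its hard-link count; anything above 1
    # means another directory entry still references the same data
    stat -c '%h %n' /path/to/suspicious-file

    # For possibility 2, schedule the check described above on the next boot
    sudo touch /forcefsck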

voretaq7
  • lsof is a powerful tool. You can also do `lsof large-file.log` to see exactly what program has it open and kill/restart it – uzzi09 May 21 '12 at 21:21
  • Or you can use `fuser` instead of lsof – fpmurphy May 22 '12 at 01:45
  • Will `touch /forcefsck` + reboot also fix found errors, or will that just check for problems on the hard disk? – Timo Ernst May 22 '12 at 19:50
  • @valmar you should be prompted for what you want to do if any errors are found. – voretaq7 May 22 '12 at 20:19
  • Via SSH? Remember that the server is remote and I cannot access it manually. – Timo Ernst May 23 '12 at 16:56
  • @valmar I'm not sure how Linux handles it, but on the BSDs there's an option to have the on-boot fsck be `fsck -y` so it doesn't need interaction. On Ubuntu it appears you need to set the `FSCKFIX` option to YES in `/etc/default/rcS` (see http://codepoets.co.uk/2011/fsck-y-or-fsck-yes/) – voretaq7 May 23 '12 at 20:28
  • Thanks, I'll give that a try and let you know if it worked. – Timo Ernst May 23 '12 at 20:31
  • In some cases you may run into the situation where disk usage crosses the percentage of space the OS reserves for root. In that case, check the configuration by running `tune2fs -l <device> | egrep "Block count|Reserved block count"` and calculating the actual percentage. – luka5z May 08 '17 at 10:22
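
A worked version of that last check (a sketch; `/dev/sda1` is only a placeholder for whatever device holds the filesystem in question):

    # Show the total and reserved block counts for the filesystem
    sudo tune2fs -l /dev/sda1 | egrep 'Block count|Reserved block count'

    # The reserved share is reserved / total * 100 (5% by default).
    # If that is more than you want, it can be lowered, e.g. to 1%:
    sudo tune2fs -m 1 /dev/sda1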