31

I have been asked this question in two consecutive interviews, but after some research and checking with various systems administrators I haven't received a good answer. I am wondering if somebody can help me out here.

A server is out of disk space. You notice a very large log file and determine it is safe to remove. You delete the file but the disk still shows that it is full. What would cause this and how would you remedy it? And how would you find which process is writing this huge log file?

ewwhite
  • 194,921
  • 91
  • 434
  • 799

6 Answers

56

This is a common interview question and a situation that comes up in a variety of production environments.

The file's directory entries have been deleted, but the logging process is still running. The space won't be reclaimed by the operating system until all file handles have been closed (e.g., the process has been killed) and all directory entries removed. To find the process writing to the file, you'll need to use the lsof command.

The other part of the question can sometimes be "how do you clear a file that's being written to without killing the process?" Ideally, you'd "zero" or "truncate" the log file with something like : > /var/log/logfile instead of deleting the file.
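
A minimal sketch of that workflow (the log path is illustrative):

    # List open files whose on-disk link count is zero, i.e. deleted but still held open
    lsof +L1
    # Before deleting, check which process has the log open
    lsof /var/log/logfile
    # Truncate in place instead of removing it; the writer keeps its handle
    # and the freed blocks are returned to the filesystem right away
    : > /var/log/logfile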

ewwhite
  • 194,921
  • 91
  • 434
  • 799
  • 1
    ... or `fuser`. – Steven Monday Mar 05 '12 at 01:40
  • 1
    Expanding a bit: until all references to a file on disk vanish, that space can't be used by something else. That includes file handles. That also allows this trick to work: http://serverfault.com/questions/45237/link-to-a-specific-inode – Jeff Ferland Mar 05 '12 at 02:48
  • 1
    If you have `noclobber` set, try: `>| /var/log/logfile` – Belmin Fernandez Mar 05 '12 at 11:35
  • 2
    I ask a variant of this question on every interview: "You're getting disk full messages. `df` says you're out of space, `du` says you're barely using any. What's causing it, and why don't the two tools agree?" – voretaq7 Mar 05 '12 at 17:16
  • What to do if after `> /var/log/file` the space on disk is still at 100%? The log file seems to be empty... but the space is only recovered after restarting the program that writes to this log file. Is there a way to recover the disk space without restarting the program? – alemani Mar 09 '12 at 17:42
  • Nope. :-) Depending on the program, you may be able to get it to release the file handle some other way; as a general idea, `kill -HUP` *could* work, though SIGHUP may or may not trigger other behaviour depending on the program. I would like to expand my original answer (below) in light of the one above. *facepalm* ... – Mantriur Jan 17 '15 at 02:08
  • Related: `lsof +L1` will show you all open file handles that have less than one (i.e. zero) filesystem links associated with them. This is useful if the files were already removed before you began troubleshooting and you are trying to identify where the disk space has gone. Scroll down the `SIZE` column and you will probably find your culprit. – Andrew B Jun 04 '15 at 20:37
14

There's still another link to the file (either hard link or open file handle). Deleting a file only deletes the directory entry; the file data and inode hang around until the last reference to it has been removed.

It's somewhat common practice for a service to create a temporary file and immediately delete it while keeping the file open. This creates a file on disk, but guarantees that the file will be deleted if the process terminates abnormally, and also keeps other processes from accidentally stomping on the file. MySQL does this, for example, for all its on-disk temporary tables. Malware often uses similar tactics to hide its files.
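
You can reproduce that pattern from a shell; a small sketch, with a made-up scratch file name:

    # Open fd 3 read/write on a scratch file, then unlink it immediately
    exec 3<> /tmp/scratch.$$
    rm /tmp/scratch.$$
    # The directory entry is gone, but fd 3 still pins the inode:
    # writes land in the "deleted" file and keep consuming disk space
    echo "still writable" >&3
    # Only closing the descriptor releases the blocks
    exec 3>&-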

Under Linux, you can conveniently access these deleted files as /proc/<pid>/fd/<filenumber>.
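
For example (the PID and fd number are placeholders; take the real ones from lsof output):

    # The process's fd entries point at the deleted target
    ls -l /proc/1234/fd        # e.g. "4 -> /var/log/big.log (deleted)"
    # The data is still readable through the fd link, so you can copy it
    # out before the process exits
    cp /proc/1234/fd/4 /root/recovered.log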

tylerl
  • 14,885
  • 7
  • 49
  • 71
8

I'm not a sysadmin, but from what I've gathered on Unix.SE, a Linux system won't actually delete a file (mark its space as free/reusable) after it is unlinked until all file descriptors pointing to it have been closed. So to answer the first part, the space isn't free yet because a process still has the file open. To answer the second, you can see which process is using the file with lsof.

Kevin
  • 338
  • 2
  • 13
2

One alternative besides the obvious hard-link/open-file-handle answer: the file was a (very) sparse file, such as /var/log/lastlog on RHEL, and wasn't actually taking up much space on disk. Deleting it had very little impact, so you need to look at the next biggest file.
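
You can spot that case by comparing the apparent size with the blocks actually allocated (lastlog is just the example from above):

    # Apparent size: can look enormous on a system with high UIDs
    ls -lh /var/log/lastlog
    # Blocks actually allocated on disk: usually far smaller for a sparse file
    du -h /var/log/lastlog
    # GNU du can also report the apparent size for comparison
    du -h --apparent-size /var/log/lastlog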

Alexios
  • 123
  • 6
1

If the process writing the file runs as root, it can write into the reserved block space. The file system keeps this reserve (typically 5% by default) so the system stays operational when an unprivileged task fills the disk, and the reserved space is invisible to many tools.

lsof can show you which process has the file open, and therefore which one is writing to it.
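
On ext2/3/4 you can inspect and adjust that reserve with tune2fs; a sketch, with a placeholder device name:

    # Show how many blocks are reserved for root on this filesystem
    tune2fs -l /dev/sda1 | grep -i 'reserved block count'
    # Shrink the reserve from the default 5% to 1% to buy some breathing room
    tune2fs -m 1 /dev/sda1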

Mantriur
  • 369
  • 2
  • 13
  • 1
    You can also adjust this reserve percentage using tune2fs. This can be a quick way to allow the server to continue running while you free up disk space. – sjbotha Mar 05 '12 at 15:18
1

Besides the file being held open by a process, a second case is a file system that supports snapshots, such as btrfs or ZFS.

For example, suppose you took a snapshot while the huge log file existed. If you delete the file now, you free only the delta written since the snapshot (and even that is released only once the file is no longer in use); the blocks the snapshot references stay allocated until the snapshot itself is removed.
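
A sketch of how to check for and release that space, assuming ZFS or btrfs (the dataset and subvolume names are placeholders):

    # ZFS: see how much space each snapshot is pinning
    zfs list -t snapshot -o name,used
    # Destroying the snapshot releases the old log file's blocks
    zfs destroy tank/var@before-cleanup
    # btrfs: list and remove the snapshot subvolume that still references the file
    btrfs subvolume list /var
    btrfs subvolume delete /var/.snapshots/2012-03-01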

A third case is a file system that supports block-level de-duplication where most of the file's blocks are identical to another file's. I would not expect this for a log unless, say, a container or VM sends its logs to a syslog container or VM sharing the same file system, so that the log contents are identical.

Mircea Vutcovici
  • 16,706
  • 4
  • 52
  • 80