You are correct in your assumption: while all directory entries are deleted immediately when unlink() is called, the blocks that physically make up the file are only freed on disk once nothing is using the inode anymore. (I say "directory entries" because in vfat a single file can have several of them: each long file name is stored in extra directory entries in addition to the 8.3 short-name entry.)
In this context, by inode I mean the in-memory structure the Linux kernel uses to handle files. It is used even when the filesystem is not "inode based". In the case of vfat, the inode is built from the file's directory entry and is backed by the file's chain of clusters on disk.
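The same behaviour can be observed from user space, independent of the filesystem. Below is a minimal sketch, assuming a POSIX system with a writable /tmp (the path template and the function name unlinked_file_survives are illustrative, not taken from the question): it writes a file, unlinks it while a descriptor is still open, and then checks that the name is gone, that fstat() reports a link count of 0, and that the data can still be read.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a file that was unlinked while still open keeps its data
 * and reports a link count of 0, otherwise 0.
 */
int unlinked_file_survives(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";   /* assumed writable directory */
    const char msg[] = "still here";
    char buf[sizeof msg] = {0};
    struct stat st;
    int ok = 0;

    int fd = mkstemp(path);                    /* create and open a unique file */
    if (fd < 0)
        return 0;

    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg)
        goto out;
    if (unlink(path) != 0)                     /* remove the only directory entry */
        goto out;
    if (access(path, F_OK) == 0)               /* the name must be gone */
        goto out;
    if (fstat(fd, &st) != 0 || st.st_nlink != 0)   /* the link count must be 0 */
        goto out;
    if (pread(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf)  /* data still readable */
        goto out;
    ok = memcmp(buf, msg, sizeof msg) == 0;
out:
    close(fd);   /* last reference dropped: the kernel can now free the blocks */
    return ok;
}
```

On a vfat mount, that final close() is roughly the point at which the inode gets evicted and its clusters are freed.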
Taking a look at the Linux kernel source code, we see that vfat_unlink, which implements the unlink() system call for vfat, does roughly the following (extremely simplified for illustration):
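static int vfat_unlink(struct inode *dir, struct dentry *dentry)
{
    struct fat_slot_info sinfo;
    int err = vfat_find(dir, &dentry->d_name, &sinfo);  /* locate the entries */
    if (!err)
        err = fat_remove_entries(dir, &sinfo);  /* drop them from the directory */
    if (!err)
        clear_nlink(d_inode(dentry));           /* inode's link count -> 0 */
    return err;
}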
So what happens is:
fat_remove_entries removes the file's entries from its directory.
clear_nlink sets the inode's link count to 0, which means that no file (i.e. no directory entry) points to this inode anymore.
Note that at this point neither the inode nor its on-disk representation has been touched in any way (apart from the cleared link count), so it still happily exists in memory and on disk, as if nothing had happened!
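To make the split between directory entry and inode concrete, here is a toy user-space model of the two steps above. It is not kernel code: struct toy_inode, struct toy_dirent and toy_unlink are hypothetical names chosen for illustration. Unlinking frees the directory slot and clears the link count, but leaves the cluster chain alone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy structures, not the kernel's. */
struct toy_inode {
    unsigned nlink;          /* directory entries pointing at this inode */
    unsigned first_cluster;  /* start of the file's cluster chain */
};

struct toy_dirent {
    bool in_use;             /* a FAT directory slot is either used or free */
    struct toy_inode *inode; /* the file this entry names */
};

/* Models the two steps of vfat_unlink() described above. */
void toy_unlink(struct toy_dirent *de)
{
    struct toy_inode *inode = de->inode;

    de->in_use = false;      /* like fat_remove_entries(): the name is gone */
    de->inode = NULL;
    inode->nlink = 0;        /* like clear_nlink(): no decrement, FAT has no hard links */
    /* inode->first_cluster is deliberately untouched: the data stays allocated */
}
```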
(By the way, it's also interesting to note that vfat_unlink always sets the link count to 0 instead of just decrementing it using drop_nlink. This should tip you off that FAT filesystems do not support hard links! It is also a further indication that FAT itself has no notion of a separate inode.)
Now let's look at what happens when the inode is evicted. evict_inode is called when the kernel no longer wants to keep the inode in memory. At the earliest, this can only happen once no process holds an open file descriptor (or any other reference) to that inode anymore, though in theory it may also happen later. The FAT implementation of evict_inode looks (again, simplified) like this:
static void fat_evict_inode(struct inode *inode)
{
    truncate_inode_pages(&inode->i_data, 0);   /* drop cached pages */
    if (!inode->i_nlink) {                     /* no directory entry left */
        inode->i_size = 0;
        fat_truncate_blocks(inode, 0);         /* free the cluster chain */
    }
    invalidate_inode_buffers(inode);
    clear_inode(inode);
}
The magic happens in the if clause: a link count of 0 means that no directory entry points to the inode anymore. So its size is set to 0 and the file is truncated to 0 bytes, which actually deletes it from disk by freeing the clusters it was made of.
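The if clause can be mirrored in the same toy style. In this sketch the names are hypothetical, and the FAT is modelled as a plain array where fat[c] holds the next cluster, 0 means free and 0xFFFF marks the end of a chain (roughly the FAT16 encoding, simplified). Evicting an inode with a link count of 0 walks its chain and frees every cluster, while an inode that still has a directory entry keeps its clusters.

```c
#define TOY_FAT_FREE 0u        /* hypothetical free-cluster marker */
#define TOY_FAT_EOC  0xFFFFu   /* hypothetical end-of-chain marker */

/* Hypothetical toy inode, not the kernel's struct inode. */
struct toy_inode {
    unsigned nlink;          /* directory entries still pointing here */
    unsigned size;           /* file size in bytes */
    unsigned first_cluster;  /* 0 means "no clusters allocated" */
};

/* Walk the chain starting at `start` (assumed well formed) and free it. */
static void toy_free_chain(unsigned *fat, unsigned start)
{
    while (start != TOY_FAT_FREE && start != TOY_FAT_EOC) {
        unsigned next = fat[start];
        fat[start] = TOY_FAT_FREE;
        start = next;
    }
}

/* Toy counterpart of the if clause in fat_evict_inode(). */
void toy_evict_inode(struct toy_inode *inode, unsigned *fat)
{
    if (inode->nlink == 0) {                        /* no directory entry left */
        inode->size = 0;                            /* like inode->i_size = 0 */
        toy_free_chain(fat, inode->first_cluster);  /* like fat_truncate_blocks(inode, 0) */
        inode->first_cluster = 0;
    }
    /* with nlink > 0 the file is still reachable, so its clusters stay allocated */
}
```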
So the corruption you are seeing in your experiments is easily explained: just as you suspected, the directory entry has already been removed (by vfat_unlink), but because the inode had not been evicted yet, its blocks were still untouched and still marked as used in the FAT (File Allocation Table). fsck.vfat, however, detects that no directory entry points to those blocks anymore, complains, and repairs the inconsistency.
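What fsck.vfat complains about boils down to a reachability check over the FAT. The sketch below is a simplified stand-in, not fsck.vfat's actual code: it uses a hypothetical array model (fat[c] is the next cluster, 0 is free, 0xFFFF ends a chain, data clusters start at 2), marks every cluster reachable from a directory entry's start cluster, and reports allocated clusters that nothing reaches. Those correspond to the blocks a deleted-but-open file leaves behind after a crash.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_FAT_FREE 0u            /* hypothetical free-cluster marker */
#define TOY_FAT_EOC  0xFFFFu       /* hypothetical end-of-chain marker */
#define TOY_FIRST_DATA_CLUSTER 2u  /* FAT data clusters are numbered from 2 */

/*
 * fat[c] holds the next cluster of c's chain, TOY_FAT_EOC, or TOY_FAT_FREE.
 * starts[] are the first clusters recorded in directory entries.
 * seen[] is caller-provided scratch space; lost[] receives the numbers of
 * allocated but unreachable clusters. Both need room for n entries.
 * Returns how many lost clusters were found.
 */
size_t toy_find_lost_clusters(const unsigned *fat, size_t n,
                              const unsigned *starts, size_t nstarts,
                              bool *seen, unsigned *lost)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        seen[i] = false;

    /* Pass 1: follow every chain that a directory entry points to. */
    for (size_t s = 0; s < nstarts; s++) {
        unsigned c = starts[s];
        while (c >= TOY_FIRST_DATA_CLUSTER && c != TOY_FAT_EOC && c < n && !seen[c]) {
            seen[c] = true;
            c = fat[c];
        }
    }

    /* Pass 2: allocated in the FAT but never reached means "lost". */
    for (size_t i = TOY_FIRST_DATA_CLUSTER; i < n; i++)
        if (fat[i] != TOY_FAT_FREE && !seen[i])
            lost[count++] = (unsigned)i;

    return count;
}
```

Passing the scratch array in from the caller keeps the sketch free of heap allocation; the seen[] check also stops the walk if a corrupted chain loops back on itself.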
By the way, CHKDSK would not just free those blocks by marking them as unused; it would create new files in the root directory pointing to the first block of each chain, with names like FILE0001.CHK.
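That CHKDSK behaviour can be approximated on top of a list of lost clusters: a cluster is the head of a lost chain if no other lost cluster links to it, and each head gets the next FILEnnnn.CHK name. This is a toy illustration with hypothetical names (toy_name_lost_chains and the array model are assumptions), not how CHKDSK is actually implemented; it ignores cycles and never writes any directory entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_FIRST_DATA_CLUSTER 2u  /* FAT data clusters are numbered from 2 */
#define TOY_NAME_LEN 13            /* "FILE0001.CHK" plus the terminating NUL */

/*
 * Toy model: fat[c] is the next cluster of c's chain; lost[c] is true for
 * clusters that are allocated but not reachable from any directory entry.
 * For every lost chain head, stores the head in heads[k] and a CHKDSK-style
 * name in names[k]. Returns the number of chains found.
 */
size_t toy_name_lost_chains(const unsigned *fat, const bool *lost, size_t n,
                            unsigned *heads, char names[][TOY_NAME_LEN])
{
    size_t count = 0;

    for (size_t i = TOY_FIRST_DATA_CLUSTER; i < n; i++) {
        if (!lost[i])
            continue;

        /* i is a head if no other lost cluster points to it. */
        bool is_head = true;
        for (size_t j = TOY_FIRST_DATA_CLUSTER; j < n; j++) {
            if (lost[j] && j != i && fat[j] == i) {
                is_head = false;
                break;
            }
        }

        if (is_head) {
            heads[count] = (unsigned)i;
            snprintf(names[count], TOY_NAME_LEN, "FILE%04zu.CHK", count + 1);
            count++;
        }
    }
    return count;
}
```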
What makes you feel deleting the file and having the processes simply use its cached content in memory would corrupt the file system? – jlliagre – 2012-09-14T07:27:05.453
@jlliagre: I believe lxgr meant to say, "Does Linux just delete (i.e., clear) the directory entry (on the disk), but leave all the file's data blocks on the disk, allocated (i.e., not releasing them to the free list, or whatever FAT's equivalent of the free list is), retaining in memory sufficient control information to allow the process(es) that have the file open to continue to access it, but flagging it (in memory) as a deleted file, so the blocks will be released when the last process closes the file?" Because that would leave allocated blocks with no way to find them after a hard crash. – Scott – 2012-09-14T17:29:07.357
@Scott, yes, that is exactly what I meant. It seems that Linux actually does that, because when I uncleanly unmount a FAT file system with open but deleted files on it, there is always some corruption detected by fsck.vfat, and it is always about blocks not marked as free but also not part of any file - which would suggest that the directory entry is deleted, but the corresponding blocks in the FAT are not set to show up as free space until the last handle is closed. – lxgr – 2012-09-15T05:57:24.477