How can a file size be zero?

173

44

Just something I ran into and couldn't think of a proper explanation. If I create an empty *.txt file on my PC and then look at its size, it shows 0. But how is that possible? I mean even if the file itself is empty, it still must have some size, just to store its own name. How can this be explained? (Non OS specific)

Eugene S

Posted 2015-09-15T08:32:53.780

Reputation: 2 088

82 The file name does not count as part of the file; that's how it can be explained. – njzk2 – 2015-09-15T19:23:51.707

124 I'm reminded of a friend in college who wrote a piece of software to store text as filenames to get around the disk quota. – slebetman – 2015-09-17T02:38:27.400

3 @slebetman But then what happened when the MFT (or FAT or whatever it was) grew too big? Or was it a directory size quota? – Cole Johnson – 2015-09-17T20:27:25.107

16

@ColeJohnson I was an intern back in the 2000s in one of my university's computer labs, and the user quota was calculated as the sum of file sizes. So storing data as file names would indeed get around the quota. Heck, you could save a program in folders and it would not count against your quota.

– Mindwin – 2015-09-17T21:22:31.053

8 Also, you might find it interesting that, in some cases, the file can even contain actual data and still be reported as having zero size. See file systems with forks (for example, NTFS ADS). – T. C. – 2015-09-17T22:19:26.180

2 @ColeJohnson: The disk would be full, so the sysadmin would notice, I guess. But he never got caught. We had a 10MB quota at the time on, I'm guessing, a 500MB shared disk. He had lots of pictures and graphs he needed for his dissertation, so that used up a lot of space. So he cheated on the text storage. – slebetman – 2015-09-18T02:07:11.127

@ColeJohnson: Back in 1999 a laptop was still quite expensive so around 80% of students used the lab for work. I guess these days personal computers are more common – slebetman – 2015-09-18T02:09:40.597

1 @slebetman Out of curiosity, do you remember how he encoded the file names? I'd assume base-64 or something. – Cole Johnson – 2015-09-18T02:39:42.500

3 @ColeJohnson I think it was just plain text with some encoding for special characters. If I remember correctly, the first 4 digits were the sequence numbers. – slebetman – 2015-09-18T03:10:52.953

21 @slebetman This is the point where the line between genius and insanity becomes blurred. – Pharap – 2015-09-18T16:47:22.277

10

A similar technique was famously used in a compression challenge,

– Oddthinking – 2015-09-20T13:12:27.217

2

possible duplicate of How are the file metadata stored in Windows?

– Ching Chong – 2015-09-24T09:13:09.420

@slebetman, So.. did it work? Does it scale? – Pacerier – 2016-08-19T11:26:51.713

@Pacerier: It worked for him. He stored around 50MB of data I think (not exactly sure of the details). That was way more than the 10MB limit. – slebetman – 2016-08-19T14:37:32.763

Answers

202

It's possible because there really is no file. There's just a directory entry with a name and owner. The directory entry is logically distinct from the file. For example, the same file can have more than one name in more than one directory.

Unfortunately, the term "file" isn't always used to mean precisely the same thing. But the file size logic comes from the model where a directory entry "attaches" a file to a directory and file names and related metadata are stored in the directory.
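The model described above can be checked with a short script. This is a minimal sketch, assuming a POSIX-like system and a filesystem that supports hard links through `os.link`; the directory and file names are made up for illustration. It creates an empty file, gives it a second name in another directory, and reports what `os.stat` sees.

```python
import os
import tempfile


def demo_empty_file_and_links():
    """Create an empty file, add a second name for it, and report what stat sees."""
    with tempfile.TemporaryDirectory() as root:
        dir_a = os.path.join(root, "a")
        dir_b = os.path.join(root, "b")
        os.mkdir(dir_a)
        os.mkdir(dir_b)

        # Hypothetical names, chosen only for this example.
        first_name = os.path.join(dir_a, "empty.txt")
        second_name = os.path.join(dir_b, "alias.txt")

        # Creating the file adds a directory entry; no content is written.
        open(first_name, "wb").close()

        # A hard link adds a second directory entry for the same file.
        os.link(first_name, second_name)

        st_first = os.stat(first_name)
        st_second = os.stat(second_name)
        return {
            "size": st_first.st_size,
            "same_file": os.path.samestat(st_first, st_second),
            "link_count": st_first.st_nlink,
            "names_in_a": os.listdir(dir_a),
            "names_in_b": os.listdir(dir_b),
        }


result = demo_empty_file_and_links()
print(result)
```

The names show up only in the directory listings, while the size belongs to the single file that both names refer to.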

David Schwartz

Posted 2015-09-15T08:32:53.780

Reputation: 58 310

30 ...also known as Hard Links. – Daniel B – 2015-09-15T08:35:36.267

Thanks for the answer. So where does the information about a file name reside? – Eugene S – 2015-09-15T08:38:10.150

6 In the directory. Otherwise, if the same file was in two directories and you renamed it in one, that would modify the other directory, which would make no sense at all. Also, were it not this way, what would the contents of a directory be?! – David Schwartz – 2015-09-15T08:38:44.760

1 Thanks again. That's what I thought too. But then if I create a directory with an empty file like this and check the size of that directory, it will still be 0. Shouldn't the size of that directory be at least the size of this file's name? – Eugene S – 2015-09-15T08:43:04.537

1 @EugeneS Windows doesn't provide any easy way to get the size of a directory. What method are you using? – David Schwartz – 2015-09-15T08:45:19.710

Well I tried the usual right click -> Properties and I also tried using a Git Bash shell that I use on my PC. – Eugene S – 2015-09-15T08:47:50.427

1 @EugeneS I don't think any of those ways work. They probably all wind up calling getFileSize, which returns zero for directories. This is an irritating difference between Windows and many other OSes. – David Schwartz – 2015-09-15T08:49:50.300

I understand. Thanks. Just out of curiosity, on what OSes does that work differently? On these OSes, will I be able to see the directory size increase as I add new empty files into it? – Eugene S – 2015-09-15T09:05:03.767

14 On most UNIX-like OSes, like FreeBSD and Linux, you can easily get the size of a directory. Commands like ls -ld <directory> will work. – David Schwartz – 2015-09-15T09:10:00.410

11 I don't know if this is true for the current version of NTFS, but early versions (e.g. on NT3.x) would store the data for very small files in the directory entry. The file would literally not exist. – John Rennie – 2015-09-16T05:53:52.803

@DavidSchwartz confirming that it also works on OS X, which is probably the most common UNIX-like OS (just to dispel any misconceptions that this is some wacky quirk of niche operating systems) – David Z – 2015-09-16T08:01:21.520

3 There is no spoon. – Cássio Renan – 2015-09-16T12:41:31.567

13 It's not quite true that there's no file, unless NTFS is very different from other filesystems. On a normal Unix filesystem, there'd be an inode storing the permissions, mod-times, and so on. The directory entry still refers to this inode. The only difference between an empty file and a non-empty file is the pointer to allocated blocks: an empty file has the filesystem equivalent of a NULL pointer for its block map, to indicate that it doesn't have any data blocks. Directory entries aren't cluttered up with permissions and mod times, even for empty files. E.g., XFS inodes are 256B. – Peter Cordes – 2015-09-16T22:59:54.867

1 The size of a directory, as reported by Windows Explorer, is simply the sum of its files' sizes. – ths – 2015-09-17T17:39:36.577

1 @EugeneS Again, a 0-size file is a 0-size file, so even the oft-inept Windows Explorer is correct here when it shows you 0 bytes in the Properties. The directory size will not "increase as [you] add new empty files into it". The size of a directory, as understood by humans, is an abstract concept that is usually evaluated to be the sum of the sizes of the files contained therein. If they are all 0-size, then so is the directory. – underscore_d – 2015-09-18T16:55:03.417

2 This answer is wrong: there is definitely a file (it has all properties of a file, such as an owner and access permissions), there is just no content in the file. – reinierpost – 2015-09-21T10:14:22.203

2

@John Rennie: Same thing is currently implemented in the ext4 FS: https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Inline_Data

– Piskvor left the building – 2015-09-23T15:50:40.640

What did you mean by "the same file can have multiple names in different directories"? Isn't a file just a space on a disk to keep data? Is it possible for different filenames in a filesystem to point to the same space on disk without causing an overlap error? – Boris G – 2015-09-30T20:22:17.663

2 @BorisG Yes, exactly. Many filenames can refer to the same file. When the file's reference count reaches zero, the disk space is freed. – David Schwartz – 2015-09-30T20:24:11.500

Oh, you meant references to the same file. Thanks. – Boris G – 2015-09-30T20:30:37.453

@BorisG The directory entries are references to the file. – David Schwartz – 2015-09-30T21:41:03.237

I do not understand that, then. Say I copied a text file into a dir (the filesystem allocates space on the HDD for it and puts the data there). How do you add a reference to that file from another dir? If I copy it there, another space will be allocated for it, right? – Boris G – 2015-10-02T15:07:04.183

1

@BorisG Use the CreateHardLink function.

– David Schwartz – 2016-01-25T21:15:54.250

@JohnRennie It's (still) true that on NTFS, small files will be put in the master file table instead of allocating a cluster. The term they use is "resident". – doug65536 – 2016-10-15T11:15:13.373

@DavidZ Linux is probably the most common Unix-like. There are gazillions of Linux servers. BTW MacOS is not Unix-like because it's already Unix – phuclv – 2017-06-09T05:29:46.590

82

The semantic meaning of "file size" is different from the one you are using.

There are many file sizes which are meaningful. The most common one, and the one you are seeing here, is "the number of bytes in the file." If the file is an empty text file, it may indeed contain 0 bytes. This number is important to programmers because we often need to open a file, "read all the data," and close it. We need to know how many bytes of data will be in the file so we can plan ahead.

Another meaning, the space allocated on disk, arises from the way most file systems store data. Most file systems store data in blocks. For example, the file system may store data in 64kB blocks, meaning it will never allocate anything which is not a whole multiple of 64kB. This sounds inefficient, but it can make bookkeeping quite a lot simpler, and often simpler means faster.
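The block arithmetic can be sketched in a few lines. This is an illustration only: the helper name `allocated_size` is made up, and the 64kB block size is simply the example figure from the paragraph above. Real file systems use various block sizes, and some store very small files without allocating a block at all.

```python
def allocated_size(byte_count: int, block_size: int = 64 * 1024) -> int:
    """Round a byte count up to a whole number of blocks (hypothetical helper)."""
    if byte_count < 0 or block_size <= 0:
        raise ValueError("byte_count must be >= 0 and block_size must be > 0")
    blocks = -(-byte_count // block_size)  # ceiling division
    return blocks * block_size


for n in (0, 5, 65536, 65537):
    print(n, "bytes of content ->", allocated_size(n), "bytes allocated")
```

Under this model an empty file needs zero blocks, a 5-byte file needs one full block, and a file one byte larger than a block needs two.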

A third meaning, which you are tugging at, would be the actual number of bits required on the hard drive to describe the presence of a file. This includes information that is usually stored separately from the file. For instance, in Linux, the "filename" is stored in the inode for the directory containing the file (edit: from comments, technically this is stored in the directory's data. When I wrote this, I was thinking of the small-directory case. Data smaller than 156 bytes can be stored directly in the inode). This is not a commonly used meaning, because it is terribly hard to determine without knowing the deep inner workings of your file system (did you account for the space needed to store all the permissions on the file?). However, if you have a 1,000,000-byte hard drive and want to know how big a file fits on it, this will be a very important meaning to you!

Cort Ammon

Posted 2015-09-15T08:32:53.780

Reputation: 2 316

2"in the inode for the directory containing the file" Don't you mean the directory's data, rather than its inode? The inode contains file sizes and dates, but no names... – Medinoc – 2015-09-16T13:49:37.263

@Medinoc Good point. I was thinking of the inline case when it stored the data within the inode, but I didn't actually check to see how much this could occur! I've added an edit. – Cort Ammon – 2015-09-16T14:37:36.517

This relates to the inline data feature of ext4, which is by no means universal across filesystems. Additionally, it applies to the file's inode, not the directory's. Directories also have an inline data capability, but that is a separate feature. A file's inode has a fixed size, at least in the case of ext4, so the space used by permissions is irrelevant. A file's disk usage depends heavily on the filesystem in use; as far as I can tell, the third part of this answer only applies to ext4, and this is not made clear.

– Phizes – 2015-09-17T08:02:53.947

8 If you have a 1,000,000 byte hard drive it might be time to start thinking about an upgrade. – nekomatic – 2015-09-17T09:09:39.327

53

The file name is stored somewhere else.

Your disk will have a "file system" on it: put simply, a method for choosing how file names and files are represented and interpreted on the physical disk.

On most Windows disks you will be using a file system called NTFS ("New Technology File System"). It stores filename information in the Master File Table (MFT), separate from the file contents. See the Wikipedia article on Master File Table.

The file itself will therefore be of length 0 bytes, but its entry in the MFT will still occupy some space.

Matthew1471

Posted 2015-09-15T08:32:53.780

Reputation: 1 112

12 And in the case of NTFS, the size of a file reported by Windows and most tools is actually the size of the main stream of the file, which we perceive as the content of the file. A file stored on an NTFS partition can additionally have some data stored in alternate data streams and still have a reported size of 0. It's a nice filesystem feature to know about if you want to have the full picture :) – Paweł Bulwan – 2015-09-16T10:32:18.033

12

This is quite an interesting ontological question...

The file itself is the content of the file. If the file has no content, it has a size of zero. The file name is as much a part of the file as your own name is physically a part of you (ie, it isn't).

Just as your name exists as an idea in people's heads (and your own) that refers/points to the physical you, the file name exists in the file system's directory tree and it refers/points to the file.

Luke

Posted 2015-09-15T08:32:53.780

Reputation: 281

7

(A little late to the answer ...)

How a file can have size zero is a little more complicated than the answers above suggest. The question is tagged Win7, but looking at other, "simpler" file systems such as FAT or NTFS may be useful, as the concepts are similar.

The disk does not "know" what is a file and what is a directory; it's all data in little blocks. The OS distinguishes between the meanings of data blocks. The first few are special, but the rest of the blocks hold either information about the data (e.g. file name, file length, first data block holding the data) or the data itself.

A directory is a special "file" whose "data" the OS understands is an information block containing information about files, not the content of the files. A good analogy is a physical library and the card catalog. Think of the information blocks as the card catalog and the shelves as the data blocks (card catalog also sits on a shelf-like structure).

When you "create" a file (say, with the UNIX touch command), the OS first creates an entry in an information block (directory), with the following:

  • Name = My_File.txt
  • Length = 0
  • Starting Data Block = N/A
  • Additional info (owner, permissions, created/updated/modified date), etc

Only if there is some data to "write" does it attempt to find an empty data block to store the data. But the data blocks come in fixed sizes (say 32K) convenient for the disk to get to and the OS to read. If you only write "Hello", most of the block is "empty" (it may actually hold not zeros but garbage from what was there before), so the table entry is also updated with the length (say 5 chars + End of File) so that you don't get the bad stuff.

When you update the "file" to a length > block size, the OS writes the extra data to a new block, records that the file continues onto that next block AFTER the first (and so on), and updates the length to the new length (details differ).

What you end up with is a collection of information data blocks (directories or lists) with information about the chains of data blocks (file contents).
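The bookkeeping described above can be sketched as a toy in-memory model. Everything here is hypothetical and simplified for illustration: the `ToyVolume` and `DirEntry` names, the 8-byte block size, and the chain table do not correspond to any real file system's on-disk layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

BLOCK_SIZE = 8  # deliberately tiny so block chains are easy to see


@dataclass
class DirEntry:
    name: str
    length: int = 0
    first_block: Optional[int] = None  # None stands in for "N/A"


class ToyVolume:
    """Hypothetical in-memory volume: a directory plus chained data blocks."""

    def __init__(self, block_count: int) -> None:
        self.blocks: List[bytes] = [b""] * block_count
        self.next_block: Dict[int, Optional[int]] = {}  # block -> following block
        self.free: List[int] = list(range(block_count))
        self.directory: Dict[str, DirEntry] = {}

    def create(self, name: str) -> DirEntry:
        # Creating a file only adds a directory entry; no block is allocated.
        entry = DirEntry(name)
        self.directory[name] = entry
        return entry

    def write(self, name: str, data: bytes) -> None:
        entry = self.directory[name]
        if entry.first_block is not None:
            raise ValueError("this sketch only writes to empty files")
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        if needed > len(self.free):
            raise OSError("toy volume is full")
        previous: Optional[int] = None
        for offset in range(0, len(data), BLOCK_SIZE):
            block = self.free.pop(0)
            self.blocks[block] = data[offset:offset + BLOCK_SIZE]
            self.next_block[block] = None
            if previous is None:
                entry.first_block = block          # start of the chain
            else:
                self.next_block[previous] = block  # extend the chain
            previous = block
        entry.length = len(data)

    def read(self, name: str) -> bytes:
        entry = self.directory[name]
        chunks = []
        block = entry.first_block
        while block is not None:
            chunks.append(self.blocks[block])
            block = self.next_block[block]
        # The recorded length decides how much of the chain counts as content.
        return b"".join(chunks)[:entry.length]


volume = ToyVolume(block_count=4)
empty = volume.create("My_File.txt")
free_after_create = len(volume.free)  # creating the entry used no blocks

volume.create("hello.txt")
volume.write("hello.txt", b"Hello, world")  # 12 bytes need two 8-byte blocks
print(empty, free_after_create, len(volume.free), volume.read("hello.txt"))
```

In this sketch the empty file exists purely as a directory entry with length 0 and no starting block, which is why its size is zero while the directory still knows its name.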

Logically, this also explains why a file move on the same filesystem is blinking fast while a copy takes a long time. The OS only has to edit 2 directory blocks to remove the entry from one directory (information data block) and add to another. Delete a file: just remove the entry in the directory block, freeing up the file data blocks to be reallocated.
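The claim about moves can be observed from user space. This is a minimal sketch, assuming a POSIX-like system where `os.stat` exposes inode numbers and where both directories live on the same filesystem; the directory and file names are invented for the example.

```python
import os
import tempfile


def move_within_filesystem(payload: bytes = b"x" * 1024):
    """Move a file between two directories and compare its identity before and after."""
    with tempfile.TemporaryDirectory() as root:
        src_dir = os.path.join(root, "src")
        dst_dir = os.path.join(root, "dst")
        os.mkdir(src_dir)
        os.mkdir(dst_dir)

        src = os.path.join(src_dir, "data.bin")  # hypothetical file name
        dst = os.path.join(dst_dir, "data.bin")
        with open(src, "wb") as handle:
            handle.write(payload)

        before = os.stat(src)
        # On the same filesystem a rename only changes directory entries;
        # the file's data is not copied.
        os.rename(src, dst)
        after = os.stat(dst)

        return {
            "same_inode": (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino),
            "size": after.st_size,
            "source_gone": not os.path.exists(src),
        }


moved = move_within_filesystem()
print(moved)
```

The file keeps its identity and content; only the directories that name it have changed. A move across filesystems cannot work this way and falls back to copying the data.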

PS: Just because the card catalog has an entry for a book does not mean the book is on the shelf (it may be checked out or lost); that is file size 0.

PPS: A book misplaced inside the library means searching the library or, in computer terms, running chkdsk or a disk repair!

A greater understanding can be gleaned by reading about UNIX inodes or appreciating how version control systems (ClearCase, TFS, Git, etc.) manage not only files and directories, but also versions of files and even versions of directories. In most cases, everything is stored in a database and presented to the user to appear as classical directory structure and files!

Ian W

Posted 2015-09-15T08:32:53.780

Reputation: 235

4

Filesystems store a lot of information about a file, such as file name, file size, creation time, access time, modified time, created user, user and group permissions, fragments, pointers to the clusters that store the file, hard/soft links, attributes, and so on. This is called file metadata. Why would you count that metadata in the file size when users do not (need to) care about it or even know about it? They only really care about the file content.

Moreover, each filesystem stores different types of metadata, which take different amounts of space on disk. For example, POSIX permissions are very different from NTFS permissions, and there are also inode numbers in POSIX which do not exist on Windows. Even POSIX filesystems vary a lot, like ext3 with 32-bit block addresses, ext4 with 48-bit, Btrfs with 64-bit and ZFS with 128-bit addresses. So how would you count that metadata in the file size?

Take another example: a 100-byte file whose metadata consumes 56 bytes on the current filesystem. We copy the file to another filesystem, and now it takes 128 bytes of metadata. However, the file contents are exactly the same, and the number of bytes in the file is also the same. So displaying the file size as 156 bytes on one system but 228 bytes on another would be very confusing and counter-intuitive.

phuclv

Posted 2015-09-15T08:32:53.780

Reputation: 14 930

4

We have some excellent answers here. I'd just like to add the picture version (a thousand words and all that).

This is what one of my NTFS-formatted hard drives looks like if you visualize it with a disk defragmenting tool. The MFT (Master File Table) is shown in violet:

[Image: disk map from a defragmenting tool, with the MFT shown in violet]

That little violet square describes the list of files present on my HD. In rough terms it is, for an NTFS disk, what the Table of Contents is for a book; instead of pages, it points to the files' physical locations on the rest of the disk (see note 1).

A file with a zero-bytes size can be visualized as a Table of Contents entry that points to no page at all:

[Image: a Table of Contents entry that points to no page]

The entry is there, listed - but since no page is indicated, we can assume that the content is non-existent.

1 - Of course, it's a little more complicated than that, but points like sector maps, mirror MFTs, etc. are outside this question's scope.

OnoSendai

Posted 2015-09-15T08:32:53.780

Reputation: 140

1

A file size of 0 is like a sheet of paper with no words on it. One sheet might have 5 words on it; another might have 0. So 0 is entirely possible.

The file's metadata (creation date and time, last-modified date and time, file owner, permissions) is all stored elsewhere and is not included in the file size.

nonopolarity

Posted 2015-09-15T08:32:53.780

Reputation: 7 932

0

Put simply: when you create a file, a directory entry is generated that works as a pointer to the file's storage location, identified by the file name you provide. The size of the directory increases as you create more pointers (that is, files), while the file size increases only when you put some data at the pointed-to place, i.e. inside the file itself. Until then the size is zero. :)

Vikash Mishra

Posted 2015-09-15T08:32:53.780

Reputation: 9

1 This is really a comment, not an answer, and just repeats what others have said. – JakeGould – 2015-09-23T19:15:08.637

0

So this is how it works:

As soon as you create a file on a volume, a file record is created in the NTFS metafile $MFT (the Master File Table). Because a file record segment (FRS) is present in the MFT, you will see a record. Each file record is 1 KB by default on NTFS. But that space is only claimed if you store some information inside the file. Even if you write just a single letter "a" to a text file, it will claim 1 KB of space, because that is the default size of the FRS. The letter "a" goes to the default, unnamed data stream of that FRS, $Data, which is the attribute where all your data goes if you don't use an ADS (Alternate Data Stream).

Let me know if you come up with any questions.

Sdf

Posted 2015-09-15T08:32:53.780

Reputation: 1