Why is unzipped directory much smaller (4.0 K) than zipped (73.0 G)?

I unzipped a zipped file using zip -l <filename>, but what I get is a directory much smaller than the archive was before unzipping. The unzipped directory has all the files, mostly videos. Why is the unzipped directory exactly 4.0K? Am I missing something?

Output of ls -alh:

drwxrwsr-x  4 shubhankar gen011    4.0K May 19 15:47 Moments_in_Time_256x256_30fps
-rw-rw-r--  1 shubhankar gen011     73G Mar  1  2018 Moments_in_Time_256x256_30fps.zip

bluedroid

Instead of ls -lah, try using du -h on the directory – hojusaram – 2019-05-20T02:50:28.727

Maybe it would be a good idea to change the question title to something like "Why is my unzipped file only 4KB?" – therefromhere – 2019-05-20T05:53:18.293

@therefromhere No, that would be completely changing the question, and it would be asking about a situation that is not occurring. – Scott – 2019-05-22T01:04:27.007

This question is such a duplicate. I wonder why it is so highly voted – Pedro Lobito – 2019-05-22T23:26:50.740

Answers

The size of a directory as shown in your ls output isn't the sum of the sizes of its contents; it is the size of the meta-data associated with the directory - file names, etc.

https://unix.stackexchange.com/questions/55/what-does-size-of-a-directory-mean-in-output-of-ls-l-command

To find out how much space the directory contents are using, you can use

du -sh /path/to/directory
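
For example (a minimal sketch, assuming the directory from the listing above), the two commands answer different questions:

# size of the directory entry itself - just its metadata block(s)
ls -ldh Moments_in_Time_256x256_30fps

# total disk space used by everything stored inside the directory
du -sh Moments_in_Time_256x256_30fps

The first will keep reporting 4.0K (or a small multiple of it), while the second should report a figure comparable to the 73G archive, assuming the extraction completed.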

ivanivan

And the answer to just why this design decision was made is left to the reader (after running both commands ;-) ). – Peter - Reinstate Monica – 2019-05-20T11:44:58.397

To be fair, the filesystem could cache the total size of each directory in the metadata – poizan42 – 2019-05-20T14:01:08.100

@poizan42, no, because files could be hardlinked, so you cannot just sum up sizes when walking up the hierarchy. – Simon Richter – 2019-05-20T14:11:38.963

Yeah, well, but that problem exists whether you are summing the file sizes up recursively or caching them per-directory... – poizan42 – 2019-05-20T14:13:43.150

@poizan42 that would be quite inefficient, requiring the filesystem to update all the parent directories at every change (including the root directory, whose size would change constantly). – Erwan – 2019-05-20T15:37:55.623

@poizan42: You can track the inode numbers you have already seen to avoid double counting, but caching that information would carry substantial overhead. – Kevin – 2019-05-20T16:16:17.077
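
A rough illustration of that inode-tracking idea (a sketch only, assuming GNU find and awk on a single filesystem - not how du is actually implemented):

# sum file sizes under a directory, counting each inode only once
find /path/to/directory -type f -printf '%i %s\n' \
  | awk '!seen[$1]++ { total += $2 } END { print total, "bytes" }'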

Not exactly. The size of the directory (not the directory contents) is the size of an inode https://en.wikipedia.org/wiki/Inode and (at least on my system) is ALWAYS a multiple of 4096 bytes. – jamesqf – 2019-05-21T03:44:27.443

@jamesqf The directory's contents are the files contained in it, and they are written to file system blocks as any other file content. So the size of an inode doesn't really count in here. – glglgl – 2019-05-21T10:10:04.270

@poizan42 That solution is even worse than it would appear on first glance (which is already unacceptably slow): Inodes do not store references to the directories that link them, but just a count. Meaning you'd also have to store a whole lot more of metadata with each inode and worry about keeping everything in sync. Quite an awful lot of overhead and complexity for what would be a rarely used feature. – Voo – 2019-05-21T11:53:24.100
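
The link count mentioned here is easy to inspect (a small demonstration, assuming GNU coreutils; the file names are made up):

touch file_a
ln file_a file_b                            # second hard link to the same inode
stat -c 'links=%h inode=%i' file_a file_b   # both lines report links=2 and the same inode number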

Not always! This number depends on the implementation of the filesystem. EXT filesystems show the size taken by the directory metadata, ZFS shows the number of files in a directory. – istepaniuk – 2019-05-21T14:48:17.533

@glglgl: You misunderstand. Say you have a directory called "stuff", and you do an ls -l in its parent directory. You will see the directory "stuff" listed with a size of 4K (or multiples thereof), even if "stuff" has no files in it at all. But if you do "ls -l stuff", you will see a listing of the contents of "stuff", which could be zero or many GBytes. – jamesqf – 2019-05-21T17:26:14.743

@jamesqf I am aware of that. But that has nothing to do with the size of an inode. The difference is: an inode has a maximum length of 1 block, but that is (more or less) coincidence. The file system block is what really counts. – glglgl – 2019-05-21T20:18:53.893

@glglgl: We seem to be talking about two different things. I'm trying to answer the OP's question about why ls -l shows the size of his directory to be just 4K. Which is because it's showing size of the directory structure, which in most common Linux filesystems is one or more 4K blocks. – jamesqf – 2019-05-21T20:24:49.770

@jamesqf Probably we talk about the same thing, but use different wording. Let me try it this way: A directory's contents essentially consist of the names of the files it contains along with the number of the inode each one represents. It is allocated in multiples of the file system block size. As there are no empty directories (they at least contain . and ..), their size is at least one block, in most cases 4096 bytes. But as the inodes are not stored in the directory, it is confusing to refer to the size of an inode here, even if the number is the same. – glglgl – 2019-05-21T20:37:00.777
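
This behaviour is straightforward to observe (a sketch assuming an ext4 filesystem with 4K blocks; exact numbers vary by filesystem):

mkdir demo
ls -ld demo                  # a brand-new directory already occupies one 4096-byte block
touch demo/file_{1..2000}    # add enough entries that the names no longer fit in one block
ls -ld demo                  # the directory's own size grows to a larger multiple of 4096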

4K is not actually the size of the metadata. 4K is just the minimum file size in Windows. – Strill – 2019-05-21T23:12:45.750

@Erwan See git for a system where every change to every file in a "directory" tree causes the "root directory" to be updated. – Yakk – 2019-05-22T15:20:42.937

@Yakk it's not really comparable: first, a git repo is usually much smaller and doesn't index all the existing files (e.g. temporary files, logs...), but more importantly the git dir structure is updated on a one-off basis when a commit is done, as opposed to a general filesystem which must be updated constantly. – Erwan – 2019-05-22T16:21:21.220

@Strill "4K is just the minimum file size in Windows." 1) This has nothing to do with Windows. 2) It's not so much of a "minimum file size" as it is what the block size is and how many of them the file uses. Disk space tends to be used in reasonably sized chunks instead of using the exact amount of space that the file data needs. This is more efficient. Most systems have a block size of 4k. If you had a file whose data was "4k plus 1" bytes on a 4k-block system, then it would use 8k because it would reserve another reasonably sized chunk of the disk. This makes disk access way more efficient. – Loduwijk – 2019-05-22T21:31:39.297
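
You can see this rounding directly on a typical Linux filesystem with 4K blocks (a sketch; stat's %b counts 512-byte units):

head -c 4097 /dev/urandom > just_over_4k    # a file of 4096 + 1 bytes
stat -c '%s bytes, %b x %B-byte blocks allocated' just_over_4k
# typically prints 4097 bytes, 16 x 512-byte blocks (i.e. 8192 bytes) allocated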

Of course it has to do with Windows. 4K is the minimum block size on all modern Windows operating systems under NTFS. https://support.microsoft.com/en-us/help/140365/default-cluster-size-for-ntfs-fat-and-exfat – Strill – 2019-05-23T00:24:26.697

@Strill: The OP is asking about Linux, so the question has nothing to do with Windows. And while a file may occupy a minimum of a 4K block in commonly used Linux file systems such as ext4, that is not the file size that will be shown by an "ls -l" command. It will show the actual number of bytes in the file. You can even create a file that has a size of zero, using e.g. touch with a filename that doesn't already exist. But a directory is special: I suspect the fixed n*4K size is for efficiency reasons. – jamesqf – 2019-05-23T01:57:25.620