Why is size on disk so large for a single file?

3

1

I've read a couple of posts explaining that when you have many small files, the "size on disk" reported by Windows can be much larger than the reported "size". This makes good sense to me, but from what I can tell, the "cluster size" or "allocation unit" is typically 4 kB, which (if I understand the argument right) means that a single file's "size on disk" should never exceed its "size" by more than 4 kB.

I have a TIFF image that is reportedly 65 kB in "size", but 1.00 MB when measured as "size on disk". What could be the cause of this big discrepancy?

Update: I realize now that the file is on a NAS drive that runs Linux. I checked the allocation unit size and it is just 4 kB, not 1 MB, for example:

bash-3.2# /sbin/blockdev --getbsz /dev/sda1

4096

I also checked a file that has "size" 1 kB and it shows up as 1.00 MB under "size on disk".

Jed

Posted 2014-09-16T08:20:08.653

Reputation: 31

blockdev shows the block size of the block device, not of the file system. On ext2 through ext4, for example, you can see the file system block size with tune2fs -l /dev/sda1 | grep -i size: and check the result there. How did you check "size on disk" on Linux? You can use: ls --block-size=1 -s filename or du --block-size=1 filename. – pabouk – 2014-09-16T09:38:40.903

I have done some tests between WinXP and Ubuntu, and they showed that Windows gets it wrong. In my case it was rounding the size to a multiple of 1024 bytes instead of 4096, but different Windows and Linux network drivers could easily produce what you observed. The Linux command du --block-size 1 ... showed the expected multiple of 4096, so that is the command to believe. If you copy the file into a local Windows directory, you should not see the huge discrepancy. – AFH – 2014-09-16T12:41:32.483
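The check suggested in the comments above (du --block-size=1) can also be sketched in Python. This is only a rough sketch under assumptions: it has to run on the Linux/Unix host itself, where os.stat exposes st_blocks, and it assumes st_blocks is counted in 512-byte units, as it is on Linux. The helper name allocated_bytes is made up for illustration.

```python
import os
import tempfile


def allocated_bytes(path):
    """Bytes the file system reports as allocated for `path`.

    Roughly what `du --block-size=1 path` prints.
    Assumption: st_blocks is in 512-byte units (true on Linux);
    st_blocks is not available on Windows.
    """
    return os.stat(path).st_blocks * 512


# Write a small test file, then compare its apparent size with its allocated size.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 1000)
    path = f.name

print("size:        ", os.path.getsize(path))
print("size on disk:", allocated_bytes(path))
os.remove(path)
```

Comparing the two numbers on the NAS itself shows what the file system actually allocated, independent of what a Windows client reports over the network.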

Answers

-2

Data on disks (files are binary data) is saved in clusters. That is how file systems allocate space on a disk. In this example, the clusters are 4 kB in size. Disks can also be formatted with smaller allocation units.

Roughly speaking, think of clusters as 4 kB slots that are ready to store binary data. If a file is bigger than 4 kB, it takes extra slots. If it is smaller, it takes exactly one slot.

For example, consider a file that is 5 kB. Since it won't fit into a single cluster, a second one is used. Thus its size on disk is 8 kB, even though the file is actually 5 kB in size.
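The rounding described above can be written out as a small calculation. This is a minimal sketch: the function name size_on_disk is hypothetical, and it models only whole-cluster rounding, not file system metadata or any other overhead. The last call uses a 1 MiB unit purely as arithmetic, to show what unit would yield the 1.00 MB figure from the question; it does not claim that this is the cause.

```python
KIB = 1024


def size_on_disk(size_bytes, cluster_bytes=4 * KIB):
    """Round a file size up to a whole number of clusters (simplified model)."""
    if size_bytes < 0 or cluster_bytes <= 0:
        raise ValueError("size must be >= 0 and cluster size must be > 0")
    clusters = -(-size_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes


print(size_on_disk(5 * KIB) // KIB)               # 8  (two 4 kB clusters)
print(size_on_disk(65 * KIB) // KIB)              # 68 (seventeen 4 kB clusters)
print(size_on_disk(65 * KIB, 1024 * KIB) // KIB)  # 1024 (one hypothetical 1 MiB unit)
```

With 4 kB clusters the overhead for a single file stays below 4 kB, which is why a 65 kB file reported as 1.00 MB cannot be explained by this rounding alone.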

Try creating a small file (less than 4 kB) and check that its size on disk is exactly 4 kB.

ikromm

Posted 2014-09-16T08:20:08.653

Reputation: 172

Hmmm... your answer sounds exactly like what I have seen elsewhere and makes me think that the size on disk should just be rounded up to the next multiple of 4 kB (so 65 kB might become 68 kB, for example). This is what I expected. But what I am seeing is quite different: 65 kB is becoming 1 MB. When I first posted this, I thought I was looking at a file on my hard drive, but now I realize the file is actually located on a NAS drive that runs Linux... could this be a hint? – Jed – 2014-09-16T08:48:00.503

Consider also that the file system needs to store extra data: the map of the clusters that hold the file's contents. Unfortunately, I don't know whether that data can be this large. – ikromm – 2014-09-16T08:59:32.080