34

I am having a difficult time grasping the correct way to read the size of files, since each command gives varying results. I also came across a post at http://forums.devshed.com/linux-help-33/du-and-ls-generating-inconsistent-file-sizes-42169.html which states the following:

du gives you the size of the file as it resides on the file system (i.e. it will always give you a result that is divisible by 1024).

ls will give you the actual size of the file.

What you are looking at is the difference between the actual size of the file and the amount of space on disk it takes (also called file system efficiency).

What is the difference between "as it resides on the file system" and "actual size of the file"?

womble
PeanutsMonkey

4 Answers

53

This is called slack space:

Each layer of abstraction on top of individual bits and bytes results in wasted space when a datafile is smaller than the smallest data unit the file system is capable of tracking. This wasted space within a sector, cluster, or block is commonly referred to as slack space, and it cannot normally be used for storage of additional data. For individual 256-byte sectors, the maximum wasted space is 255 bytes. For 64 kilobyte clusters, the maximum wasted space is 65,535 bytes.

So, if your filesystem allocates space in units of 64 KB, and you store a 3 KB file, then:

  • the file's actual size is 3 KB.
  • the file's resident size is 64 KB, as the remaining 61 KB in that unit can't be allocated to another file and is thus lost.
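The arithmetic behind those two numbers can be sketched in a few lines of shell. This is only an illustration of the rounding: `allocated_size` is a made-up helper name, and the 64 KB unit is the example value from above, not something read from a real filesystem.

```shell
#!/bin/sh
# allocated_size SIZE_BYTES UNIT_BYTES
# Hypothetical helper: round a file size up to a whole number of
# allocation units, which is the space the file occupies on disk.
allocated_size() {
    # Integer ceiling division, then multiply back by the unit size.
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# A 3 KB file on a filesystem that allocates in 64 KB units:
allocated_size 3072 65536                          # prints 65536 (64 KB)

# Slack space is the allocated size minus the actual size:
echo $(( $(allocated_size 3072 65536) - 3072 ))    # prints 62464 (61 KB)
```

The same helper gives 1024 for a 1024-byte file and 2048 for a 1025-byte file when the unit is 1024 bytes.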

Note: Some filesystems support block suballocation, which helps to mitigate this issue by assigning multiple small files (or the tail ends of large files) into the same block.

Handyman5
  • 1
    That is one damn good explanation. – SpacemanSpiff Jul 13 '11 at 20:44
  • 1
    @Handyman5 - Thanks Handyman5. So when I am looking at the size of a file or folder using ls it returns the actual size whilst du returns the resident size? Is that correct? So when looking at the size of the file, which is the most accurate i.e. resident size or file size or is that an arbitrary question? – PeanutsMonkey Jul 13 '11 at 21:19
  • 9
    @PeanutsMonkey, accuracy is in the eye of the beholder. ;-) Basically, if you're concerned about how much space the file would take up somewhere else (like copying over the network, adding to a zip file, backup to an external drive, etc.), then the actual size is what you care about. If you're concerned with the amount of space left on the drive where the file lives now, then you care about the resident size. Since `du` is showing you the `d`isk `u`sage, it's looking at the space taken up on the current drive, and thus it shows you the resident size. – Handyman5 Jul 13 '11 at 22:23
  • @Handyman5 - Thanks Handyman5. That was very helpful. – PeanutsMonkey Jul 13 '11 at 22:30
  • 1
    @Handyman5 - It's almost a year after the post but am curious how the answer above differs when using `df -h`? – PeanutsMonkey Jul 10 '12 at 19:30
  • 1
    `df` reports the number of remaining blocks * the filesystem block size. In this case it'd be more like `du`, as even partially-used blocks are considered fully allocated. `df` basically translates [statvfs](http://linux.die.net/man/2/statvfs), so you could look at that system call to get a better idea of what's going on. – Handyman5 Jul 10 '12 at 19:49
25

There's another option here, that hasn't been covered -- sparse files. In this case, du will show a smaller size than a simple ls -l would, because ls is reporting the "size" of the file as being the apparent size (the number of bytes you could read, if you wanted a whole lot of zeroes), while du will continue to use the actual number of disk blocks in use.
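A quick way to see this for yourself is sketched below. It assumes GNU coreutils (`truncate`, `stat -c`, `du --apparent-size`) and a filesystem that supports sparse files; the file name `sparse.img` is just a throwaway chosen for the example.

```shell
#!/bin/sh
# Assumes GNU coreutils and a filesystem with sparse file support.
workdir=$(mktemp -d)
sparse="$workdir/sparse.img"    # throwaway name for this demo

# Set the file length to 100 MiB without writing any data.
truncate -s 100M "$sparse"

apparent=$(stat -c %s "$sparse")       # size in bytes, as ls -l reports it
blocks=$(stat -c %b "$sparse")         # number of blocks actually allocated
block_unit=$(stat -c %B "$sparse")     # size in bytes of each block counted by %b
allocated=$(( blocks * block_unit ))   # bytes on disk, which is what du counts

echo "apparent size:  $apparent bytes"
echo "allocated size: $allocated bytes"

ls -l "$sparse"                     # shows the apparent size
du -h "$sparse"                     # shows the allocated size
du -h --apparent-size "$sparse"     # asks du to show the apparent size instead

rm -rf "$workdir"
```

On a filesystem that supports sparse files, the allocated size should come out far smaller than the apparent size, often zero.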

Fun trick: Create a great many large sparse files, then impress your friends with how much disk space you have ("look, I'm storing eleventy-gazillion 1TB files on my hard drive!"). OK, maybe not so fun then.

womble
6

Filesystems are made up of blocks. Files don't have to fit neatly into blocks. If a file was 1024 bytes, its size in ls and du would be 1024. If the file size was 1025, the size would be 1025 in ls and 2048 in du.

Note the example above assumes a block size of 1024. Larger block sizes are the norm these days, e.g.:

ls -l fred
-rw-r--r-- 1 iain users 1024 Jul 13 22:06 fred

du -h fred
8.0K    fred
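To check what block size applies on your own system, something like the following sketch may help. It assumes GNU `stat`, whose `-f` mode reports filesystem information; the 1025-byte size is the example from above, and the result is only an estimate, since features such as compression or inline storage of tiny files can change what du actually reports.

```shell
#!/bin/sh
# Assumes GNU stat. The directory defaults to the current one.
dir=${1:-.}

# Fundamental block size of the filesystem holding $dir, in bytes.
fs_block=$(stat -f -c %S "$dir")

# Round the example size of 1025 bytes up to a whole number of blocks.
size=1025
expected=$(( (size + fs_block - 1) / fs_block * fs_block ))

echo "filesystem block size: $fs_block bytes"
echo "a $size-byte file would occupy about $expected bytes on disk"
```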
user9517
2

There's still one more reason they may be different. du recognizes when it sees the same file under another name (a hard link, as opposed to a symlink) and counts that file's blocks only once within a single run, so the size is added just once to the total of the common parent directory, no matter how many names point to it.
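A small experiment along these lines, assuming GNU coreutils (`stat -c` and du's `-l` / `--count-links` option); the file names `original` and `hardlink` are made up for the demo:

```shell
#!/bin/sh
# Assumes GNU coreutils. All names below are throwaway examples.
workdir=$(mktemp -d)

# Write 1 MiB of data under one name...
dd if=/dev/urandom of="$workdir/original" bs=1024 count=1024 2>/dev/null
# ...then give the same inode a second name (a hard link).
ln "$workdir/original" "$workdir/hardlink"

links=$(stat -c %h "$workdir/original")      # link count of the file
inode_a=$(stat -c %i "$workdir/original")    # inode number of the first name
inode_b=$(stat -c %i "$workdir/hardlink")    # inode number of the second name

ls -l "$workdir"    # lists both names, each with the full size

# Directory total in KiB: by default the shared blocks are counted once.
once=$(du -sk "$workdir" | cut -f1)
# With -l (--count-links) the blocks are counted once per name.
twice=$(du -skl "$workdir" | cut -f1)

echo "du -sk:  $once KiB"
echo "du -skl: $twice KiB"

rm -rf "$workdir"
```

Adding up the sizes that `ls -l` prints would count the data twice, while the default du total counts it once.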