Apparent size of sparse files

3

1

I created a sparse file of 8 GB using:

dd if=/dev/zero of=/sparse-file bs=1 count=0 seek=8G

Then I did

echo "test" >> /sparse-file

I see that du -sh sparse-file gives 16K and du -sh --apparent-size sparse-file shows 8.1G.

I expected that writing data to the file would overwrite the zeroes in the sparse file, but the file actually grows instead. Why is that? If I write 8 GB of real data to it, will the apparent size become 16 GB?

What exactly does "count" do here?

Manny

Posted 2012-05-08T06:10:32.013

Reputation: 205

Answers

3

"Sparse" files are files with empty gaps that are presumed to be filled with zeroes, and for which that presumption is all the file system needs to record. That is, if you read the file, you'll get zeroes, but since we know they're zeroes, we don't actually have to write out 8 GB worth of them. It's enough to say, "let's just agree that there's a big file here without actually allocating space for it".

As you overwrite the blank contents of the file, blocks are allocated on disk to accommodate what you're storing (since you can no longer assume it's just zeroes). But if you append to the file, you're not overwriting anything. You're just adding more to the end. So by appending you allocate blocks on disk, but those blocks don't take the place of your existing "imaginary" blocks; instead the ones you create are added to the end, after the imaginary ones.
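To see the difference between appending and overwriting, here is a small sketch. It assumes GNU coreutils (dd, stat -c, du) and uses a scratch directory with a 1 MiB file instead of the 8 GB one; the file name and sizes are only for illustration.

```shell
# Scratch directory so nothing real is touched.
dir=$(mktemp -d)
f="$dir/sparse-file"

# Create a 1 MiB sparse file: seek to the 1 MiB mark, copy nothing.
dd if=/dev/zero of="$f" bs=1 count=0 seek=1M 2>/dev/null
size_created=$(stat -c %s "$f")

# Append: ">>" writes after the current end, so the file grows by 5 bytes.
echo "test" >> "$f"
size_appended=$(stat -c %s "$f")

# Overwrite in place: conv=notrunc keeps the rest of the file, and with no
# seek= the 5 bytes land at offset 0, replacing the start of the hole.
printf 'test\n' | dd of="$f" conv=notrunc 2>/dev/null
size_overwritten=$(stat -c %s "$f")

echo "apparent size after create:    $size_created"
echo "apparent size after append:    $size_appended"
echo "apparent size after overwrite: $size_overwritten"
# Allocated space in KiB; the exact figure depends on the file system.
echo "disk usage (KiB): $(du -k "$f" | cut -f1)"

rm -rf "$dir"
```

Only the append changes the apparent size. The in-place write allocates a block for the data but leaves the length alone.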

You can even add more imaginary blocks using a dd seek= operation similar to the one you used to create the file. The "imaginary" blocks needn't be contiguous. In fact, as you overwrite existing blocks within the file, only the blocks you overwrite are allocated, no matter where they appear in the file. That is to say, writing a block at position 101 won't automatically allocate and zero-fill blocks 1 through 100.
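A rough sketch of both points, again assuming GNU dd and stat and a file system with sparse-file support. The file name, the 4 KiB block size and the block positions are arbitrary choices for the example.

```shell
dir=$(mktemp -d)
f="$dir/holes"

# Write one 4 KiB block of real data at block index 100 (offset 400 KiB).
# The blocks before it are never written; on a sparse-capable file system
# they stay unallocated.
dd if=/dev/urandom of="$f" bs=4K count=1 seek=100 2>/dev/null
size_after_write=$(stat -c %s "$f")

# Add another hole after the data: seek to block 200 and copy nothing.
# With a non-zero seek=, GNU dd keeps what is already in the file and
# sets the length to the seek offset.
dd if=/dev/zero of="$f" bs=4K count=0 seek=200 2>/dev/null
size_after_extend=$(stat -c %s "$f")

echo "apparent size after write:  $size_after_write"
echo "apparent size after extend: $size_after_extend"
# Allocated space in KiB; typically far less than the apparent size.
echo "disk usage (KiB): $(du -k "$f" | cut -f1)"

rm -rf "$dir"
```

The apparent size follows the highest offset in the file, while the disk usage reported by du should reflect little more than the single block that was really written.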

tylerl

Posted 2012-05-08T06:10:32.013

Reputation: 2 064

0

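First of all, count tells dd how many blocks to copy, so with count=0 you copied zero blocks from /dev/zero. So, nothing. seek=8G with bs=1 skipped 8 GiB into the output file, and dd then set the file's length to that offset, which is where the 8 GB apparent size comes from. I would rather do that with count=1, which writes a single real byte at that offset. Don't leave count out entirely while reading from /dev/zero, though: dd would keep copying zeroes until the disk is full.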
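Here is a quick comparison of the variants, as a sketch assuming GNU dd and stat. It uses 1 MiB instead of 8 GB and made-up file names a, b and c. Omitting count is only tried with /dev/null as input, as in the GNU example quoted further down, because with /dev/zero dd would never reach end-of-file.

```shell
dir=$(mktemp -d)

# count=0: copy nothing; dd only sets the file length to the seek offset.
dd if=/dev/zero of="$dir/a" bs=1 count=0 seek=1M 2>/dev/null

# count=1: copy exactly one 1-byte block, written at the seek offset.
dd if=/dev/zero of="$dir/b" bs=1 count=1 seek=1M 2>/dev/null

# No count, but /dev/null as input: dd hits end-of-file immediately,
# so nothing is copied here either.
dd if=/dev/null of="$dir/c" bs=1 seek=1M 2>/dev/null

size_a=$(stat -c %s "$dir/a")
size_b=$(stat -c %s "$dir/b")
size_c=$(stat -c %s "$dir/c")

echo "count=0:            $size_a bytes"
echo "count=1:            $size_b bytes"
echo "no count, /dev/null: $size_c bytes"

rm -rf "$dir"
```

The count=1 file ends up exactly one byte longer than the other two, because it is the only one with a real byte written at the seek offset.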

For the rest, just read man du. The --apparent-size option prints the apparent size rather than the real disk usage. So while the first command reported that your file occupies 16K on disk, its apparent size is 8.1G.

A file with less than 16K content can still require 16K on the disk, depending on the underlying file system used. See the GNU Coreutils description:

For example, a file containing the word ‘zoo’ with no newline would, of course, have an apparent size of 3. Such a small file may require anywhere from 0 to 16 KiB or more of disk space, depending on the type and configuration of the file system on which the file resides. However, a sparse file created with this command:

dd bs=1 seek=2GiB if=/dev/null of=big

has an apparent size of 2 GiB, yet on most modern systems, it actually uses almost no disk space.
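If you want to try the quoted example yourself, something like the following should work. It is a sketch assuming GNU du, where -b is shorthand for --apparent-size --block-size=1. The file is scaled down to 2 MiB, and the names small and big live in a scratch directory.

```shell
dir=$(mktemp -d)

# The three-byte file from the quote: 'zoo' with no trailing newline.
printf 'zoo' > "$dir/small"

# The sparse file from the quote, scaled down from 2 GiB to 2 MiB.
dd bs=1 seek=2M if=/dev/null of="$dir/big" 2>/dev/null

# Apparent size in bytes (what a reader of the file would see).
apparent_small=$(du -b "$dir/small" | cut -f1)
apparent_big=$(du -b "$dir/big" | cut -f1)

# Space actually allocated, in bytes; this depends on the file system.
used_small=$(du -B1 "$dir/small" | cut -f1)
used_big=$(du -B1 "$dir/big" | cut -f1)

echo "small: apparent=$apparent_small used=$used_small"
echo "big:   apparent=$apparent_big used=$used_big"

rm -rf "$dir"
```

The apparent sizes are fixed by the file contents and length. The "used" figures vary with the file system, which is why the small file can report more disk usage than the much "larger" sparse one.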

slhck

Posted 2012-05-08T06:10:32.013

Reputation: 182 472