Why don't ext filesystems fill the entire device?

I've just noticed that none of the ext{2,3,4} filesystems I create on a 500 GB HDD use all of the available space (466G). I've also tried reiser3, xfs, jfs, btrfs and even vfat. All of them create a filesystem of 466G (as shown by df -h), but ext* creates a 459G filesystem. Disabling reserved blocks increases the space available to the user, but the size of the filesystem is still 459G.
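
For reference, this is how I turn the reservation off (the device name is just an example):

fs:~# mke2fs -m 0 /dev/sda1     # at creation time: reserve 0% for root
fs:~# tune2fs -m 0 /dev/sda1    # or later, on an existing filesystem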

The same holds for a 1 TB HDD: 932G with reiserfs, 917G with ext4.

So what is this 1.5% difference? Why does it happen, and is there a way to make ext fill the whole volume?

UPD: All tests were done on the same machine, on the same HDD, etc. It doesn't matter how 466G differs from the marketing 500 GB. The problem is that the size differs between filesystems.

About df - it shows the total FS size, used size and free space. In this case I have:

for reiserfs:

/dev/sda1 466G 33M 466G 1% /mnt

for ext4:

/dev/sda1 459G 198M 435G 1% /mnt

If I turn the root block reservation off, 435G changes to 459G, the full size of the fs (minus 198M). But the fs itself is still 459G for ext4 and 466G for reiserfs!

UPD2: Filling volumes with real data via dd:

reiserfs:

fs:~# dd if=/dev/zero of=/mnt/1
dd: writing to '/mnt/1': No space left on device
975702649+0 records in
975702648+0 records out
499559755776 bytes (500 GB) copied, 8705.61 s, 57.4 MB/s

ext2 with block reservation turned off (mke2fs -m 0):

fs:~# dd if=/dev/zero of=/mnt/1
dd: writing to '/mnt/1': No space left on device
960356153+0 records in
960356152+0 records out
491702349824 bytes (492 GB) copied, 8870.01 s, 55.4 MB/s

Sorry for the Russian locale - the output above has been translated. I ran the tests in the default locale, and repeating them just for English messages would take too long. In any case, the dd output is obvious.

So it turns out that mke2fs really does create a smaller filesystem than the other mkfs tools.
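
A quick way to see where the space goes is to dump the ext2 superblock (the device name is an example; exact field names can vary slightly between e2fsprogs versions):

fs:~# dumpe2fs -h /dev/sda1 | egrep 'Block count|Inode count|Inode size'

Inode count times Inode size is the space pre-allocated for the inode tables.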

Ineu

Posted 2010-08-15T17:03:42.487

Reputation: 183

There's a certain amount of overhead with every FS... I don't know of one that's going to give you access to all the available physical space on the disk. – prodigitalson – 2010-08-15T17:07:42.143

I recommend you change your display name and put what seems to be your blog in the website field of your profile, to make it less blatantly advertising. – Hello71 – 2010-08-15T17:31:05.687

Hello71, thanks for the advice. The website doesn't really matter; it's only for OpenID. – Ineu – 2010-08-15T17:35:23.743

For future reference, if you quickly want a program to output in English, use LANG=C foo or LC_ALL=C foo – Alan Pearce – 2010-08-19T16:00:52.157

Alan, right, thank you. It could even be LANG= or LANG=POSIX. But as I said, this process takes a lot of time, so re-running it with a different locale just for a couple of lines is unreasonable :) In either case, it proves the problem with FS size for ext2 :( – Ineu – 2010-08-20T08:02:47.887

If you're still paying attention here, I've added a whole ton more information to my answer. I suspect I have now answered your question. :-) – Omnifarious – 2012-11-07T20:50:15.317

Can you check it with fdisk /dev/sda (or cfdisk if you have it) to get an overview of the starting and ending sectors of sda1, and whether there's another partition or some space left? – ott-- – 2012-11-07T21:02:47.453

@ott--, unfortunately I no longer have this drive. The question is pretty old. – Ineu – 2012-11-12T19:59:22.987

Answers

There are two reasons this is true.

First, for some reason or another OS writers still report sizes in terms of a base 2 system, while hard drive manufacturers report sizes in terms of a base 10 system. For example, an OS writer will call 1024 bytes (2^10 bytes) a kilobyte, while a hard drive manufacturer will call 1000 bytes a kilobyte. This difference is pretty minor for kilobytes, but once you get up to terabytes, it's pretty significant. An OS writer will call 1099511627776 bytes (2^40 bytes) a terabyte, while a hard drive manufacturer will call 1000000000000 bytes a terabyte.
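
As a quick sanity check, here is the conversion for the 500 GB drive from the question (a bc one-liner):

$ echo '500 * 10^9 / 2^30' | bc -l
465.66128540039062500000

So a "500 GB" drive holds about 466 binary gigabytes, which matches the 466G that most of the mkfs tools report.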

These two different ways of talking about sizes frequently lead to a lot of confusion.

There is a spottily supported ISO prefix standard for binary sizes. User interfaces designed with these prefixes in mind show TiB, GiB (or more generally XiB) when displaying sizes in the base 2 system.

Secondly, df -h reports how much space is available for your use. All filesystems have to write housekeeping information to keep track of things for you. This information takes up some of the space on your drive. Not generally very much, but some. That also accounts for some of the apparent loss you're seeing.
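
You can see both effects by comparing the raw partition size with what df reports (a sketch; the device and mount point are just examples):

$ blockdev --getsize64 /dev/sda1    # partition size in bytes, straight from the kernel
$ df -B1 /mnt                       # filesystem size in bytes, as df sees it

The gap between the two numbers is the filesystem's fixed overhead.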

Now that you've edited your post to make it clear that neither of the points above actually answers your question, I'll take a stab at the real question...

Different filesystems use different amounts of space for housekeeping information and report that space usage in different ways.

For example, ext2 divides the disk up into block groups (analogous to BSD's cylinder groups). It then pre-allocates space in each block group for inodes and free-space maps. ext3 does the same thing, since it's basically ext2 + journaling. And ext4 also does the exact same thing, since it's a fairly straightforward (and almost backwards compatible) modification of ext3. Since this metadata overhead is fixed at filesystem creation (or resize), it's not reported as 'used' space. I suspect this is also because the block group metadata is at fixed places on the disk, and so is simply implied as being used and hence not marked off or accounted for in the free-space maps.
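
A back-of-the-envelope check, assuming the current mke2fs defaults of one inode per 16384 bytes of disk and 256-byte inodes (older releases used 128-byte inodes):

$ echo 'scale=2; 256 * 100 / 16384' | bc    # inode tables as a percentage of the disk
1.56
$ echo 'scale=1; 465.66 * 256 / 16384' | bc # GiB of inode tables on a 500 GB drive
7.2

That 1.56% is essentially the 1.5% difference asked about, and ~7 GiB matches the 466G-versus-459G gap.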

But reiserfs does not pre-allocate metadata of any kind. It has no inode limit fixed at filesystem creation, because it allocates all of its inodes on the fly just as it does data blocks. At most it needs some structures describing the root directory and a free-space map of some sort. So it uses much less space when it has nothing in it.

But this means that reiserfs will take up more space as you add files, because it will be allocating metadata (like inodes) as well as the actual data space for each file.
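
You can watch this happen: create a pile of empty files on a reiserfs mount and df's Used column grows even though no data was written (the mount point is an example):

$ cd /mnt && for i in $(seq 1 100000); do touch f$i; done
$ df /mnt

On ext*, the inodes for those files come out of the pre-allocated tables that df had already excluded from the total.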

I do not know exactly how jfs and btrfs track metadata space usage, but I suspect they track it more like reiserfs does. vfat in particular has no inode concept at all. Its free-space map, whose size is fixed at filesystem creation (the infamous FAT table), stores much of the data an inode would, and the dynamically allocated directory entry stores the rest.
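
For scale, the FAT itself is small. Assuming 32 KiB clusters (a common default for a volume this size) and 4-byte FAT32 entries:

$ echo '500 * 10^9 / 32768 * 4 / 10^6' | bc    # MB per copy of the FAT
61

Roughly 61 MB per FAT copy (there are normally two), which is why vfat's usable size comes out so close to the full 466G.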

Omnifarious

Posted 2010-08-15T17:03:42.487

Reputation: 538

There's an ISO standard for that: http://en.wikipedia.org/wiki/Binary_prefix – Bobby – 2010-08-15T18:34:52.007

@Bobby - Yeah, and it's started to show up in displays. I'll add that to my answer. Thanks! – Omnifarious – 2010-08-15T21:26:50.143

As well as the issues that Omnifarious mentions, with ext2/3/4 a certain amount of space is reserved for root - this reserved space does not show in the output of df.

For instance, creating a small filesystem (~100 MB) with default options, using ext2 rather than 3 or 4 in order to ignore the space that would otherwise be taken by the journal:

swann:/tmp# dd if=/dev/zero of=./loop.fs bs=10240 count=10240
swann:/tmp# mkfs.ext2 loop.fs
swann:/tmp# mkdir loop
swann:/tmp# mount -text2 -oloop loop.fs loop
swann:/tmp# df loop
Filesystem           1K-blocks      Used Available Use% Mounted on
/tmp/loop.fs             99150      1550     92480   2% /tmp/loop

Tweaking the reserved blocks option (tune2fs's -m option sets the reserved blocks as a percentage, and the -r option sets them as an absolute number of blocks):

swann:/tmp# umount loop
swann:/tmp# tune2fs -m 25 loop.fs
swann:/tmp# mount -text2 -oloop loop.fs loop
swann:/tmp# df loop
Filesystem           1K-blocks      Used Available Use% Mounted on
/tmp/loop.fs             99150      1550     72000   3% /tmp/loop

swann:/tmp# umount loop
swann:/tmp# tune2fs -m 0 loop.fs
swann:/tmp# mount -text2 -oloop loop.fs loop
swann:/tmp# df loop
Filesystem           1K-blocks      Used Available Use% Mounted on
/tmp/loop.fs             99150      1550     97600   2% /tmp/loop

As you can see in the example above, even when logged in as root, df doesn't show the reserved space in the "Available" count. The reserved space does not show in the "Used" count either, whether you are logged in as root or as a less privileged user. This can cause confusion when a filesystem is close to full, if you are not expecting these two facts.
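
You can recover the reservation from the first df output above: Size minus Used minus Available leaves the reserved blocks.

swann:/tmp# echo '99150 - 1550 - 92480' | bc
5120

5120 KiB is exactly 5% of the 102400 KiB image, which is the mke2fs default reservation.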

Also note that tune2fs, despite its name, is relevant for ext3 and ext4 filesystems as well as ext2 ones.
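
A quick way to check the current reservation on an existing ext2/3/4 filesystem (the device name is just an example):

swann:/tmp# tune2fs -l /dev/sda1 | grep -i 'reserved block count'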

David Spillett

Posted 2010-08-15T17:03:42.487

Reputation: 22 424

Thanks for the answer. No, it's not about reserved blocks. Updated question. – Ineu – 2010-08-15T18:22:06.860

About the difference between filesystems: different filesystems organize blocks differently and need more or less metadata to identify and keep track of them. Block size also makes a difference, since having more or fewer blocks for the same space means more or less "lost" space. Also, filesystems group blocks to avoid fragmenting files, and each block group has an identifier of some size, so more or fewer block groups will use different amounts of physical space on disk. So the difference is in how each filesystem organizes the physical space.
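
One way to see the effect is mke2fs's dry-run mode, which prints the layout it would create without writing anything (a sketch on a scratch image; -F is needed because the target is a regular file rather than a block device):

$ dd if=/dev/zero of=/tmp/t.img bs=1M count=512
$ mke2fs -F -n -b 1024 /tmp/t.img    # dry run with 1 KiB blocks
$ mke2fs -F -n -b 4096 /tmp/t.img    # dry run with 4 KiB blocks

Comparing the inode, block and block-group counts from the two runs shows how block size changes the overhead.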

Here is a description for ext2; you can probably find something similar for reiserfs, but I've never used it so I don't have one.

laurent

Posted 2010-08-15T17:03:42.487

Reputation: 4 166

Reiserfs and btrfs are unusual in that almost all bookkeeping information is allocated dynamically. Only the superblock copies and free-space bitmaps are allocated at filesystem setup. Of course, this means that the actual amount of space available for data is less deterministic for these filesystems. – Omnifarious – 2010-08-15T22:05:37.363

@Omnifarious +1 - So, if I understand correctly, on reiserfs and btrfs the reported available space is bigger at the beginning, but it will be consumed by both data and bookkeeping info instead of only data, right? – laurent – 2010-08-15T22:27:20.493

@laurent-rpnet - Yes, that is correct. In the case of btrfs it's even more interesting. btrfs can implement RAID on an individual file basis, so its reporting of available free space is even harder to pin down, as it can't just assume a certain amount of extra space will be used per block of data. Additionally, it allows very cheap COW-based copies, so writing a block in the middle of an existing file may allocate space. – Omnifarious – 2010-08-16T00:13:00.440

And what about XFS, JFS and VFAT? It's hard to believe that such a primitive fs as FAT32 is more dynamic than ext4. – Ineu – 2010-08-16T05:58:21.833

FAT32 also has blocks reserved for organization. What is the meaning of dynamic here? If you mean dynamic allocation, FAT32 has no dynamic allocation, like ext, and also doesn't show all the blocks on disk as available for data. It also has some limitations the ext4 filesystem doesn't have: there is no permissions system, while ext4 has POSIX permissions and ACLs, and the max file size is 4GB on FAT32 versus 2TB on ext3 (not sure about ext4, but it should be at least the same). – laurent – 2010-08-16T13:35:46.283

@Ineu - I don't know what you mean by dynamic here either. FAT32 is very primitive. It's very vulnerable to severe data loss because it stores most of the housekeeping information in one small area of the drive. It also requires constantly seeking to that position to update that information. And it lacks a lot of features that modern filesystems possess. I, unfortunately, know almost nothing about XFS or JFS. – Omnifarious – 2010-08-16T14:24:15.610

Exactly, that's just what I'm talking about :) reiser, jfs, xfs AND EVEN fat32 all show the fs as 466G in my case; ext2, ext3 and ext4 show 459G. The extra available space could have been explained by the nature of dynamically allocating filesystems, if it weren't for fat32. But there is this simple, even primitive fs - and even it fills the entire volume. ext* does not: it creates a 459G filesystem on a 466G volume. This is the problem. – Ineu – 2010-08-16T15:16:26.280

Available space is not real on a dynamic fs. With use, both data AND bookkeeping will be discounted from the available space, so it will shrink faster than on ext4 (where only data reduces the amount available). The difference will keep shrinking (you can only compare the real total amount of data stored once the disk is full). Regarding FAT32, of course a fs with fewer options and poor bookkeeping makes more space available on the same disk. Another point: 466GB is not the full space of the disk: 500 GB = 500 × 10^9 / 1024 = 488 GB in OS language, so even filesystems reporting 466GB are "losing" 22GB too. – laurent – 2010-08-16T17:16:32.900

laurent-rpnet, not really: 500.0 × 10^9 / 2^30 == 465.66. It would be 488 if you divided by 1024 only once and then by 10^3. About the idea of filling it with real data - thank you, I will give it a try. – Ineu – 2010-08-16T19:20:06.250

@Ineu - lol true... I don't know what I was thinking about when I did this calculation! – laurent – 2010-08-17T01:56:52.023

@Ineu - Yes, FAT32 is not a 'dynamic' fs in that sense. But it does use a lot less space for housekeeping. That's why it gets fragmented so easily, loses significant data with even tiny corruption, doesn't support the Unix permission model, ACLs or selinux attributes, and is otherwise a pretty poor filesystem. Percentage of disk lost to housekeeping information is a fairly poor metric for measuring a filesystem, especially now that there is generally more disk than any reasonable person can use. – Omnifarious – 2010-08-17T02:46:47.287

Updated question with tests on real data. – Ineu – 2010-08-19T15:47:53.557