We have a 12 TB RAID 6 array which is supposed to be set up as a single partition with an XFS file system. After creating the new file system, `df` reports 78 GB in use, even though there are no files on the drive.

[root@i00a ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G     0   32G   0% /dev/shm
tmpfs            32G   11M   32G   1% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/sdb3       154G  3.9G  150G   3% /
/dev/sdb2      1014M  153M  862M  16% /boot
/dev/sdb1       599M  6.7M  593M   2% /boot/efi
/dev/sdc1       187G  1.6G  185G   1% /var
tmpfs           6.3G     0  6.3G   0% /run/user/0
/dev/sda1        11T   78G   11T   1% /export/libvirt

Did I do something wrong? Is this by design?

It looks like the file system log only takes up about 2 GB, and I can't figure out what else could be using the space.

[root@i00a ~]# xfs_info /export/libvirt/
meta-data=/dev/sda1              isize=512    agcount=11, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2929458688, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
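For reference, the ~2 GB log figure follows from the `xfs_info` output above: the internal log is `blocks=521728` at `bsize=4096`, so

```shell
# internal log size = log block count * block size (values from xfs_info above)
echo $((521728 * 4096))                                   # bytes
awk 'BEGIN { printf "%.2f GiB\n", 521728 * 4096 / 2^30 }' # ~1.99 GiB
```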

Partition information:

[root@irb00a ~]# parted /dev/sda1
GNU Parted 3.2
Using /dev/sda1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: Unknown (unknown)
Disk /dev/sda1: 12.0TB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system  Flags
 1      0.00B  12.0TB  12.0TB  xfs

This is a Dell FX2 with four FC430 compute nodes and two FD332 storage nodes, running Red Hat Enterprise Linux 8 (Ootpa).

yakatz
  • Is it really empty? Trying to reproduce (with a 12TB image, default mkfs settings, `bsize=4096 blocks=2929687500`) the `df -h` result is `Size 11T, Used 12G`, not `78G` as per your example. `xfsdump` produces a 21KB file... ;-) – frostschutz Sep 12 '19 at 07:26
  • Ah, I notice you had `reflink=1` but the default for me was `reflink=0`. With `reflink=1`, it also says `78G` used for me, so I can reproduce it now. – frostschutz Sep 12 '19 at 07:31
  • So it seems that this is by design, but if you're sure that reflinks won't do anything for your use case, you can consider turning it off. – frostschutz Sep 12 '19 at 07:42
  • I don't know. The only thing on here will be qcow2 files for virtual machines. – yakatz Sep 12 '19 at 12:39
  • Looks like some libvirt tools support reflink, but likely isn't worth the trouble: https://stackoverflow.com/a/41968000/597234 I can probably fit a whole additional VM in the saved space. – yakatz Sep 12 '19 at 12:48

2 Answers


All filesystems have an overhead for their own internal data structures. This internal information is used by the filesystem to create files and directories in the future, and to keep track of where everything is allocated. It is collectively known as "metadata": data "about" the data on the filesystem. Metadata is considered overhead because it takes up space but is not user data. This overhead is an unavoidable side effect of using any filesystem.

According to this blog post, XFS has an overhead of around 0.5% of the total disk space. (Note that the post is from 2009, but there's no reason this figure should have changed drastically.) The author got that result by measuring the overhead of over a dozen different filesystems using guestfish.

0.5% of your 12 TB is 60 GB, so that's pretty close to the observed usage. I suspect the real figure is slightly higher than 0.5% and was simply rounded down.
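As a back-of-the-envelope check (my arithmetic, not from the blog post):

```shell
# 0.5% of 12 TB (decimal terabytes), expressed in decimal gigabytes
awk 'BEGIN { printf "%.0f GB\n", 12e12 * 0.005 / 1e9 }'   # 60 GB
```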

suprjami
Moshe Katz
  • Would be nice to have official documentation mention this (like the [XFS_FAQ](http://xfs.org/index.php/XFS_FAQ) ) – yakatz Sep 12 '19 at 04:30
  • Worth noting that some filesystems report the full allocated size and then charge bookkeeping overhead against used space, while others subtract bookkeeping from the full size and report only file space as "used". – chrylis -cautiouslyoptimistic- Sep 12 '19 at 07:16
  • Filesystem overhead... making people ask why their hard drives don't report what's on the sticker since 1983. – J... Sep 12 '19 at 12:08
  • @J... Actually, hard drives often market their size using 1 GB = 1000 MB instead of 1024 MB. So an HD marketed at 512 GB is actually 12 GB smaller than the listed size. It gets even worse with TB, since they use 1 TB = 1000 GB = 1000 * 1000 MB. A 1 TB HD is really 976 GB instead of 1024 GB. A whopping 48 GB lost per TB. – Justin Sep 12 '19 at 14:03
  • The difference in measuring gigabytes (base 10) vs gibibytes (base 2) doesn't show up as used space in `df`. – yakatz Sep 12 '19 at 19:22
  • @JustinLessard Wasn't so in the 90s - drives then were usually labelled in what we now call MiB. A great example of the dividing line is the Imation SuperDisk - the LS-120 "120MB" disks were 120MiB (126MB) but when they released the upgraded "240MB" disks they were 229MiB (240MB). Even back then people were forever scratching their heads when whatever filesystem ate some of that space on their new drives. – J... Sep 12 '19 at 21:04
  • @JustinLessard You forgot about the overhead at the MiB and KiB levels. A 512 GB hard drive is actually more than 32 GiB smaller than a real 512 GiB drive. And on that, a 1 TB drive is really more like 0.909 TiB when accounting for the TiB, GiB, MiB, and KiB overhead. (1*1000^4/1024^4) = 0.90949 – penguin359 Sep 13 '19 at 00:28
  • FYI, with finobt=1, rmapbt=1 and reflink=1, a 50T disk image came with 981G pre-used (about 2%). Turning off the reverse mapping feature (which I guessed would take a bit of space) reduced it to 357G starting overhead on the 50T, or a bit under 1%. The toggle for the new features is `crc=0`: no CRC calculations on metadata. As metadata grows, the disk will slow down. One feature I will miss w/o crc: finobt, which tracks space freed by deleted inodes. This can allow a fast find of space if free space is spread out or requires more searching. (crc=0: overhead=35M) – Astara Jun 21 '20 at 10:15

For XFS, the "Used" space that `df -h` shows on an empty filesystem depends a lot on which metadata features you enable at `mkfs.xfs` time.

Testing with a sparse 12 TB image file:

# truncate -s 12TB xfstest.img

Default settings (on my current ArchLinux system):

# mkfs.xfs xfstest.img 
meta-data=xfstest.img            isize=512    agcount=11, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=2929687500, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount -o loop xfstest.img loop/
# df -h loop/
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       11T   12G   11T   1% /dev/shm/loop
# umount loop/

Using reflink=1:

# mkfs.xfs -m reflink=1 -f xfstest.img
meta-data=xfstest.img            isize=512    agcount=11, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=2929687500, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount -o loop xfstest.img loop/
# df -h loop/
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       11T   78G   11T   1% /dev/shm/loop

Using crc=0, reflink=0 (for some reason, that also sets finobt=0, sparse=0):

# mkfs.xfs -m reflink=0 -m crc=0 -f xfstest.img 
meta-data=xfstest.img            isize=256    agcount=11, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=2929687500, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount -o loop xfstest.img loop/
# df -h loop/
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       11T   33M   11T   1% /dev/shm/loop

In short:

# df -h loop/
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       11T   78G   11T   1% /dev/shm/loop (reflink=1, crc=1)
/dev/loop0       11T   12G   11T   1% /dev/shm/loop (reflink=0, crc=1)
/dev/loop0       11T   33M   11T   1% /dev/shm/loop (reflink=0, crc=0)

So "Used" space on a fresh 12 TB filesystem is 78G, 12G, or as low as 33M, depending on which metadata features you enable at mkfs time.
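As far as I know, reflink can't be toggled after the fact, so if you decide the feature isn't worth ~78G in your case, the filesystem has to be re-created. A sketch, using the device and mount point from the question (warning: `mkfs.xfs -f` destroys everything on the device, so only do this while it's still empty):

```shell
# WARNING: destructive -- wipes /dev/sda1. Only run on an empty filesystem.
umount /export/libvirt
mkfs.xfs -f -m reflink=0 /dev/sda1
mount /dev/sda1 /export/libvirt
df -h /export/libvirt
```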

frostschutz
  • Red Hat 8 has `reflink=1` by default – yakatz Oct 31 '19 at 15:21
  • Many newer features (`finobt`, `rmapbt`, `reflink`, `sparse`) of XFS depend on a newer version of the on-disk format (version 5). `crc=1` implies the new format. For details see [`mkfs.xfs(8)`](https://man.archlinux.org/man/mkfs.xfs.8). – Sebastian Schrader Mar 10 '21 at 15:42