
We have a 100G ZVOL on a FreeBSD 10.0-CURRENT host, and it claims to use 176G of disk space:

root@storage01:~ # zfs get all zroot/DATA/vtest
NAME              PROPERTY              VALUE                  SOURCE
zroot/DATA/vtest  type                  volume                 -
zroot/DATA/vtest  creation              Fri May 24 20:44 2013  -
zroot/DATA/vtest  used                  176G                   -
zroot/DATA/vtest  available             10.4T                  -
zroot/DATA/vtest  referenced            176G                   -
zroot/DATA/vtest  compressratio         1.00x                  -
zroot/DATA/vtest  reservation           none                   default
zroot/DATA/vtest  volsize               100G                   local
zroot/DATA/vtest  volblocksize          8K                     -
zroot/DATA/vtest  checksum              fletcher4              inherited from zroot
zroot/DATA/vtest  compression           off                    default
zroot/DATA/vtest  readonly              off                    default
zroot/DATA/vtest  copies                1                      default
zroot/DATA/vtest  refreservation        none                   local
zroot/DATA/vtest  primarycache          all                    default
zroot/DATA/vtest  secondarycache        all                    default
zroot/DATA/vtest  usedbysnapshots       0                      -
zroot/DATA/vtest  usedbydataset         176G                   -
zroot/DATA/vtest  usedbychildren        0                      -
zroot/DATA/vtest  usedbyrefreservation  0                      -
zroot/DATA/vtest  logbias               latency                default
zroot/DATA/vtest  dedup                 off                    default
zroot/DATA/vtest  mlslabel                                     -
zroot/DATA/vtest  sync                  standard               default
zroot/DATA/vtest  refcompressratio      1.00x                  -
zroot/DATA/vtest  written               176G                   -
zroot/DATA/vtest  logicalused           87.2G                  -
zroot/DATA/vtest  logicalreferenced     87.2G                  -
root@storage01:~ # 

This looks like a bug: how can it consume more than its volsize if it has no snapshots, reservations or children? Or maybe we are missing something?

Update:

Results of zpool status -v:

root@storage01:~ # zpool status -v
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h6m with 0 errors on Thu May 30 05:45:11 2013
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          raidz2-0     ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
            gpt/disk2  ONLINE       0     0     0
            gpt/disk3  ONLINE       0     0     0
            gpt/disk4  ONLINE       0     0     0
            gpt/disk5  ONLINE       0     0     0
        cache
          ada0s2       ONLINE       0     0     0

errors: No known data errors
root@storage01:~ # 

Results of zpool list:

root@storage01:~ # zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zroot  16.2T   288G  16.0T     1%  1.05x  ONLINE  -
root@storage01:~ # 

Results of zfs list:

root@storage01:~ # zfs list
NAME                            USED  AVAIL  REFER  MOUNTPOINT
zroot                           237G  10.4T   288K  /
zroot/DATA                      227G  10.4T   352K  /DATA
zroot/DATA/NFS                  288K  10.4T   288K  /DATA/NFS
zroot/DATA/hv                  10.3G  10.4T   288K  /DATA/hv
zroot/DATA/hv/hv001            10.3G  10.4T   144K  -
zroot/DATA/test                 288K  10.4T   288K  /DATA/test
zroot/DATA/vimage              41.3G  10.4T   288K  /DATA/vimage
zroot/DATA/vimage/vimage_001   41.3G  10.5T  6.47G  -
zroot/DATA/vtest                176G  10.4T   176G  -
zroot/SYS                      9.78G  10.4T   288K  /SYS
zroot/SYS/ROOT                  854M  10.4T   854M  /
zroot/SYS/home                 3.67G  10.4T  3.67G  /home
zroot/SYS/tmp                   352K  10.4T   352K  /tmp
zroot/SYS/usr                  4.78G  10.4T   427M  /usr
zroot/SYS/usr/local             288K  10.4T   288K  /usr/local
zroot/SYS/usr/obj              3.50G  10.4T  3.50G  /usr/obj
zroot/SYS/usr/ports             895K  10.4T   320K  /usr/ports
zroot/SYS/usr/ports/distfiles   288K  10.4T   288K  /usr/ports/distfiles
zroot/SYS/usr/ports/packages    288K  10.4T   288K  /usr/ports/packages
zroot/SYS/usr/src               887M  10.4T   887M  /usr/src
zroot/SYS/var                   511M  10.4T  1.78M  /var
zroot/SYS/var/crash             505M  10.4T   505M  /var/crash
zroot/SYS/var/db               1.71M  10.4T  1.43M  /var/db
zroot/SYS/var/db/pkg            288K  10.4T   288K  /var/db/pkg
zroot/SYS/var/empty             288K  10.4T   288K  /var/empty
zroot/SYS/var/log               647K  10.4T   647K  /var/log
zroot/SYS/var/mail              296K  10.4T   296K  /var/mail
zroot/SYS/var/run               448K  10.4T   448K  /var/run
zroot/SYS/var/tmp               304K  10.4T   304K  /var/tmp
root@storage01:~ # 

Update 2:

We created a number of ZVOLs with different parameters and used dd to copy the content over. We noticed another odd thing: disk usage was normal for ZVOLs with 16k and 128k volblocksize, but it remained abnormal for a ZVOL with 8k volblocksize even after the dd (so this is not a fragmentation issue):

root@storage01:~ # zfs get all zroot/DATA/vtest-3
NAME                PROPERTY              VALUE                  SOURCE
zroot/DATA/vtest-3  type                  volume                 -
zroot/DATA/vtest-3  creation              Fri May 31  7:35 2013  -
zroot/DATA/vtest-3  used                  201G                   -
zroot/DATA/vtest-3  available             10.2T                  -
zroot/DATA/vtest-3  referenced            201G                   -
zroot/DATA/vtest-3  compressratio         1.00x                  -
zroot/DATA/vtest-3  reservation           none                   default
zroot/DATA/vtest-3  volsize               100G                   local
zroot/DATA/vtest-3  volblocksize          8K                     -
zroot/DATA/vtest-3  checksum              fletcher4              inherited from zroot
zroot/DATA/vtest-3  compression           off                    default
zroot/DATA/vtest-3  readonly              off                    default
zroot/DATA/vtest-3  copies                1                      default
zroot/DATA/vtest-3  refreservation        103G                   local
zroot/DATA/vtest-3  primarycache          all                    default
zroot/DATA/vtest-3  secondarycache        all                    default
zroot/DATA/vtest-3  usedbysnapshots       0                      -
zroot/DATA/vtest-3  usedbydataset         201G                   -
zroot/DATA/vtest-3  usedbychildren        0                      -
zroot/DATA/vtest-3  usedbyrefreservation  0                      -
zroot/DATA/vtest-3  logbias               latency                default
zroot/DATA/vtest-3  dedup                 off                    default
zroot/DATA/vtest-3  mlslabel                                     -
zroot/DATA/vtest-3  sync                  standard               default
zroot/DATA/vtest-3  refcompressratio      1.00x                  -
zroot/DATA/vtest-3  written               201G                   -
zroot/DATA/vtest-3  logicalused           100G                   -
zroot/DATA/vtest-3  logicalreferenced     100G                   -
root@storage01:~ # 

and

root@storage01:~ # zfs get all zroot/DATA/vtest-16
NAME                 PROPERTY              VALUE                  SOURCE
zroot/DATA/vtest-16  type                  volume                 -
zroot/DATA/vtest-16  creation              Fri May 31  8:03 2013  -
zroot/DATA/vtest-16  used                  102G                   -
zroot/DATA/vtest-16  available             10.2T                  -
zroot/DATA/vtest-16  referenced            101G                   -
zroot/DATA/vtest-16  compressratio         1.00x                  -
zroot/DATA/vtest-16  reservation           none                   default
zroot/DATA/vtest-16  volsize               100G                   local
zroot/DATA/vtest-16  volblocksize          16K                    -
zroot/DATA/vtest-16  checksum              fletcher4              inherited from zroot
zroot/DATA/vtest-16  compression           off                    default
zroot/DATA/vtest-16  readonly              off                    default
zroot/DATA/vtest-16  copies                1                      default
zroot/DATA/vtest-16  refreservation        102G                   local
zroot/DATA/vtest-16  primarycache          all                    default
zroot/DATA/vtest-16  secondarycache        all                    default
zroot/DATA/vtest-16  usedbysnapshots       0                      -
zroot/DATA/vtest-16  usedbydataset         101G                   -
zroot/DATA/vtest-16  usedbychildren        0                      -
zroot/DATA/vtest-16  usedbyrefreservation  886M                   -
zroot/DATA/vtest-16  logbias               latency                default
zroot/DATA/vtest-16  dedup                 off                    default
zroot/DATA/vtest-16  mlslabel                                     -
zroot/DATA/vtest-16  sync                  standard               default
zroot/DATA/vtest-16  refcompressratio      1.00x                  -
zroot/DATA/vtest-16  written               101G                   -
zroot/DATA/vtest-16  logicalused           100G                   -
zroot/DATA/vtest-16  logicalreferenced     100G                   -
root@storage01:~ # 
Alex
  • We suspect this can be fragmentation but we don't know how to prove it – Alex May 30 '13 at 17:47
  • Could it be related to snapshots? – Steve Wills May 30 '13 at 18:14
  • No, we don't have any snapshots on this volume – Alex May 30 '13 at 18:19
  • Sad when I see compression disabled on ZFS volumes/filesystems. Anyway, can you post `zpool status -v` and `zpool list` and `zfs list`? – ewwhite May 30 '13 at 20:51
  • @ewwhite I added that info – Alex May 30 '13 at 21:19
  • From everything I can see in this, it looks like a bug. The 'used' of a zvol with a volsize of 100G should not exceed 100G by much if there are no children, reservations or the like. Perhaps it was actually a 200+ GB volsize and you changed the volsize parameter? If not, FreeBSD-10.0 is not yet a production release; file a bug with them. – Nex7 May 31 '13 at 06:45
  • I have a 200G zvol that consumes 414G of raidz space on Nexenta, and 464G of raidz2 space with ZFS on Linux after a zfs send | zfs recv. Disturbing. – Barry Kelly Nov 13 '13 at 01:58

2 Answers

3

VOLSIZE represents the size of the volume as it will be seen by clients, not the size of the volume as stored on the pool.

This difference may come from multiple sources:

  • space required for metadata
  • space required for storing multiple copies (the "copies" parameter)
  • "wasted space" due to padding when aligning blocks of "volblocksize" to the vdev structure; by vdev structure I mean two parameters: the number of disks in the raidz-N, and the physical block size of the devices.

When creating a volume, zfs estimates how much space it will need in order to be able to present a volume of "volsize" to its clients. You can see that difference in the vtest-16 and vtest-3 volumes (where refreservation is 102-103GB for a volsize of 100GB). The calculation can be found in libzfs_dataset.c (zvol_volsize_to_reservation(uint64_t volsize, nvlist_t *props)).
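
For illustration, here is a simplified, self-contained sketch of that estimate. It is not the actual library code: the constants (16K indirect blocks holding 128 block pointers each, metadata written with one extra copy) are my reading of the on-disk format of that era, so treat it as an approximation.

#include <stdio.h>
#include <stdint.h>

#define INDIRECT_SHIFT   14     /* assumed: 16K indirect blocks */
#define BLKPTRS_PER_IND  128    /* assumed: 16K / 128-byte block pointer */

/* rough equivalent of zvol_volsize_to_reservation(): volsize plus metadata */
static uint64_t
volsize_to_reservation(uint64_t volsize, uint64_t volblocksize, int ncopies)
{
        uint64_t nblocks = volsize / volblocksize;
        uint64_t numdb = 7;     /* meta-dnode, levels L0..L6 */

        /* count the indirect blocks needed to address all data blocks */
        while (nblocks > 1) {
                nblocks = (nblocks + BLKPTRS_PER_IND - 1) / BLKPTRS_PER_IND;
                numdb += nblocks;
        }

        /* metadata gets one extra copy, capped at 3 DVAs per block pointer */
        numdb *= (ncopies + 1 > 3) ? 3 : ncopies + 1;

        /* worst case: metadata blocks stored uncompressed at full size */
        return (volsize * ncopies + numdb * (1ULL << INDIRECT_SHIFT));
}

int
main(void)
{
        uint64_t gb = 1ULL << 30;

        printf("8K:  %.1f GiB\n",
            (double)volsize_to_reservation(100 * gb, 8192, 1) / gb);
        printf("16K: %.1f GiB\n",
            (double)volsize_to_reservation(100 * gb, 16384, 1) / gb);
        return (0);
}

For a 100G volume this gives roughly 103G with an 8K volblocksize and 102G with 16K, in line with the refreservation values shown in the question.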

What that calculation does not take into account is the third source. It has little impact on vdevs created from disks with 512 byte sectors, but from my experiments (I tested this by filling an entire zvol), it makes quite a difference when the vdev is created on newer 4K sector disks.

Another thing I found in my experiments is that with mirrors there is no difference between the calculated refreservation and what ends up being used.

These are my results when using 4K drives with volumes that have the default volblocksize (8K). The first column represents the number of disks in a vdev:

    raidz1  raidz2
3   135%    101%
4   148%    148%
5   162%    181%
6   162%    203%
7   171%    203%
8   171%    217%
9   181%    232%
10  181%    232%
11  181%    232%
12  181%    232%

These are my results when using 512 byte sector drives and the default volblocksize of 8K. The first column represents the number of disks in a vdev:

    raidz1  raidz2
3   101%    101%
4   104%    104%
5   101%    113%
6   105%    101%
7   108%    108%
8   110%    114%
9   101%    118%
10  102%    106%
11  103%    108%
12  104%    110%
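
As I understand the allocator, raidz writes data plus parity sectors and pads each allocation up to a multiple of nparity+1 sectors, while the space accounting is then deflated by the parity ratio of a 128K block. The sketch below is only a back-of-the-envelope estimate based on that reading (it ignores metadata), so expect it to land a few percent under the measured values above:

#include <stdio.h>
#include <stdint.h>

/* raw sectors allocated for one block of psize bytes on a raidz vdev */
static uint64_t
raidz_sectors(uint64_t psize, int ashift, int ndisks, int nparity)
{
        uint64_t sector = 1ULL << ashift;
        uint64_t data = (psize + sector - 1) / sector;
        uint64_t parity = nparity *
            ((data + ndisks - nparity - 1) / (ndisks - nparity));
        uint64_t round = nparity + 1;   /* padding to avoid unusable gaps */

        return (((data + parity + round - 1) / round) * round);
}

/* space reported as "used" for one block, as a percentage of its payload */
static double
reported_overhead(uint64_t psize, int ashift, int ndisks, int nparity)
{
        double raw = (double)raidz_sectors(psize, ashift, ndisks, nparity) *
            (1ULL << ashift);
        /* accounting is deflated by the parity ratio of a 128K block */
        double deflate = 131072.0 /
            ((double)raidz_sectors(131072, ashift, ndisks, nparity) *
            (1ULL << ashift));

        return (100.0 * raw * deflate / psize);
}

int
main(void)
{
        int n;

        printf("8K blocks, ashift=12 (4K sectors):\n");
        for (n = 3; n <= 12; n++)
                printf("%2d disks: raidz1 %3.0f%%  raidz2 %3.0f%%\n", n,
                    reported_overhead(8192, 12, n, 1),
                    reported_overhead(8192, 12, n, 2));

        printf("32K blocks, raidz2 of 6 disks, ashift=12: %3.0f%%\n",
            reported_overhead(32768, 12, 6, 2));
        return (0);
}

With an 8K volblocksize this reproduces the 4K-sector table within a few percent (for example about 200% for a 6-disk raidz2), and with a 32K volblocksize on the same layout the estimate drops to roughly 100%, which is what motivates the 32K recommendation below.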

My conclusions are the following:

  • Do not use 4K drives
  • If you really need to use them, create the volume with a volblocksize greater than or equal to 32K; there is negligible performance impact and negligible space usage overhead (larger block sizes require less padding to align properly).
  • Prefer stripes of mirrors for your pools; this layout has both performance benefits and fewer space-related surprises like this one.
  • The estimate is clearly wrong for the cases outlined above, and this is a bug in zfs.
Dan Vatca
    Using 4k drives in a pool with `ashift=9` is known to cause problems. This is nothing new. Changing the block size does not align the drives either. The correct solution is to create the pool with `ashift=12`. – Chris S Jul 25 '13 at 13:10
  • `ashift=12` on 4K drives doesn't resolve this; in fact, on a zpool with `ashift=12`, five 4K drives and a raidz, the consumed space is close to what is mentioned above; for instance, a 7T volume consumes 11T. – drookie Feb 15 '17 at 06:32
-1

If I'm reading this right, you actually have logicalreferenced 87.2 GB of data on the volume. The 176 GB number you're looking at is how much physical space that data uses. So if you have your drives mirrored, I would expect referenced to be about 2x logicalreferenced.

longneck