
I have a ZFS system serving many VMs. We installed 12x 4TB SAS disks and configured them as mirrored pairs (six mirror vdevs). We also added two PCI-E SSDs (512GB Samsung 960s) as cache devices, but each one only fills to just over 50% rather than caching all the data it can. The pool has plenty of allocated storage, yet the SSDs only fill up to about halfway.

The system is a CentOS 7 box running ZFS on Linux version 0.6.5.9-1.el7_3.centos.
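
In case it is useful, the L2ARC counters can also be read straight from the ZFS on Linux kstats; this is a quick way to see how much data the cache devices actually hold (values are in bytes):

# l2_size     = data cached in the L2ARC
# l2_asize    = space actually allocated on the cache devices
# l2_hdr_size = RAM used by the ARC headers that track L2ARC contents
grep -E '^l2_(size|asize|hdr_size)' /proc/spl/kstat/zfs/arcstats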

Here is a snapshot of zpool iostat:

                                                      capacity     operations    bandwidth
pool                                               alloc   free   read  write   read  write
-------------------------------------------------  -----  -----  -----  -----  -----  -----
stgpool                                            13.3T  8.46T     26   1020   482K  10.6M
  mirror                                           2.21T  1.41T      4    165  50.5K  1.79M
    S1D0                                               -      -      3    158  52.0K  2.05M
    S2D0                                               -      -      0    158  11.2K  2.05M
  mirror                                           2.21T  1.41T      2    184  51.7K  1.97M
    S3D0                                               -      -      2    177  47.2K  2.26M
    S4D0                                               -      -      0    177  13.6K  2.26M
  mirror                                           2.21T  1.41T      3    170   104K  1.66M
    S1D1                                               -      -      2    169  90.4K  1.94M
    S2D1                                               -      -      1    169  21.6K  1.94M
  mirror                                           2.21T  1.41T      5    142   129K  1.61M
    S3D1                                               -      -      3    139   109K  1.86M
    S4D1                                               -      -      1    139  32.8K  1.86M
  mirror                                           2.21T  1.41T      4    170  93.7K  1.81M
    S1D2                                               -      -      4    162  95.2K  2.11M
    S2D2                                               -      -      0    162  11.2K  2.11M
  mirror                                           2.21T  1.41T      4    186  54.2K  1.78M
    S3D2                                               -      -      3    175  40.0K  2.10M
    S4D2                                               -      -      0    175  27.2K  2.10M
logs                                                   -      -      -      -      -      -
  ata-Samsung_SSD_850_EVO_250GB_S21NNXAG918721R        0   232G      0      0      0      0
  ata-Samsung_SSD_850_EVO_250GB_S21NNXAGA59337A        0   232G      0      0      0      0
  ata-Samsung_SSD_850_EVO_250GB_S21NNXAGA69590F        0   232G      0      0      0      0
cache                                                  -      -      -      -      -      -
  nvme-Samsung_SSD_960_PRO_512GB_S3EWNCAHC01880D    266G   211G     38     81   556K  2.85M
  nvme-Samsung_SSD_960_PRO_512GB_S3EWNCAHB04288W    266G   211G     37     56   416K  2.10M
-------------------------------------------------  -----  -----  -----  -----  -----  -----
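
The same per-vdev view can be refreshed continuously by adding an interval in seconds, e.g.:

zpool iostat -v stgpool 5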

Pool properties (zpool get all):

NAME     PROPERTY                    VALUE                       SOURCE
stgpool  size                        21.8T                       -
stgpool  capacity                    61%                         -
stgpool  altroot                     -                           default
stgpool  health                      ONLINE                      -
stgpool  guid                        1784205276891874933         default
stgpool  version                     -                           default
stgpool  bootfs                      -                           default
stgpool  delegation                  on                          default
stgpool  autoreplace                 off                         default
stgpool  cachefile                   -                           default
stgpool  failmode                    wait                        default
stgpool  listsnapshots               off                         default
stgpool  autoexpand                  off                         default
stgpool  dedupditto                  0                           default
stgpool  dedupratio                  1.00x                       -
stgpool  free                        8.46T                       -
stgpool  allocated                   13.3T                       -
stgpool  readonly                    off                         -
stgpool  ashift                      0                           default
stgpool  comment                     -                           default
stgpool  expandsize                  -                           -
stgpool  freeing                     0                           default
stgpool  fragmentation               58%                         -
stgpool  leaked                      0                           default
stgpool  feature@async_destroy       enabled                     local
stgpool  feature@empty_bpobj         active                      local
stgpool  feature@lz4_compress        active                      local
stgpool  feature@spacemap_histogram  active                      local
stgpool  feature@enabled_txg         active                      local
stgpool  feature@hole_birth          active                      local
stgpool  feature@extensible_dataset  enabled                     local
stgpool  feature@embedded_data       active                      local
stgpool  feature@bookmarks           enabled                     local
stgpool  feature@filesystem_limits   enabled                     local
stgpool  feature@large_blocks        enabled                     local

arcstat.py info:

    time  read  l2read  hit%  hits  miss%  miss  l2hit%  l2miss%  arcsz     c  l2size  
01:44:35  2.1K     268    87  1.8K     12   268       2       97   125G  125G    778G  
01:44:36  6.5K     583    91  6.0K      8   583       1       98   125G  125G    778G  
01:44:37  1.1K     277    75   835     24   277       6       93   125G  125G    778G  
01:44:38  1.5K     230    84  1.3K     15   230       1       98   125G  125G    778G  
01:44:39  1.8K     141    91  1.6K      8   141       2       97   125G  125G    778G  
01:44:40  1.4K     203    85  1.2K     14   203      19       80   125G  125G    778G  
01:44:41  4.0K     291    92  3.7K      7   291      11       88   125G  125G    778G  
01:44:42  1.7K      95    94  1.6K      5    95       8       91   125G  125G    778G  
01:44:43  1.1K      84    92  1.0K      7    84       2       97   125G  125G    778G  

I am trying to maximize read rates and make full use of the 960s' capacity. Any help is appreciated :)

user1955162

1 Answer


Familiarize yourself with arcstat and with how ZFS caching works. Something like:

arcstat.py -f "time,read,l2read,hit%,hits,miss%,miss,l2hit%,l2miss%,arcsz,c,l2size" 1

This will show the hit rates of the ZFS ARC and L2ARC caches. You may have a lot of data written to disk and a high hit rate, but perhaps only because a small subset of that data is actually being read; if the working set is small, the L2ARC does not need to grow to match pool usage. The pool shown below has 6.5TB used, but only 127GB of data in the L2ARC cache.

[Screenshot: arcstat output from an example pool with 6.5TB used and 127GB in L2ARC]
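
For a cumulative (since boot) view to compare against the per-second numbers, the L2ARC hit ratio can be computed from the same kstats that arcstat.py reads; a minimal sketch, assuming the usual name/type/data column layout of /proc/spl/kstat/zfs/arcstats:

awk '$1=="l2_hits"{h=$3} $1=="l2_misses"{m=$3} END{if (h+m) printf "cumulative L2ARC hit rate: %.1f%%\n", 100*h/(h+m)}' /proc/spl/kstat/zfs/arcstats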

The ZIL devices (SLOG) are probably unused in your setup. ZFS only uses SLOG devices for synchronous write activity; some databases request that, as does NFS traffic. If the VMs live and run locally on this server, the SLOG devices will never be used, so they may be a waste in this build.
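
The zpool iostat output above already shows the log vdevs sitting at zero allocation and zero activity. You can also check how synchronous writes are handled on the datasets themselves (sync=standard combined with no sync-heavy workload means the SLOG simply stays idle):

zfs get -r sync stgpool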

ewwhite
  • I have been looking at this data, and I have a high l2miss rate. I have updated my question to reflect some data from arcstat.py. Thanks! – user1955162 Apr 22 '18 at 08:44