I have a peculiar load on a machine that is limited by disk IO, mostly reads.

The bulk of the IO happens on slow network-attached disks that are formatted with ZFS.

Using iostat, I can clearly see that the utilization of those disks is at around 100%, so at least I know that this is the bottleneck.

Moreover, I see that on those disks I mostly do reads.

The slow disk holds around 3 TB of data.

I was optimistic and added an L2ARC cache device. Using zpool iostat, I see something like this:

pool                           alloc   free   read  write   read  write
-----------------------------  -----  -----  -----  -----  -----  -----
  virtio-993974c9-d6be-412d-9  3,02T  1,85T     13      0  95,9K      0
cache                              -      -      -      -      -      -
  /root/cache.l2arc            12,5G  2,47G     15      2   152K   116K
-----------------------------  -----  -----  -----  -----  -----  -----
Sat Feb  9 19:48:58 CET 2019
                                 capacity     operations     bandwidth 
pool                           alloc   free   read  write   read  write
-----------------------------  -----  -----  -----  -----  -----  -----
  virtio-993974c9-d6be-412d-9  3,02T  1,85T     18      0   104K      0
cache                              -      -      -      -      -      -
  /root/cache.l2arc            12,5G  2,47G     19      0   176K      0
-----------------------------  -----  -----  -----  -----  -----  -----
Sat Feb  9 19:48:59 CET 2019
                                 capacity     operations     bandwidth 
pool                           alloc   free   read  write   read  write
-----------------------------  -----  -----  -----  -----  -----  -----
  virtio-993974c9-d6be-412d-9  3,02T  1,85T     23    308   152K  7,42M
cache                              -      -      -      -      -      -
  /root/cache.l2arc            12,5G  2,47G     31      3   276K   204K
-----------------------------  -----  -----  -----  -----  -----  -----

So the L2ARC is helping, but not much. Moreover, I see that the L2ARC is not completely full, so a bigger one won't help, right?

I have a reasonable cache hit rate: 98.5% for the ARC and 73.7% for the L2ARC.

Moreover, the slow disk is still used at roughly 100%, so if I could take work off that disk, my application would run faster.
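
To put those two hit rates together, here is a rough back-of-the-envelope calculation. It assumes the L2ARC hit rate is measured only over reads that already missed the ARC (the function name is just for illustration):

```python
def disk_read_fraction(arc_hit_rate: float, l2arc_hit_rate: float) -> float:
    """Fraction of reads that miss both caches and fall through to the slow disk.

    Assumption: the L2ARC hit rate is computed over ARC misses only.
    """
    return (1.0 - arc_hit_rate) * (1.0 - l2arc_hit_rate)


# Hit rates quoted above: 98.5% for the ARC, 73.7% for the L2ARC.
fraction = disk_read_fraction(0.985, 0.737)
print(f"{fraction:.2%} of reads reach the slow disk")
```

Under that assumption, fewer than 1 in 200 reads reach the slow disk, and that small remainder is still enough to keep it saturated.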

Is there any way to know what I should try next? What can help?

Siscia

1 Answer

A larger L2ARC might very well fill to roughly the same level as the one you have now (which is pretty tiny, if I'm reading that right: only 12.5 GB?).

L2ARC fills from ARC evictions, and it's entirely possible that the ARC and L2ARC together already hold most of your truly "hot" dataset, and that you don't have many repeated reads that aren't already served from cache.
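
As a toy illustration of that point (not how ZFS is actually implemented), here is a minimal sketch of a two-tier cache in which the second tier is populated only by blocks evicted from the first. Both tiers are plain LRU here, which is a deliberate simplification, and the class and method names are made up for the example:

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model: an LRU 'ARC' whose evicted blocks spill into an LRU 'L2ARC'."""

    def __init__(self, arc_size: int, l2_size: int):
        self.arc = OrderedDict()  # first tier (stands in for RAM)
        self.l2 = OrderedDict()   # second tier (stands in for the cache device)
        self.arc_size = arc_size
        self.l2_size = l2_size
        self.stats = {"arc": 0, "l2": 0, "disk": 0}

    def read(self, block: int) -> str:
        """Read one block and return which tier served it."""
        if block in self.arc:
            self.arc.move_to_end(block)  # mark as most recently used
            self.stats["arc"] += 1
            return "arc"
        if block in self.l2:
            del self.l2[block]           # simplification: promote back to tier one
            source = "l2"
        else:
            source = "disk"
        self.stats[source] += 1
        self._insert_arc(block)
        return source

    def _insert_arc(self, block: int) -> None:
        self.arc[block] = True
        if len(self.arc) > self.arc_size:
            victim, _ = self.arc.popitem(last=False)  # evict least recently used
            self.l2[victim] = True                    # eviction feeds the second tier
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)


# A hot set of 6 blocks read in a loop, with room for 4 in tier one
# and room for 100 in tier two.
cache = TwoTierCache(arc_size=4, l2_size=100)
for _ in range(10):
    for block in range(6):
        cache.read(block)
print(cache.stats, "blocks held in second tier:", len(cache.l2))
```

In this toy run the second tier never holds more than a couple of blocks despite having room for 100, because the only things that ever land in it are evictions from a small hot set. Making it bigger would change nothing.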

As usual with L2ARC discussions, you're likely to be better served by simply adding more RAM to the server for use as ARC. How much RAM do you have? How much of that RAM is available for use as ARC? (See /etc/modprobe.d/zfs.conf; the ARC size limit defaults to 50% of physical RAM in the system.)
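
If you want to check those numbers on a running system, here is a small sketch that reads the ARC counters. It assumes ZFS on Linux, where they are exposed at /proc/spl/kstat/zfs/arcstats as rows of name, type and value; the helper names are only for illustration:

```python
from pathlib import Path

# Assumption: ZFS on Linux, which exposes ARC counters at this path.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text: str) -> dict:
    """Parse 'name type value' rows into a dict, skipping the header lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def summarize(stats: dict) -> dict:
    """Report current ARC size, its configured ceiling, and the ARC hit rate."""
    gib = 1024 ** 3
    hits, misses = stats["hits"], stats["misses"]
    total = hits + misses
    return {
        "arc_size_gib": stats["size"] / gib,
        "arc_max_gib": stats["c_max"] / gib,
        "arc_hit_rate": hits / total if total else 0.0,
    }


if __name__ == "__main__" and ARCSTATS.exists():
    print(summarize(parse_arcstats(ARCSTATS.read_text())))
```

If arc_size_gib is sitting right at arc_max_gib, the ARC is using everything it is allowed to, and raising the limit (or adding RAM) is the next thing to try.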

Jim Salter