I have a peculiar workload on a machine that is limited by disk I/O, mostly reads.
The bulk of the I/O happens on slow network-attached disks that are formatted with ZFS.
Using iostat, I can clearly see that utilization of those disks is around 100%, so at least I know that this is the bottleneck.
Moreover, I see that those disks mostly serve reads.
The slow disk contains around 3T.
I was optimistic and added an L2ARC cache device. Using zpool iostat,
I see something like this:
                                 capacity     operations     bandwidth
pool                           alloc   free   read  write   read  write
-----------------------------  -----  -----  -----  -----  -----  -----
virtio-993974c9-d6be-412d-9    3,02T  1,85T     13      0  95,9K      0
cache                              -      -      -      -      -      -
  /root/cache.l2arc            12,5G  2,47G     15      2   152K   116K
-----------------------------  -----  -----  -----  -----  -----  -----
sam. févr. 9 19:48:58 CET 2019
                                 capacity     operations     bandwidth
pool                           alloc   free   read  write   read  write
-----------------------------  -----  -----  -----  -----  -----  -----
virtio-993974c9-d6be-412d-9    3,02T  1,85T     18      0   104K      0
cache                              -      -      -      -      -      -
  /root/cache.l2arc            12,5G  2,47G     19      0   176K      0
-----------------------------  -----  -----  -----  -----  -----  -----
sam. févr. 9 19:48:59 CET 2019
                                 capacity     operations     bandwidth
pool                           alloc   free   read  write   read  write
-----------------------------  -----  -----  -----  -----  -----  -----
virtio-993974c9-d6be-412d-9    3,02T  1,85T     23    308   152K  7,42M
cache                              -      -      -      -      -      -
  /root/cache.l2arc            12,5G  2,47G     31      3   276K   204K
-----------------------------  -----  -----  -----  -----  -----  -----
So the L2ARC is helping, but not much. Moreover, the L2ARC is not completely full, so a bigger one won't help, right?
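To put a number on "not completely full", here is a small sketch that turns the alloc and free figures of the cache device into a fill percentage. The helper names parse_size and fill_fraction are made up for illustration, and the parsing assumes the comma-decimal, single-letter-suffix format shown in the output above.

```python
# Hypothetical helpers: parse sizes as printed by zpool iostat in a
# comma-decimal locale, e.g. "12,5G" or "2,47G".
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a string such as '12,5G' into a number of bytes."""
    text = text.strip().replace(",", ".")
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def fill_fraction(alloc, free):
    """Fraction of the cache device that is already allocated."""
    used, spare = parse_size(alloc), parse_size(free)
    return used / (used + spare)


if __name__ == "__main__":
    # Figures taken from the /root/cache.l2arc line of the output.
    print(f"L2ARC fill: {fill_fraction('12,5G', '2,47G'):.1%}")
```

With the figures above this comes out at roughly 83%, so the cache device still has some room left.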
I have a reasonable cache hit rate: 98.5% for the ARC and 73.7% for the L2ARC.
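For reference, a minimal sketch of how hit rates like these can be derived from the raw counters. It assumes OpenZFS on Linux, where the counters are exposed in /proc/spl/kstat/zfs/arcstats, and it uses the hits, misses, l2_hits and l2_misses entries. The function names are my own, and the counters are cumulative since the module was loaded, so the result is a long-term average rather than a recent one.

```python
# Sketch: derive ARC and L2ARC hit rates from ZFS kstat counters.
# Assumption: OpenZFS on Linux, which exposes them in this file.
ARCSTATS_PATH = "/proc/spl/kstat/zfs/arcstats"


def parse_arcstats(text):
    """Return {counter_name: int} from 'name type data' rows."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns with an integer value;
        # the two header lines do not match and are skipped.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def hit_rate(hits, misses):
    """Hits as a fraction of all lookups (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


def cache_hit_rates(stats):
    """Return (ARC hit rate, L2ARC hit rate) as fractions."""
    return (hit_rate(stats["hits"], stats["misses"]),
            hit_rate(stats["l2_hits"], stats["l2_misses"]))


if __name__ == "__main__":
    try:
        with open(ARCSTATS_PATH) as f:
            arc, l2 = cache_hit_rates(parse_arcstats(f.read()))
        print(f"ARC hit rate:   {arc:.1%}")
        print(f"L2ARC hit rate: {l2:.1%}")
    except OSError:
        print(f"{ARCSTATS_PATH} is not available on this machine")
```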
However, the slow disk is still at roughly 100% utilization, so if I could take work off that disk, my application would run faster.
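As a rough back-of-the-envelope check, the two hit rates quoted above can be combined to estimate how many read requests still end up on the slow disk. This is only a sketch: it assumes that the L2ARC hit rate is measured over ARC misses and that every read is eligible for both caches, and the function name is made up for illustration.

```python
def fraction_reaching_disk(arc_hit_rate, l2_hit_rate):
    """Fraction of read requests that miss both the ARC and the L2ARC."""
    return (1.0 - arc_hit_rate) * (1.0 - l2_hit_rate)


if __name__ == "__main__":
    # 98.5% ARC and 73.7% L2ARC hit rates, as quoted above.
    frac = fraction_reaching_disk(0.985, 0.737)
    print(f"Reads reaching the slow disk: {frac:.2%}")
```

Under these assumptions only about 0.4% of read requests still reach the slow disk, which is apparently enough to keep it busy.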
Is there any way to find out what I should try next? What could help?