
I've set up a Solaris Express 11 machine with some reasonably fast HDDs behind a RAID controller, created a zpool on that device with compression enabled, and added a mirrored log and two cache devices to it. The datasets are exposed as FC targets for use with ESX, and I've populated them with some data to play around with (a rough sketch of the setup commands follows the arcstat output below). The L2ARC partially filled up (and for some reason is not filling any further), but I hardly see any reads from it. zpool iostat -v shows that not much has been read from the cache so far:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         222G  1.96T    189     84   994K  1.95M
  c7t0d0s0   222G  1.96T    189     82   994K  1.91M
  mirror    49.5M  5.51G      0      2      0  33.2K
    c8t2d0p1    -      -      0      2      0  33.3K
    c8t3d0p1    -      -      0      2      0  33.3K
cache           -      -      -      -      -      -
  c11d0p2   23.5G  60.4G      2      1  33.7K   113K
  c10d0p2   23.4G  60.4G      2      1  34.2K   113K

and the L2ARC-enabled arcstat.pl script shows a 100% L2ARC miss rate for the current workload:

./arcstat.pl -f read,hits,miss,hit%,l2read,l2hits,l2miss,l2hit%,arcsz,l2size 5
read  hits  miss  hit%  l2read  l2hits  l2miss  l2hit%  arcsz  l2size
[...]
 243   107   136    44     136       0     136       0   886M     39G
 282   144   137    51     137       0     137       0   886M     39G
 454   239   214    52     214       0     214       0   889M     39G
[...]
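For reference, a rough sketch of how the setup described at the top would be put together (device names are taken from the zpool iostat output above; the FC export step is an assumption based on "exposed as FC targets" and is only hinted at in the comments):

zpool create -O compression=on tank c7t0d0s0
zpool add tank log mirror c8t2d0p1 c8t3d0p1
zpool add tank cache c10d0p2 c11d0p2
zfs create -V 500G -s <datasetname>    # sparse zvol, as described in the question
# the zvol would then be exported over FC, typically via COMSTAR
# (sbdadm create-lu on the zvol device, then stmfadm add-view on the resulting LU)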

I first suspected the recordsize might be too large, so that L2ARC treats everything as a streaming load, but the zpool contains nothing but ZFS volumes (I've created them as "sparse" using zfs create -V 500G -s <datasetname>), which do not even have a recordsize parameter to change.
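For what it's worth, zvols have a volblocksize property (fixed at creation time) rather than recordsize; a quick way to check which block size a volume is actually using, keeping the question's placeholder name, would be:

zfs get volblocksize <datasetname>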

I have also found many mentions of L2ARC needing about 200 bytes of RAM per record for its metadata, but so far I have been unable to find out what L2ARC considers a "record" for a volume dataset - a single sector of 512 bytes? Might it be suffering from a RAM shortage for that metadata and simply be filled up with junk that is never read again?
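As a back-of-envelope illustration of what that metadata could cost, taking the ~40 GB l2size reported by arcstat above and the commonly cited ~200 bytes per record (the two record sizes are only illustrative assumptions, not measured values):

40 GB L2ARC / 8 KB per record  * 200 bytes ≈ 1 GB of ARC headers
40 GB L2ARC / 512 B per record * 200 bytes ≈ 15.6 GB of ARC headers

With only 2 GB of RAM in the box, the second case obviously could not fit.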

Edit: Adding 8 GB of RAM on top of the 2 GB already installed worked out nicely - the additional RAM is happily used even in a 32-bit installation, and the L2ARC has now grown and is getting hits:

    time  read  hit%  l2hit%  arcsz  l2size
21:43:38   340    97      13   6.4G     95G
21:43:48   185    97      18   6.4G     95G
21:43:58   655    91       2   6.4G     95G
21:44:08   432    98      16   6.4G     95G
21:44:18   778    92       9   6.4G     95G
21:44:28   910    99      19   6.4G     95G
21:44:38  4.6K    99      18   6.4G     95G
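For reference, the same figures can also be pulled straight from kstat without arcstat.pl; a minimal sketch (statistic names as exposed by the zfs:0:arcstats kstat on Solaris):

# current ARC and L2ARC sizes, in bytes
kstat -p zfs:0:arcstats:size zfs:0:arcstats:l2_size
# cumulative L2ARC hit/miss counters
kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses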

Thanks to ewwhite.

the-wabbit
  • How much RAM do you have in the system? – ewwhite Sep 12 '11 at 12:58
  • 2 gigs - not much, but all it does is handle the storage workload. No idea whether more RAM would help - I would have tried, but the system uses FB-DIMMs (which I do not happen to have handy) and is running in a rack 250 km away. If you think it would help, could you give some references? – the-wabbit Sep 12 '11 at 13:37

1 Answer


You should have more RAM in the system. Pointers to L2ARC need to be kept in RAM (ARC), so I think you'd need around 4GB or 6GB of RAM to better utilize the ~60GB of L2ARC you have available.
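For a sense of where that range comes from, applying the 10-15x L2ARC:RAM rule of thumb from the list post quoted below to the ~60 GB of L2ARC:

60 GB / 15 ≈ 4 GB of RAM
60 GB / 10 = 6 GB of RAM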

This is from a recent thread on the ZFS list:

http://opensolaris.org/jive/thread.jspa?threadID=131296

L2ARC is "secondary" ARC. ZFS attempts to cache all reads in the ARC 
(Adaptive Read Cache) - should it find that it doesn't have enough space 
in the ARC (which is RAM-resident), it will evict some data over to the 
L2ARC (which in turn will simply dump the least-recently-used data when 
it runs out of space). Remember, however, every time something gets 
written to the L2ARC, a little bit of space is taken up in the ARC 
itself (a pointer to the L2ARC entry needs to be kept in ARC). So, it's 
not possible to have a giant L2ARC and tiny ARC. As a rule of thumb, I 
try not to have my L2ARC exceed my main RAM by more than 10-15x (with 
really bigMem machines, I'm a bit looser and allow 20-25x or so, but 
still...). So, if you are thinking of getting a 160GB SSD, it would be 
wise to go for at minimum 8GB of RAM. Once again, the amount of ARC 
space reserved for a L2ARC entry is fixed, and independent of the actual 
block size stored in L2ARC. The jist of this is that tiny files eat up 
a disproportionate amount of systems resources for their size (smaller 
size = larger % overhead vis-a-vis large files).
ewwhite
  • Thanks ewwhite. When stuffing more RAM into the machine, would I necessarily need a 64 bit version of Solaris to get it utilized or would ARC be able to address above 4 GB through PAE? – the-wabbit Sep 12 '11 at 19:17
  • I'm not sure about the 32-bit/64-bit question as it relates to ZFS. Ideally, moving to 64-bit would be beneficial, but you can get by with the 4 GB of RAM to start and see if that makes a difference in L2ARC usage. – ewwhite Sep 16 '11 at 01:20
  • I've ordered an additional 8 GB for the machine; I'll report back with results. From what I've read so far, L2ARC will be able to use all the memory, even what's above 4 GB, through Solaris' PAE support. An interesting observation is that the L2ARC's LRU algorithm seems to break down if there is not enough RAM to hold all the necessary entries. – the-wabbit Sep 16 '11 at 08:33
  • As a sidenote: I've reduced the space available for L2ARC to 30 GB (15 GB on each of the two drives). The L2ARC now only fills up to 13 GB, but I am seeing significantly higher hit percentages (around 60% overall). So apparently the lesson is that an L2ARC set too large can go stale, because its LRU fails when there is not enough RAM in the ARC to hold the reference entries. – the-wabbit Sep 22 '11 at 12:02