With a bit of experimentation I've found four possible solutions.
With each approach, you need to perform the steps and then continue to read more data to fill up the ZFS ARC cache and to trigger the feed from the ARC to the L2ARC. Note that if the data is already cached in memory, or if the compressed size on disk of each block is greater than 32kB, these methods won't generally do anything.
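As a quick sanity check on the 32kB criterion, it's worth looking at the dataset's record size and average compression ratio first. This is only a rough indicator since each block compresses individually; tank/data below is a placeholder for your own dataset:
zfs get recordsize,compression,compressratio tank/data
With a 128kB record size, for example, blocks need roughly 4x or better compression to come in at or under 32kB on disk, whereas datasets with an 8kB or 16kB record size qualify regardless.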
1. Set the documented kernel flag zfs_prefetch_disable
The L2ARC by default refuses to cache data that has been automatically prefetched. We can bypass this by disabling the ZFS prefetch feature. This flag is often a good idea for database workloads anyway.
echo "zfs_prefetch_disable/W0t1" | mdb -kw
...or to set it permanently, add the following to /etc/system:
set zfs:zfs_prefetch_disable = 1
Now when files are read using dd, they will still be eligible for the L2ARC.
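To confirm the flag actually took effect, you can read the current value back with mdb (this just prints the variable as a decimal; 1 means prefetch is disabled):
echo "zfs_prefetch_disable/D" | mdb -k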
Operationally, this change also improves the behaviour of reads in my testing. Normally, when ZFS detects a sequential read it balances the throughput among the data vdevs and cache vdevs instead of just reading from cache -- but this hurts performance if the cache devices are significantly lower-latency or higher-throughput than the data devices.
2. Re-write the data
As data is written to a ZFS filesystem it is cached in the ARC and, if it meets the block-size criterion, is eligible to be fed into the L2ARC. It's not always easy to re-write data, but some applications and databases can do it live, e.g. through application-level file mirroring or by moving the data files.
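Where a maintenance window allows it, the crudest way to re-write a file is to copy it and rename the copy over the original. This is only a sketch, assuming nothing has the file open while it runs; filename.bin is a placeholder:
cp filename.bin filename.bin.rewrite
mv filename.bin.rewrite filename.bin
The new copy passes through the ARC as it is written, which is what makes its blocks eligible for the L2ARC feed.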
Problems:
- Not always possible depending on the application.
- Consumes extra space if there are snapshots in use.
- (But on the bright side, the resulting files are defragmented.)
3. Unset the undocumented kernel flag l2arc_noprefetch
This is based on reading the OpenSolaris source code and is no doubt completely unsupported. Use at your own risk.
Disable the l2arc_noprefetch flag:
echo "l2arc_noprefetch/W0" | mdb -kw
Data read into the ARC while this flag is disabled will be eligible for the L2ARC even if it comes from a sequential read (as long as the blocks are at most 32kB on disk).
Read the file from disk:
dd if=filename.bin of=/dev/null bs=1024k
Re-enable the l2arc_noprefetch flag:
echo "l2arc_noprefetch/W1" | mdb -kw
4. Read the data randomly
I wrote a Perl script to read files in 8kB chunks pseudorandomly (based on the ordering of a Perl hash). It may also work with larger chunks but I haven't tested that yet.
#!/usr/bin/perl
# Read each file named on the command line in 8kB chunks, in the
# pseudorandom order in which Perl returns the keys of a hash.
use strict;
use warnings;

my $BLOCK_SIZE = 8 * 2**10;   # read size in bytes
my $MAX_ERRS   = 5;           # give up on a file after this many errors

foreach my $file (@ARGV) {
    print "Reading $file...\n";

    my $size = (stat($file))[7];
    unless ($size) { print STDERR "Unable to stat file $file.\n"; next; }
    open(my $fh, '<', $file)
        or do { print STDERR "Unable to open file $file.\n"; next; };

    # Build a hash keyed by block number; iterating over its keys gives
    # a pseudorandom ordering of the blocks.
    my %blocks;
    $blocks{$_} = 0 for 0 .. int(($size - 1) / $BLOCK_SIZE);

    my $buf;
    my $errs = 0;
    foreach my $block (keys %blocks) {
        unless (sysseek($fh, $block * $BLOCK_SIZE, 0)
                && sysread($fh, $buf, $BLOCK_SIZE)) {
            print STDERR "Error reading $BLOCK_SIZE bytes from offset "
                . $block * $BLOCK_SIZE . "\n";
            if (++$errs == $MAX_ERRS) { print STDERR "Giving up on this file.\n"; last; }
        }
    }
    close($fh);
}
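Assuming the script is saved as randomread.pl (my name for it, nothing standard), point it at one or more files:
perl randomread.pl filename.bin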
Problems:
- This takes a long time and puts a heavy workload on the disk.
Remaining issues
- The above methods will get the data into main memory, eligible for feeding into the L2ARC, but they don't trigger the feed. The only way I know to trigger writing to the L2ARC is to continue reading data to put pressure on the ARC.
- On Solaris 11.3 with SRU 1.3.9.4.0, the L2ARC only rarely grows by the full amount expected. The evict_l2_eligible kstat increases even when the SSD devices are under no pressure, indicating that eligible data is being dropped (see the monitoring sketch below). This remaining rump of uncached data has a disproportionate effect on performance.
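While applying that read pressure, it's useful to sample the relevant arcstats: l2_size should grow while evict_l2_eligible ideally stays flat. A monitoring sketch (the pool name tank and the 5-second interval are arbitrary):
kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:evict_l2_eligible 5
zpool iostat -v tank 5
The zpool iostat output also shows whether the cache devices are actually receiving writes.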