ZFS - enable or disable disk cache?

Question

I'm setting up ZFS (through FreeNAS) with RAIDZ1 on a server with 4 x WD Red SATA HDDs (connected through a PERC H330 in HBA mode).

The server is hooked to a UPS.

For ZFS in this setup, does it make sense to enable the disk cache on each drive, or is this very dangerous despite the UPS?

shodanshok · Accepted Answer · 2019-12-16T18:15:37.807

You should definitely enable the disk cache.

The rationale is that ZFS assumes the disk cache is enabled, and so it flushes any critical writes (i.e. sync writes and uberblock rewrites) via the appropriate SATA/SAS commands (ATA FLUSH, FUA, etc.).

Leaving the disk cache enabled lets you capitalize on the write-combining capability of modern disks without affecting pool reliability.
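To make "write combining" concrete, here is a toy model of one form of it: a volatile write-back cache that coalesces repeated writes to the same block, so the media only sees the last version when a flush destages it. The struct and function names are hypothetical and this is not how any real drive firmware is written; real drives also merge and reorder adjacent writes, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one cache slot per logical block address (LBA). */
#define NBLOCKS 8

struct toy_disk {
    int  cache[NBLOCKS];   /* volatile cached contents                */
    bool dirty[NBLOCKS];   /* slot holds data not yet written to media */
    int  media[NBLOCKS];   /* persistent contents                      */
    int  media_writes;     /* physical writes actually performed       */
};

/* A write is absorbed by the cache; rewriting a dirty LBA just replaces it. */
static void disk_write(struct toy_disk *d, int lba, int value)
{
    assert(lba >= 0 && lba < NBLOCKS);
    d->cache[lba] = value;
    d->dirty[lba] = true;
}

/* Cache flush (ATA FLUSH CACHE in the real world): destage every dirty slot. */
static void disk_flush(struct toy_disk *d)
{
    for (int lba = 0; lba < NBLOCKS; lba++) {
        if (d->dirty[lba]) {
            d->media[lba] = d->cache[lba];
            d->dirty[lba] = false;
            d->media_writes++;
        }
    }
}
```

For example, 100 writes spread round-robin over 4 LBAs reach the media as only 4 physical writes after a single flush.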

This obviously assumes that your disks actually honor the cache flush command, which is the norm for modern (post-2006) disks. In the rare case that your disks lie about cache flushing, you should disable the cache.
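Why a lying disk matters can be shown with a toy simulation of a barrier-ordered commit: write the data block, flush, write an "uberblock" pointing at it, flush. The on-disk layout, names, and the `honors_flush` switch are all hypothetical; the sketch only models the worst legal outcome of a power loss, where the drive happened to destage the uberblock but not the data still sitting in its volatile cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: LBA 0 is an "uberblock" holding the LBA of the
 * current data block; 0 means "empty pool". Not the real ZFS format. */
#define NBLOCKS 4
#define UBER    0

struct toy_disk {
    int  cache[NBLOCKS], media[NBLOCKS];
    bool dirty[NBLOCKS];
    bool honors_flush;     /* false models a drive that lies about FLUSH */
};

static void disk_write(struct toy_disk *d, int lba, int v)
{
    assert(lba >= 0 && lba < NBLOCKS);
    d->cache[lba] = v;
    d->dirty[lba] = true;
}

static void disk_flush(struct toy_disk *d)
{
    if (!d->honors_flush)
        return;            /* reports success but destages nothing */
    for (int i = 0; i < NBLOCKS; i++)
        if (d->dirty[i]) { d->media[i] = d->cache[i]; d->dirty[i] = false; }
}

/* Worst case: only the uberblock made it out of the cache before power loss. */
static void power_loss_worst_case(struct toy_disk *d)
{
    if (d->dirty[UBER])
        d->media[UBER] = d->cache[UBER];
    for (int i = 0; i < NBLOCKS; i++)
        d->dirty[i] = false;   /* volatile cache contents are gone */
}

/* Barrier-ordered commit: data, FLUSH, uberblock, FLUSH. */
static void commit(struct toy_disk *d, int lba, int value)
{
    disk_write(d, lba, value);
    disk_flush(d);
    disk_write(d, UBER, lba);
    disk_flush(d);
}

/* Consistent if the uberblock points nowhere or at a block holding data. */
static bool consistent(const struct toy_disk *d)
{
    int target = d->media[UBER];
    return target == UBER || d->media[target] != 0;
}
```

With `honors_flush` set, the pool stays consistent after the simulated power loss; with it cleared, the uberblock survives while the block it points at does not, which is exactly the case where disabling the cache is the safer choice.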

For more information, I suggest reading the description of the zfs_nocacheflush tunable:

ZFS uses barriers (volatile cache flush commands) to ensure data is committed to permanent media by devices. This ensures consistent on-media state for devices where caches are volatile (eg HDDs).

score 1 · Answer 2 · answered Dec 16 '19 at 11:38

You can if you want, but it won't make a big difference. ZFS uses a portion of RAM as a write cache and flushes it to disk periodically.
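A rough sketch of the batching idea: writes accumulate in a RAM buffer and reach the disk as one larger I/O when the buffer fills. This is only loosely in the spirit of ZFS transaction groups; the limit, struct, and function names are hypothetical and are not ZFS tunables or internals, and real ZFS also commits on a timer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical batch size, not a ZFS tunable. */
#define BATCH_LIMIT 4096

struct write_batcher {
    char   buf[BATCH_LIMIT];
    size_t used;
    int    disk_ios;        /* batches issued to the simulated disk */
    size_t bytes_written;   /* total bytes handed to the simulated disk */
};

/* Issue the buffered bytes as a single (simulated) disk I/O. */
static void batcher_flush(struct write_batcher *b)
{
    if (b->used == 0)
        return;
    b->disk_ios++;
    b->bytes_written += b->used;
    b->used = 0;
}

/* Append one write, flushing first if it would overflow the buffer. */
static void batcher_write(struct write_batcher *b, const void *data, size_t len)
{
    assert(len <= BATCH_LIMIT);
    if (b->used + len > BATCH_LIMIT)
        batcher_flush(b);
    memcpy(b->buf + b->used, data, len);
    b->used += len;
}
```

With 512-byte writes, 100 writes become 12 disk I/Os while writing plus 1 final flush, i.e. 13 instead of 100, which is why the drive's own cache has less left to combine.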

With 4 disks, this sounds like a small installation, so benchmark both and see if there's even a noticeable benefit first.
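For a quick comparison, a minimal POSIX timing sketch: it performs `count` 4 KiB writes to a temporary file in a given directory, optionally forcing each one out with `fdatasync()`. The function name and parameters are hypothetical. How you toggle the disk write cache between runs depends on the OS and controller and is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Time `count` 4 KiB writes in `dir`; with sync_each, fdatasync() after each.
 * Returns elapsed seconds, or -1.0 on error. */
static double time_writes(const char *dir, int count, bool sync_each)
{
    char path[4096];
    char block[4096] = {0};
    struct timespec t0, t1;

    if (snprintf(path, sizeof path, "%s/bench-XXXXXX", dir) >= (int)sizeof path)
        return -1.0;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1.0;
    unlink(path);   /* the file is removed once fd is closed */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block ||
            (sync_each && fdatasync(fd) != 0)) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Point `dir` at a dataset on the pool and run it with the disk cache on and off. The runs without `sync_each` will mostly measure RAM caching, so the synced runs are the ones that exercise the disks' flush path.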

score 0 · Answer 3 · answered May 26 '22 at 14:40

I'm inclined to concur, but my setup might not be optimal. My pool:

                                  capacity     operations     bandwidth
pool                            alloc   free   read  write   read  write
------------------------------  -----  -----  -----  -----  -----  -----
zdata                           2.82T   822G     73    412  40.0M  46.1M
  raidz1-0                      2.82T   822G     73    412  40.0M  46.1M
    wwn-0x50014ee0019b83a6          -      -     16    106  10.0M  11.5M
    wwn-0x50014ee2b3f6d328          -      -     20    102  10.0M  11.5M
    wwn-0x50014ee25ea101ef          -      -     18    105  10.0M  11.5M
    wwn-0x50014ee057084591          -      -     16     97  9.94M  11.5M
logs                                -      -      -      -      -      -
  wwn-0x50000f0056424431-part5   132K   112M      0      0      0      0
cache                               -      -      -      -      -      -
  wwn-0x50000f0056424431-part4  30.7G   270M      0      5  2.45K   517K
------------------------------  -----  -----  -----  -----  -----  -----

Background: this is an Arch Linux based dedicated NAS with a Promise SATA2 controller. As the Samsung SSD holding Arch still had ample free space, I decided to use it as a log and cache device and added it to the ZFS pool. Since the Promise is only a PCI device, I expected the log and cache on the SSD to increase performance. In day-to-day usage, however, I don't see any performance increase.

score 0 · Answer 4 · answered May 28 '22 at 13:54

To increase random write IOPS, we should enable the (volatile) write cache of the spinning disks.
This relies on the software issuing cache flushes for critical writes, as ZFS does.
For ZFS on Linux, Solaris, and FreeBSD, the ZFS community recommends connecting disks directly via SAS/SATA, or over a SCSI fabric (SAS/Fibre Channel) through an I/O expander.
A hardware RAID adapter disables all disk write caches in its default RAID mode; it can instead expose all devices in JBOD mode, in which case the disks' write caches can be enabled.

On Linux, a directly connected device logs a line like the following; the exact contents depend on the vendor's SAS/SATA device firmware:

[sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
or
[sdcl] Write cache: enabled, read cache: disabled, supports DPO and FUA
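If you want to check these messages programmatically (for example over saved dmesg output), a small parser is enough. This sketch, with hypothetical names, only knows the two message shapes above and treats any line without "supports DPO and FUA" as reporting no FUA support.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct cache_report {
    bool write_cache;
    bool read_cache;
    bool fua;
};

/* Parse "[sdX] Write cache: <state>, read cache: <state>, ..." lines.
 * Returns false if the line doesn't contain both cache fields. */
static bool parse_cache_line(const char *line, struct cache_report *out)
{
    const char *wc = strstr(line, "Write cache: ");
    const char *rc = strstr(line, "read cache: ");
    if (wc == NULL || rc == NULL)
        return false;
    out->write_cache = strncmp(wc + strlen("Write cache: "), "enabled", 7) == 0;
    out->read_cache  = strncmp(rc + strlen("read cache: "), "enabled", 7) == 0;
    out->fua = strstr(line, "supports DPO and FUA") != NULL;
    return true;
}
```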

From the Linux kernel documentation, writeback_cache_control.txt:

but it means the operating
system needs to force data out to the non-volatile storage when it performs
a data integrity operation like fsync, sync or an unmount
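The same obligation seen from user space: a minimal POSIX sketch (the helper name `write_durably` is hypothetical) that returns only after `fdatasync()` reports the data durable. On Linux, filesystems typically turn this into a cache flush for a device with a write-back cache. A newly created file would also need an `fsync()` of its parent directory to make the directory entry durable; that step is omitted here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write buf to path and force it to stable storage. 0 on success, -1 on error.
 * EINTR and other partial-failure cases are simply reported as errors. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    const char *p = buf;
    size_t left = len;
    while (left > 0) {                 /* handle short writes */
        ssize_t n = write(fd, p, left);
        if (n < 0) { close(fd); return -1; }
        p += n;
        left -= (size_t)n;
    }
    if (fdatasync(fd) != 0) { close(fd); return -1; }
    return close(fd);
}
```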

Forced Unit Access
-----------------

The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.
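REQ_FUA is a kernel-internal bio flag, so user space cannot set it directly. The closest user-space request is opening a file with `O_DSYNC`, where each `write()` returns only once its data is on stable storage. Whether the kernel satisfies that with a FUA write or with a cache flush is up to the filesystem and block layer, not this code. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Append one record through an O_DSYNC descriptor; each write() is durable
 * when it returns. Returns bytes written, or -1 on error. */
static ssize_t append_dsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    if (close(fd) != 0)
        return -1;
    return n;
}
```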

Here is the relevant comment in the Linux kernel's blk-flush.c:

 * If the device has writeback cache and supports FUA, REQ_PREFLUSH is
 * translated to PREFLUSH but REQ_FUA is passed down directly with DATA.
 *
 * If the device has writeback cache and doesn't support FUA, REQ_PREFLUSH
 * is translated to PREFLUSH and REQ_FUA to POSTFLUSH.
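A small userspace model of the policy in that comment, assuming the request carries data. It is not kernel code, and `plan_write` and the `STEP_*` names are hypothetical. It also includes the case the same file describes for devices without a write-back cache: there, PREFLUSH and FUA make no difference and the request runs as a normal write.

```c
#include <assert.h>
#include <stdbool.h>

/* Steps the block layer ends up issuing, as bits. */
enum flush_step {
    STEP_PREFLUSH  = 1 << 0,   /* flush the cache before the data */
    STEP_DATA      = 1 << 1,   /* plain data write                */
    STEP_DATA_FUA  = 1 << 2,   /* data write sent with FUA        */
    STEP_POSTFLUSH = 1 << 3,   /* flush the cache after the data  */
};

static unsigned plan_write(bool writeback_cache, bool device_fua,
                           bool req_preflush, bool req_fua)
{
    if (!writeback_cache)              /* nothing volatile to flush */
        return STEP_DATA;

    unsigned steps = 0;
    if (req_preflush)
        steps |= STEP_PREFLUSH;
    if (req_fua && device_fua) {
        steps |= STEP_DATA_FUA;        /* REQ_FUA passed down with DATA */
    } else {
        steps |= STEP_DATA;
        if (req_fua)
            steps |= STEP_POSTFLUSH;   /* FUA emulated by a flush afterwards */
    }
    return steps;
}
```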

The block-layer interface before Linux 4.7:

/**
 * blk_queue_flush - configure queue's cache flush capability
 * @q:          the request queue for the device
 * @flush:      0, REQ_FLUSH or REQ_FLUSH | REQ_FUA
 *
 * Tell block layer cache flush capability of @q.  If it supports
 * flushing, REQ_FLUSH should be set.  If it supports bypassing
 * write cache for individual writes, REQ_FUA should be set.
 */
void blk_queue_flush(struct request_queue *q, unsigned int flush)
{
        WARN_ON_ONCE(flush & ~(REQ_FLUSH | REQ_FUA));

        if (WARN_ON_ONCE(!(flush & REQ_FLUSH) && (flush & REQ_FUA)))
                flush &= ~REQ_FUA;

        q->flush_flags = flush & (REQ_FLUSH | REQ_FUA);
}

From Linux 4.7 on, blk_queue_write_cache() replaces it (this excerpt is from a later kernel version):

/**
 * blk_queue_write_cache - configure queue's write cache
 * @q:          the request queue for the device
 * @wc:         write back cache on or off
 * @fua:        device supports FUA writes, if true
 *
 * Tell the block layer about the write cache of @q.
 */
void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
{
        if (wc)
                blk_queue_flag_set(QUEUE_FLAG_WC, q);
        else
                blk_queue_flag_clear(QUEUE_FLAG_WC, q);
        if (fua)
                blk_queue_flag_set(QUEUE_FLAG_FUA, q);
        else
                blk_queue_flag_clear(QUEUE_FLAG_FUA, q);

        wbt_set_write_cache(q, test_bit(QUEUE_FLAG_WC, &q->queue_flags));
}
EXPORT_SYMBOL_GPL(blk_queue_write_cache);
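The QUEUE_FLAG_WC state set here is visible from user space in `/sys/block/<dev>/queue/write_cache`, which reads "write back" or "write through". A minimal reader sketch with hypothetical function names; note that writing to that sysfs file changes only the kernel's view of the device, not the drive's own cache setting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 for "write back", 0 for "write through", -1 if unrecognized. */
static int parse_write_cache(const char *text)
{
    if (strncmp(text, "write back", 10) == 0)
        return 1;
    if (strncmp(text, "write through", 13) == 0)
        return 0;
    return -1;
}

/* Read /sys/block/<dev>/queue/write_cache (dev like "sda").
 * Returns parse_write_cache()'s codes, or -1 if the file can't be read. */
static int read_write_cache(const char *dev)
{
    char path[256], line[64];
    snprintf(path, sizeof path, "/sys/block/%s/queue/write_cache", dev);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? parse_write_cache(line) : -1;
}
```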

The corresponding comment in OpenZFS on Linux, which supports both interfaces:

/*
 * 4.7 API,
 * The blk_queue_write_cache() interface has replaced blk_queue_flush()
 * interface.  However, the new interface is GPL-only thus we implement
 * our own trivial wrapper when the GPL-only version is detected.
 *
 * 2.6.36 - 4.6 API,
 * The blk_queue_flush() interface has replaced blk_queue_ordered()
 * interface.  However, while the old interface was available to all the
 * new one is GPL-only.   Thus if the GPL-only version is detected we
 * implement our own trivial helper.
 */
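A userspace sketch of the translation such a compatibility helper has to perform between the two interfaces. It is not the OpenZFS wrapper, and the `TOY_REQ_*` values are made up for illustration. It maps REQ_FLUSH to an enabled write-back cache and follows blk_queue_flush()'s rule above that FUA without FLUSH is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the pre-4.7 flag bits; values are made up. */
#define TOY_REQ_FLUSH (1u << 0)
#define TOY_REQ_FUA   (1u << 1)

/* New-style (wc, fua) -> old-style flush flags. */
static unsigned wc_fua_to_flush_flags(bool wc, bool fua)
{
    unsigned flags = 0;
    if (wc)
        flags |= TOY_REQ_FLUSH;
    if (wc && fua)                 /* FUA without FLUSH is dropped */
        flags |= TOY_REQ_FUA;
    return flags;
}

/* Old-style flush flags -> new-style (wc, fua). */
static void flush_flags_to_wc_fua(unsigned flags, bool *wc, bool *fua)
{
    *wc  = (flags & TOY_REQ_FLUSH) != 0;
    *fua = *wc && (flags & TOY_REQ_FUA) != 0;
}
```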

On the software side, this looks sufficient.
If the hardware (a black box) has a bug, however, the software can't prevent it.

Good luck, and back up all important data.
