I am running a write-heavy application in which PostgreSQL's data directory lives on ZFS. Generally it works well, but I am finding that my ZFS pool fragments heavily. I even created a new pool and moved the data over with `zfs send`/`zfs recv`, both to start from unfragmented free space and to try out some new settings that might limit fragmentation, but to no avail.
I'm not sure what is going on. The application mostly issues UPDATEs, but Postgres should be able to reuse the space it already has on disk (UPDATEs create dead tuples, autovacuum reclaims them, and those pages get reused). I do see that the allocated space mostly stays the same, so disk usage is not growing. However, the copy-on-write nature of ZFS seems to be chewing through the free space and fragmenting it heavily.
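For reference, this is roughly how I've been checking that Postgres really is reusing dead-tuple space rather than growing the tables (the database name `mydb` below is a placeholder for my actual database):

```shell
# Dead tuples and last autovacuum per table; 'mydb' is a placeholder name.
psql -d mydb -c "
  SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;"

# Total on-disk size of the database, to confirm it isn't growing:
psql -d mydb -c "SELECT pg_size_pretty(pg_database_size('mydb'));"
```

The `pg_database_size` number stays roughly flat, which is what makes the ZFS-level fragmentation so surprising.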
So, after about 15 minutes of heavy writes, the ZFS pool fragmentation went from 0% (freshly created) to 14%. This can't be how things are supposed to work:
NAME           SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
postgres_data  149G   89.0G  60.0G  -        -         14%   59%  1.00x  ONLINE  -
postgres_wal   9.50G  289M   9.22G  -        -         0%    2%   1.00x  ONLINE  -
(note: in the time that I've been writing this post, the fragmentation has gone up to 16%. There is something not right here.)
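In case it helps, this is roughly the loop I've been using to watch the fragmentation climb (pool names as in the `zpool list` output above):

```shell
# Log pool fragmentation once a minute so it can be correlated with write load.
# Pool names match the zpool list output above.
while true; do
  printf '%s ' "$(date -u +%FT%TZ)"
  zpool list -H -o name,frag postgres_data postgres_wal | tr '\n\t' '  '
  echo
  sleep 60
done
```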
Some notes on my implementation:
- I have the WAL on a separate pool, on the theory that constant writes to the WAL won't fragment the main data store. Apparently this is not working.
- I have also tried `sync=disabled`, on the theory that the ZIL was turning my disk into swiss cheese. That made no difference either.
- I am currently using `logbias=throughput`. I've seen conflicting advice on this: the official docs suggest `throughput` for Postgres, while a very useful guide says that "logbias=throughput will fragment every. Single. Block. Written to your pool." In my initial pool (now retired, because it fragmented up to 80%) I started with `logbias=latency`, but that clearly also fragmented the whole disk.
- I am using `ashift=9` because the physical disk (this is cloud storage) has a 512-byte block size.
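To summarize, the setup above was applied roughly like this (`ashift` can only be set at pool creation; the device name is a placeholder):

```shell
# ashift=9 for 512-byte sectors; /dev/sdX is a placeholder device name.
zpool create -o ashift=9 postgres_data /dev/sdX

# Dataset tuning for the Postgres data directory:
zfs set recordsize=8K      postgres_data/data  # match Postgres's 8 KB pages
zfs set logbias=throughput postgres_data/data
zfs set sync=disabled      postgres_data/data  # experiment only; unsafe for durability
zfs set compression=zstd   postgres_data/data
zfs set atime=off          postgres_data/data
```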
My full ZFS dataset properties:
NAME PROPERTY VALUE SOURCE
postgres_data/data type filesystem -
postgres_data/data creation Tue May 17 23:36 2022 -
postgres_data/data used 88.9G -
postgres_data/data available 55.4G -
postgres_data/data referenced 88.9G -
postgres_data/data compressratio 1.98x -
postgres_data/data mounted yes -
postgres_data/data quota none default
postgres_data/data reservation none default
postgres_data/data recordsize 8K inherited from postgres_data
postgres_data/data mountpoint /var/lib/postgresql local
postgres_data/data sharenfs off default
postgres_data/data checksum on default
postgres_data/data compression zstd inherited from postgres_data
postgres_data/data atime off inherited from postgres_data
postgres_data/data devices on default
postgres_data/data exec on default
postgres_data/data setuid on default
postgres_data/data readonly off default
postgres_data/data zoned off default
postgres_data/data snapdir hidden default
postgres_data/data aclmode discard default
postgres_data/data aclinherit restricted default
postgres_data/data createtxg 619 -
postgres_data/data canmount on default
postgres_data/data xattr sa inherited from postgres_data
postgres_data/data copies 1 default
postgres_data/data version 5 -
postgres_data/data utf8only off -
postgres_data/data normalization none -
postgres_data/data casesensitivity sensitive -
postgres_data/data vscan off default
postgres_data/data nbmand off default
postgres_data/data sharesmb off default
postgres_data/data refquota none default
postgres_data/data refreservation none default
postgres_data/data guid 10551245409099064921 -
postgres_data/data primarycache all default
postgres_data/data secondarycache all default
postgres_data/data usedbysnapshots 0B -
postgres_data/data usedbydataset 88.9G -
postgres_data/data usedbychildren 0B -
postgres_data/data usedbyrefreservation 0B -
postgres_data/data logbias throughput local
postgres_data/data objsetid 595 -
postgres_data/data dedup off default
postgres_data/data mlslabel none default
postgres_data/data sync disabled local
postgres_data/data dnodesize legacy default
postgres_data/data refcompressratio 1.98x -
postgres_data/data written 88.9G -
postgres_data/data logicalused 176G -
postgres_data/data logicalreferenced 176G -
postgres_data/data volmode default default
postgres_data/data filesystem_limit none default
postgres_data/data snapshot_limit none default
postgres_data/data filesystem_count none default
postgres_data/data snapshot_count none default
postgres_data/data snapdev hidden default
postgres_data/data acltype off default
postgres_data/data context none default
postgres_data/data fscontext none default
postgres_data/data defcontext none default
postgres_data/data rootcontext none default
postgres_data/data relatime off default
postgres_data/data redundant_metadata most inherited from postgres_data
postgres_data/data overlay on default
postgres_data/data encryption off default
postgres_data/data keylocation none default
postgres_data/data keyformat none default
postgres_data/data pbkdf2iters 0 default
postgres_data/data special_small_blocks 0 default