
I'm using ZFS to take advantage of some of its features and to manage volumes, not for RAID. I have a single logical device (HW RAID) added to a zpool.

The ZFS ARC does not seem to perform as well as my HW RAID cache, so I was trying to disable it to see if I could reproduce results similar to the benchmarks run directly on the HW RAID device, but performance suffers on the ZFS volumes.

I tried disabling primarycache and secondarycache, but that actually hurt performance; it didn't fall back to the HW RAID cache as I expected. So I'm at a loss. Is it impossible to use my HW RAID cache with ZFS? Maybe primarycache and secondarycache aren't the right parameters to be modifying.
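For reference, roughly what I tried (pool/zvol names here are placeholders, not my exact setup):

# Disable ARC caching of both data and metadata on the test zvol
zfs set primarycache=none tank/testvol
zfs set secondarycache=none tank/testvol

# Reverted afterwards, since performance only got worse
zfs set primarycache=all tank/testvol
zfs set secondarycache=all tank/testvol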

Configuration:
  • HP P410i, RAID 10, with write-back BBU cache
  • Zpool with a single logical device from the RAID
  • A sparse test zvol created for testing device speeds (/dev/zd0)
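Roughly how the test setup was created (device path, names, and sizes are illustrative):

# Single HW RAID logical device as the only vdev
zpool create tank /dev/sda

# Sparse (thinly provisioned) test zvol; it appears as /dev/zd0
zfs create -s -V 100G tank/testvol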

Update to this question: The poor performance was caused by ZFS overhead. When ZoL's ARC (primarycache) is disabled, there is currently extreme overhead, especially on random writes. I'm not sure if this is specific to ZoL or to ZFS in general. I recommend at least leaving primarycache=metadata if you want to reduce the ARC size but maintain the performance of your disks.
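For example (pool name is a placeholder):

# Keep only metadata in the ARC; data blocks are no longer cached
zfs set primarycache=metadata tank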

Devon
  • What type of hardware RAID controller do you have? Also, what OS/version/kernel and ZFS on Linux release are you using? – ewwhite Aug 21 '14 at 05:22
  • @ewwhite HP P410i. CentOS 6.5: zfs-0.6.3-1. – Devon Aug 21 '14 at 05:26
  • That's a common configuration for me. What seems to be the problem? – ewwhite Aug 21 '14 at 05:27
  • @ewwhite Well, just from standard testing I'm seeing much faster speeds from the RAID device (sda) than the zvol. The zvol seems to produce the same speeds with the RAID writeback enabled and disabled, so it doesn't seem to be leveraging the cache on the RAID card. – Devon Aug 21 '14 at 05:29
  • What is "standard testing"? Storage is comprised of many different traits and characteristics. You didn't mention zvols earlier. Post your configuration. – ewwhite Aug 21 '14 at 05:30

1 Answer


I use ZFS with hardware RAID, taking advantage of the HW RAID controller's flash-backed write cache (instead of a ZIL device) and leveraging the ZFS ARC cache for reads.
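In practice that just means no separate log vdev and the default cache/sync properties, something like this (pool name is a placeholder):

# No 'log' vdev is added; synchronous writes are absorbed by the controller's
# flash-backed write cache sitting in front of the pool's disks
zfs set sync=standard tank
# ARC continues to cache data and metadata for reads (the default)
zfs set primarycache=all tank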

ZFS best practices with hardware RAID

Why do you feel ZFS is not performing well? Can you share your zfs get all pool/filesystem output as well as the benchmarks you speak of? It's likely just a tuning problem.
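Something along these lines (substitute your own pool/zvol names):

zfs get all tank/testvol
# or just the properties that usually matter for this kind of tuning
zfs get primarycache,secondarycache,sync,compression,volblocksize tank/testvol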

Edit:

The defaults on ZFS on Linux are not great. You need some tuning.

Please read through the workflow I posted at: Transparent compression filesystem in conjunction with ext4

The key parts are the ashift value and the volblocksize for a zvol.
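For example (pool name, device path, and sizes are placeholders):

# ashift is fixed at vdev creation time; 12 means 4K physical sectors
zpool create -o ashift=12 tank /dev/sda

# volblocksize is set per zvol at creation and cannot be changed afterwards
zfs create -s -V 100G -o volblocksize=128K tank/vm1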

Also, you'll need to modify your /etc/modprobe.d/zfs.conf

Example:

# zfs.conf for an SSD-based pool and 96GB RAM
# Cap the ARC at ~45GB (well under half of RAM)
options zfs zfs_arc_max=45000000000
# Deeper per-vdev queues for scrub I/O
options zfs zfs_vdev_scrub_min_active=48
options zfs zfs_vdev_scrub_max_active=128
# Deeper per-vdev queues for synchronous writes and reads
options zfs zfs_vdev_sync_write_min_active=64
options zfs zfs_vdev_sync_write_max_active=128
options zfs zfs_vdev_sync_read_min_active=64
options zfs zfs_vdev_sync_read_max_active=128
# Deeper per-vdev queues for asynchronous reads
options zfs zfs_vdev_async_read_min_active=64
options zfs zfs_vdev_async_read_max_active=128
# More concurrent scrub I/Os per top-level vdev
options zfs zfs_top_maxinflight=160
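The file is read when the zfs module loads; most of these can also be changed at runtime through /sys if you don't want to reload the module (the value shown matches the example above):

echo 45000000000 > /sys/module/zfs/parameters/zfs_arc_max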

Edit:

Well, my advice would be: always use lz4 compression, use a volblocksize of 128k, limit the ARC to about 40% of RAM or less, tweak the values in the zfs.conf I posted to taste (probably reduce all values by 50% if you're using 10k SAS disks), and enable the tuned-adm framework with tuned-adm profile enterprise-storage.
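In command form (pool name is a placeholder):

zfs set compression=lz4 tank
tuned-adm profile enterprise-storage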

ewwhite
  • It looks like you're pretty much using the same setup as me then. I just destroyed it to do some more testing, but it was pretty much all defaults; I played around with disabling primarycache/secondarycache and sync (ZIL), but performance actually decreased further when I did that. I was using ioping to test latency and random read performance. I was seeing about 100K IOPS on the RAID device itself, obviously leveraging the cache, but only about 60K on the ZFS device (/dev/zd0), and it didn't change when changing the RAID cache. – Devon Aug 21 '14 at 05:33
  • See my edit above. Can you tell me what you're planning to do with zvols? – ewwhite Aug 21 '14 at 05:36
  • Is ashift=12 necessary even with a single zpool device? I tried volblocksize 128K and 32K but didn't see much difference in the random read performance tests, probably because it was utilizing the ARC cache. I'm sure it makes more of a difference in typical usage scenarios, I read that 32K performs better for smaller random loads. – Devon Aug 21 '14 at 05:40
  • P.S. I had seen that post. I posted an answer to that too; discard is helpful when using ext4. However, I didn't get that far on this setup yet; I've just been benchmarking the devices for now and haven't set up the filesystems. – Devon Aug 21 '14 at 05:41
  • ZFS on Linux requires a different tuning approach than Solaris/Illumos-based ZFS. So you want the large block size if using zvols. Are you planning on another filesystem on top? If not, why do you have zvols? I would not recommend `ioping` as a testing suite. Maybe `iozone`... `ashift` values are a preference and really for forward compatibility with the drives I use on newer servers. PCIe SSDs tend to test best with `ashift=13`. – ewwhite Aug 21 '14 at 05:45
  • See edit with zfs.conf example. – ewwhite Aug 21 '14 at 05:47
  • Thanks for your help so far. It is a virtualization hypervisor, so the zvols will be used to host different filesystems later on. – Devon Aug 21 '14 at 05:49
  • @devon See my edit above. – ewwhite Aug 21 '14 at 05:54
  • Thanks. I will take a look at your suggestions. Just curious, but have you done any testing between disabling and enabling the writeback cache on your HP card? It appears to me that the system is only leveraging the ARC cache, which of course is fast and beneficial on its own, but it feels like a waste to have the BBU. – Devon Aug 21 '14 at 06:00
  • See the first line of my answer. – ewwhite Aug 21 '14 at 06:01
  • ewwhite, I have done some more testing. On an older system with an SSD array for caching, I am seeing severe slowdowns with ZFS. I have tested with fio and ioping, which both show similar results: slower speeds with ZFS than from the SSD array itself. I am getting up to 100K IOPS on the SSD array, but when I put ZFS on top, it slows down to 20K in benchmarks. Do you have any suggestions? I am using volblocksize=128K; compression didn't make much of a difference (tried LZ4, ZLE, and off). – Devon Aug 24 '14 at 01:36
  • @devon sorry to hear. Troubleshooting this is probably beyond the scope of this question. – ewwhite Aug 24 '14 at 01:48
  • @Devon Benchmarks aren't real life. Test with a realistic workload if you must. I tend not to care about *IOPS numbers only* because that's not what storage is all about. See if your service times are acceptable, if latencies are low, if your application does what it needs to. – ewwhite Aug 24 '14 at 02:18
  • @ewwhite I've been testing around with ZFS quite a bit now and found part of the problem. Using a volblocksize larger than 4k under an ext4 partition seems to cause high read throughput but low write IOPS, because it causes a lot of read-modify-writes. Have you noticed this with your setup? For instance, a volblocksize of 128K with 4k random writes causes a write overhead of ~500% compared to a 4k volblocksize (a sketch of this kind of test follows these comments). – Devon Nov 18 '14 at 19:44
  • @Devon I don't use ext4 on top of ZFS. I use XFS, though. I haven't had any abnormal signs of overhead. Are you mounting your filesystems with `nobarrier`, or if using RHEL/CentOS, using the `tuned-adm` enterprise-storage profile? – ewwhite Nov 18 '14 at 19:47
  • I'm not using nobarrier. Only noatime and defaults but I am using tuned-adm as you suggested. What is the default block size on XFS? – Devon Nov 18 '14 at 19:50
  • I just noticed in your recommendations on the other post that you have logbufs in your mount options. I'll look into barriers; that could be important. Is it dangerous to use nobarrier? – Devon Nov 18 '14 at 19:53
  • @Devon logbufs is for XFS. You have a battery-backed write cache on your RAID controller. It's fine to use "nobarrier". – ewwhite Nov 18 '14 at 19:54
  • In this instance, yes. But I also have a couple of servers I'm looking into putting ZFS on that do not have HW RAID, though I do plan on having an SSD dedicated to the ZIL. Is a ZIL sufficient as a replacement in this case? I appreciate your help and advice. – Devon Nov 18 '14 at 19:59
  • This is beyond the scope of this question. The answer is: *it depends*. – ewwhite Nov 18 '14 at 19:59
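A sketch of the kind of 4K random-write test discussed in these comments (device path, job name, and queue depth are illustrative; note that this writes directly to the zvol):

# 4K random writes directly against the zvol's block device
fio --name=zvol-randwrite --filename=/dev/zd0 --rw=randwrite --bs=4k \
    --ioengine=libaio --direct=1 --iodepth=32 --runtime=60 --time_based

# Watch the pool while the job runs to see any amplified read traffic
zpool iostat -v tank 1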