Why does my RAID10 SSD array's performance plummet when I perform many metadata updates?

Hardware RAID10 Setup

Hardware:

  • LSI 9211-8i
  • Firmware Revision: 20.00.07.00-IR
  • NVDATA Version: 14.01.00.09
  • Avago ROM Utility Version: 7.39.02.00 (2015.08.03)
  • LSI SAS2008 Controller
  • 2x Mini SAS to SATA Cable (SFF-8087)
  • No BBU or onboard DRAM cache

Layout and setup:

  • 8x Samsung 850 EVO 512GB SSD
  • Single 2TB NTFS partition
  • 83% full

Note that this array and its partition are several years old and have not experienced a significant number of writes (relative to the total size of the disk) throughout their lifetime. This is a secondary partition and does not contain the OS installation, swap file, or System Restore.

Test Environment

My test environment is a freshly installed Windows 10 Pro x64 (1909) and is fully updated. The system is running on a Ryzen 1800X with 32GB of ECC memory. Windows is installed on a Samsung 850 Pro 128GB SSD. The virtual memory page file is configured on its own separate 256GB SLC NVMe SSD. Windows "Real-time Protection" was disabled during testing.

Problem Summary

I need to update the Windows security information on all of these files, which requires examining and updating the Windows-specific NTFS ACLs of every file. This should be fast, as it only has to operate on the metadata.

The problem I'm experiencing is a significant degradation in performance when attempting this operation on many (millions of) files. Speed is excellent at first but then completely tanks. Initially, files are updated very rapidly, as shown by the progress dialog, but the operation eventually halts for several seconds, starts back up again, and then repeats this behavior in a cycle.

Use Case

My particular use case involves a lot of random reads and writes, but theoretically only on a very small amount of metadata. This should be a task that an array of SSDs would excel at. The fact that it is in a RAID10 should further enhance read performance, with small writes handled by an onboard DRAM cache.

I can replicate and monitor this drop in performance by running a simple benchmark with HD Tune Pro prior to and during the updating of security ACLs. Note that performance recovers immediately after the ACL update has completed (benchmarks before and after are identical to the first example below).
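A minimal sketch of how the halt-and-resume cycle could be logged independently of HD Tune Pro, assuming Python is available on the test system. The function name `measure_stalls`, its parameters, and the stall threshold are hypothetical, and `os.utime` is only a stand-in for a small metadata write; it does not touch NTFS ACLs, so it approximates the workload rather than reproducing it.

```python
import os
import time


def measure_stalls(root, batch_size=1000, stall_factor=10.0):
    """Rewrite the timestamps of every file under root, timing each batch.

    Returns (batch_times, stalled): the duration in seconds of each full
    batch, and the indices of batches slower than stall_factor times the
    median batch duration. A trailing partial batch is not recorded.
    """
    batch_times = []
    count = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Stand-in metadata write (assumption): re-apply the existing
            # access/modify times so file contents stay untouched.
            os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
            count += 1
            if count % batch_size == 0:
                now = time.perf_counter()
                batch_times.append(now - start)
                start = now
    if not batch_times:
        return batch_times, []
    median = sorted(batch_times)[len(batch_times) // 2]
    stalled = [i for i, t in enumerate(batch_times) if t > stall_factor * median]
    return batch_times, stalled
```

Running it against a disposable copy of part of the data set, for example `batch_times, stalled = measure_stalls(r"D:\testdata")` (a hypothetical path), would show whether a plain metadata write exhibits the same periodic stalls as the ACL update.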

Why am I using hardware RAID?

For those of you who might be wondering, Windows 10 Pro only supports software RAID10 through its Storage Spaces manager. I had a very traumatic experience with it several years ago, which caused me to revert to hardware RAID. Ironically, the failure I experienced at the time was triggered by the very same type of operation: updating file security ACLs.

I have been very well served on Linux with both mdadm RAID using traditional file systems as well as ZFS. I'm not sure what to make of Storage Spaces on Windows but I still haven't recovered from my last experience.

Question

What factors might be affecting my seemingly ideal use case?

Please include any test cases to rule out any of the suggested factors if applicable.


Benchmarks (before/during a very large ACL update):

Before:

[HD Tune Pro benchmark screenshot: before the ACL update]

During:

[HD Tune Pro benchmark screenshot: during the ACL update]

Zhro

Posted 2020-01-31T23:33:07.507

No answers