
I am planning to purchase a server (Dell PowerEdge R740) with SSDs in RAID 10, and my priorities are write performance and data integrity. It will be running Linux. The SSDs have write caches with power loss protection.

It seems like these are my RAID options:

  • PERC H330 (no cache), software RAID (pass-through)
  • PERC H330 (no cache), hardware RAID (write-through)
  • PERC H730P (2 GB NV cache), hardware RAID (write-through)
  • PERC H740P (8 GB NV cache), hardware RAID (write-through)

My questions:

  • Are any of these configurations at risk for data loss or corruption on power loss?
  • Which configuration should I expect to have the best write performance?
  • Are there any other benefits to an NV cache that I haven't considered?

sourcenouveau
  • Counter-intuitively, hardware RAID controller setups backed by SSDs might perform below the expected maximum throughput when write-back caching is enabled. But I see you are already considering only write-through, so you seem to be aware of that. – the-wabbit Oct 04 '17 at 10:56

4 Answers


If used with SSDs without a power-loss-protected write cache, the RAID controller's NVCACHE is extremely important for good write performance.

However, as you are using SSDs with power-loss-protected write caches, performance should not vary much between the various options. On the other hand, there are other factors to consider:

  • with hardware RAID, it is often simpler to identify and replace a failed disk: the controller clearly marks the affected drive (e.g., with an amber light), and replacing it is generally as simple as pulling the old drive and inserting the new one. With a software RAID solution, you need to enter the appropriate commands to identify and replace the failed drive (see the mdadm sketch after this list);
  • hardware RAID presents the BIOS with a single volume for booting, while software RAID exposes the individual component devices;
  • with the right controller (i.e., H730 or H740) and disks (SAS 4Kn), you can very easily enable the extended data integrity field (T10/T13);
  • hardware RAID runs an opaque binary blob over which you have no control;
  • Linux software RAID is much more flexible than any hardware RAID I have ever used.
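
As a concrete illustration of the software-RAID commands mentioned in the first bullet, replacing a failed member with mdadm might look like the sketch below. The array and device names are placeholders, and ledctl comes from the optional ledmon package:

    # Inspect the array to see which member has failed
    mdadm --detail /dev/md0
    # Mark the failed member and remove it from the array
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
    # Blink the drive bay LED so you can find the disk in the chassis
    ledctl locate=/dev/sdb
    # After physically swapping the disk, add the replacement back
    mdadm /dev/md0 --add /dev/sdb1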

That said, on such a setup I strongly advise you to consider using ZFS on Linux: the power-loss-protected write caches mean you can go ahead without a dedicated ZIL device, and ZFS's added features (compression, checksumming, etc.) can be very useful.
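
For illustration, a minimal sketch of such a pool; the pool name and device paths are placeholders, and the properties shown are standard ZFS-on-Linux options:

    # Striped mirrors (the ZFS equivalent of RAID 10) across four SSDs
    zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd
    # Enable transparent compression (checksumming is already on by default)
    zfs set compression=lz4 tank
    # Verify the resulting properties
    zfs get compression,checksum tank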

To directly reply to your questions:

  1. Are any of these configurations at risk for data loss or corruption on power loss? No: as all caches are protected, you should not see any data corruption on power loss.
  2. Which configuration should I expect to have the best write performance? The H740P configured in write-back cache mode should give you the absolute maximum write performance. However, in some circumstances, depending on your specific workload, write-through can be faster. Dell (and LSI) controllers even have some SSD-specific features (e.g., CTIO and FastPath) which build on write-through and can increase your random write performance.
  3. Are there any other benefits to an NV cache that I haven't considered? Yes: a controller with a proper NVCACHE will never let the two RAID 1/10 legs hold different data. In some circumstances, Linux software RAID is prone to (harmless) RAID 1 mismatches (see the sketch below). ZFS does not suffer from that problem.
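
To make points 2 and 3 concrete, here is a minimal sketch. It assumes a storcli-managed PERC at controller /c0 with virtual drive /v0 and an md array named md0; all of these identifiers are placeholders for your own setup:

    # Point 2: switch the virtual drive between write-back and write-through
    storcli64 /c0/v0 set wrcache=WB    # or wrcache=WT
    # Point 3: on Linux software RAID, check whether the RAID 1 legs diverged
    echo check > /sys/block/md0/md/sync_action    # start a consistency check
    cat /sys/block/md0/md/mismatch_cnt            # non-zero means mismatched blocks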
shodanshok
  • ZFS is more than RAID, really: it has variable stripe width, so there is no read-modify-write or "write hole". Also, instead of a page cache it has the advanced ARC. There's one thing it misses: NVRAM... which can be solved with NVDIMM integration :) – BaronSamedi1958 Oct 04 '17 at 09:55

Q1: Are any of these configurations at risk for data loss or corruption on power loss?

A1: You shouldn't have any issues, unless you configure the cache in write-back mode without NVRAM.

Q2: Which configuration should I expect to have the best write performance?

A2: The one with the biggest cache, obviously! ...and no parity RAID, but RAID 10, of course.

Q3: Are there any other benefits to an NV cache that I haven't considered?

A3: Write coalescing, spoofing, etc. But these are minor, really.

BaronSamedi1958
  • I wasn't sure whether the cache would help because I read that write reordering doesn't impact SSDs much, and because the SSDs have their own write caches. – sourcenouveau Oct 03 '17 at 20:44
  • @M.Dudley yes, they have caches, but you cannot have such a thing as _too much cache_. Cache is good, the more cache the better. – ThoriumBR Oct 03 '17 at 20:52
  • @M. Dudley: The RAID controller has gigabytes of cache sitting behind a comparably fast, low-latency PCIe x4-x8 bus, while SSD caches are in the megabytes and sit behind 6-12 Gbps SATA/SAS links. – BaronSamedi1958 Oct 03 '17 at 21:20
  • @BaronSamedi1958 it does not matter as much as it might seem it would. "Gigabytes of cache" is spread over the entire logical volume you've defined, so broken down to a single disk it might come down to merely a few megabytes per disk. Also, even the dated Samsung 850 Pro came with 1 GB of DRAM cache, just about half the entire cache of the H730P. Last but not least: the SAS3 interface delivers 12GB/s over a single link, outperforming the x8 PCIe 3 lanes the RAID controllers are typically plugged into. – the-wabbit Oct 04 '17 at 10:52
  • @the-wabbit while I generally agree with you, your bandwidth calculation is wrong: SAS3 has 12 Gb/s, or 1.5 GB/s, per-direction maximum. A PCIe 3.0 x8 link has roughly 64 Gb/s, or about 8 GB/s, per-direction maximum bandwidth. Moreover, the SAS controller itself generally hangs off an upstream PCIe link, just as the RAID controller does. – shodanshok Oct 04 '17 at 11:20
  • @the-wabbit You're confusing gigabits and gigabytes. Also, you ignore the fact that per-volume cache is closely related to the file system, while per-disk cache is not: it's a write buffer. The same is true of the SSDs' cache, which is actually a log-structured page-assembly write buffer. Plus, you ignore latency at every stage. – BaronSamedi1958 Oct 04 '17 at 12:48

You might want to use Bonnie++ to test the PERC RAID cards against the SSDs. HDD speeds (5.4k/10k/15k rpm) or hybrid drives will vary the results and cache use.
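
For example, a minimal Bonnie++ run might look like the following; the mount point is a placeholder, and the -s size should be at least twice the machine's RAM so the page cache cannot absorb the whole test:

    # Benchmark the array's mount point with a 64 GiB data set,
    # skipping the small-file creation tests (-n 0)
    bonnie++ -d /mnt/raid10 -s 64g -n 0 -u nobody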

Another advocate for ZFS here: I started using SGI's servers in the mid-'90s, and ZFS knocked the spots off anything UFS/ext2/3-related. It's bombproof.

Munkeh72

In addition to the good answers above: an item often forgotten but required for the long-term integrity of any RAID is data scrubbing, a.k.a. media patrol or read patrol. This makes sure that all data on all disks remains readable over an extended period of time.

Without scrubbing it is possible - and after an extended period of time and a large number of sectors, even probable - that data sectors that haven't been read for a very long time are no longer readable. In normal operation this isn't a problem, as a bad sector can be reconstructed from redundancy data. However, if a disk fails you've already lost redundancy (except for RAID 6 or nested RAID levels), and when a bad sector surfaces during the rebuild you're dead in the water.

So, always enable data scrubbing unless you like unpleasant surprises.
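
How you enable scrubbing depends on the storage stack. A few common examples, with pool, array, and controller identifiers as placeholders:

    # ZFS: scrub the pool, typically from a monthly cron job or systemd timer
    zpool scrub tank
    zpool status tank    # shows scrub progress and any repaired errors
    # Linux software RAID: start a check manually, or rely on the distro's
    # periodic job (e.g. raid-check on RHEL/CentOS)
    echo check > /sys/block/md0/md/sync_action
    # Hardware RAID (PERC): the equivalent is patrol read; query it with storcli
    storcli64 /c0 show patrolread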

Zac67