
We have a central storage server (a PowerEdge R720) serving shared files to an HPC cluster, with two hardware RAID controllers attached (PERC H810, each driving 2 MD1200 enclosures filled with 7200 rpm 4 TB disks). As is typical for HPC workloads, the access pattern is expected to be highly parallel sequential reads/writes. I suppose striping files across the two arrays would give better total throughput, but the idea of software RAID 0 on top of hardware RAID sounds crazy to me.

I came up with two options:

  1. NFS on XFS on software RAID 0 on hardware RAID 6 (roughly sketched below)
  2. Lustre on each hardware RAID 6
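
For option 1, the stack would look roughly like the sketch below; the device names, md chunk size, stripe values, and export path are all placeholders, not tested values (each /dev/sdX stands for one RAID-6 volume presented by a PERC H810):

    # Placeholder devices: /dev/sdb and /dev/sdc are the hardware RAID-6 volumes
    # presented by the two PERC H810 cards.

    # Software RAID 0 across the two hardware arrays (chunk size is a guess):
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=256 /dev/sdb /dev/sdc

    # XFS with stripe geometry matching the md layer (values are placeholders):
    mkfs.xfs -d su=256k,sw=2 /dev/md0
    mkdir -p /export/scratch
    mount /dev/md0 /export/scratch

    # NFS export (line in /etc/exports):
    # /export/scratch  10.0.0.0/24(rw,async,no_subtree_check)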

XFS pros: project quota.

XFS cons: NFS on XFS showed very bad metadata performance (it would degrade to nearly unusable when there was heavy throughput on it; did I tune it wrong?).

Lustre pros: significantly better metadata performance.

Lustre cons(?): we have no dedicated metadata device, so we would have to partition an array, which does not sound like a recommended practice.

We considered metadata performance because, while sequential R/W is the primary workload, we have some programs that work with something like ~40k 1 GB files. Managing these files interactively does require acceptable metadata performance.

And one last question: what stripe sizes should we use at the hardware and software levels?
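
For concreteness, this is the mapping I understand the question to be about: the XFS stripe unit is normally set to the per-disk strip size of the hardware array, and the stripe width to the number of data disks. A hypothetical example (64 KiB strip, 12-disk RAID 6, i.e. 10 data disks; the numbers are illustrative, not a recommendation):

    # Hypothetical geometry: 64 KiB strip per disk, 12-disk RAID 6 => 10 data disks
    mkfs.xfs -d su=64k,sw=10 /dev/sdb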

Carl Lei
  • I wouldn't do any of these things. Acceptable performance can be obtained without resorting to this. Can you provide information about the disk types, counts, operating systems and how you configured XFS? Also, are SSDs an option? – ewwhite Jun 20 '15 at 21:52
  • @ewwhite edited to answer part of your question. OS Scientific 6, XFS configuration only in stripe size & width; above "bad metadata perf" is tested on a bare hard raid-6 array. SSDs are very unlikely going to be an option. – Carl Lei Jun 21 '15 at 01:03

1 Answer


We settled on this setup:

  • Hardware RAID 6 in each MD1200 enclosure
  • Two LVM LVs on the four hardware arrays, each LV combining two arrays, one from each card, without striping
  • XFS on the two LVs, with the same striping options as for the bare hardware arrays
  • A Gluster volume on the two bricks, without striping (a rough command sketch follows)
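
A rough sketch of how the LVM/XFS part of the layout above might be created; all device names, VG/LV names, and stripe values are placeholders (sdb/sdc standing for the arrays on one card, sdd/sde for the arrays on the other):

    # One PV per hardware RAID-6 array:
    pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # Each VG/LV pairs one array from each card, concatenated (no LVM striping):
    vgcreate vg_brick1 /dev/sdb /dev/sdd
    vgcreate vg_brick2 /dev/sdc /dev/sde
    lvcreate -l 100%FREE -n lv_brick1 vg_brick1
    lvcreate -l 100%FREE -n lv_brick2 vg_brick2

    # XFS with the same stripe geometry used on the bare arrays (placeholder values):
    mkfs.xfs -d su=64k,sw=10 /dev/vg_brick1/lv_brick1
    mkfs.xfs -d su=64k,sw=10 /dev/vg_brick2/lv_brick2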

We investigated all of our users' applications and found that all of them operate on many files. So this is not the situation where many clients access a single large file, which is where striping at the Gluster level is desirable; simply distributing files randomly across the bricks provides enough total throughput.
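
The Gluster volume is therefore a plain distribute volume (no striping, no replication) over the two bricks; it is created with something along these lines, where the hostname, volume name, and brick paths are placeholders:

    # Both bricks live on the same storage server; plain distribute volume:
    gluster volume create hpcvol storage1:/bricks/brick1/data storage1:/bricks/brick2/data
    gluster volume start hpcvol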

While the metadata performance of this setup is worse than that of Lustre (roughly half), it does not degrade when there is heavy throughput going on. It turned out to be acceptable.

Gluster supports quotas on directories, and it is a lot easier to set up than Lustre, so it requires significantly less administration effort. In my (rough) tests, the sequential throughput of this setup is on par with Lustre's, so we decided to trade that bit of metadata performance for easier administration.
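
The per-directory quotas are handled with Gluster's quota feature; a minimal example, with the volume name, directory, and limit as placeholders:

    # Enable quota on the volume, then set a limit on a directory inside it:
    gluster volume quota hpcvol enable
    gluster volume quota hpcvol limit-usage /projects/groupA 10TB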

Carl Lei