
My question is in the title... here is some background:

[OS is Linux.]

[UPDATE: The contents of this RAID0 are redundant (synced from SCM). I am not worried about the increased risk of data loss.]

[UPDATE 2: Practically, I'm likely splitting hairs here. But in addition to trying to solve a practical problem, I'm wanting to improve/confirm my understanding of the theoretical.]

I have an automated build server that I use to compile the source code of a very large project, and I am looking to minimize my build times. I figure that the best possible build times will happen when the machine stays CPU-bound for the entire duration of the build (i.e., all cores loaded at 100%, the whole time.) This is of course an idealized goal that can only be approached asymptotically.

I can tell from the behavior of the build (mostly by watching the output of mpstat) that the greatest enemy of my goal is %iowait. There are times when I see a non-negligible %idle, which I chalk up to a modest failure of the kernel's scheduler and/or small inefficiencies in Make's ability to parallelize the build; but this is generally not big enough for me to worry about. On the other hand, %iowait quite frequently balloons... and my CPU load drops dramatically. I believe this usually happens when some threads are trying to link (write) large libraries to the (software-controlled[*]) RAID0 while other threads are trying to read source code.
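
(For context, the kind of monitoring I mean is roughly the following; the 5-second interval is arbitrary and the exact columns vary with the sysstat version.)

    # Per-core utilization: high %iowait alongside low %usr means cores are
    # stalled waiting on disk rather than doing build work.
    mpstat -P ALL 5

    # Per-device view: shows which block device is saturated while %iowait spikes.
    iostat -xz 5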

(Please ignore for the moment the fact that I can move the output writes to a different volume & controller than the source code. That is planned.)

I am considering switching to SSDs. But in that case, I think it is probably best to abandon the software RAIDing[*] of the drives. My intuition is: the access times of the SSDs are so short, and the transfer rates so high, that a simple LVM volume spanning 4 SSDs will squash my %iowait to near zero, and my cores will then be constantly pegged, doing the maximum amount of useful work.

... In which case, software control of 4 RAIDed SSDs would needlessly increase my %sys, leaving less for %user. My cores would still be pegged, but there would be less "useful" work getting done.
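
(To make the comparison concrete, here is roughly what I mean by the two setups. The device names, volume names, and filesystem are placeholders, not my actual configuration.)

    # Hypothetical device names (sdb..sde); adjust for the actual drives.

    # Setup A: md software RAID0 stripe across the four SSDs
    mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.ext4 /dev/md0

    # Setup B: a simple LVM concatenation of the same four drives (no striping)
    pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
    vgcreate buildvg /dev/sdb /dev/sdc /dev/sdd /dev/sde
    lvcreate -l 100%FREE -n build buildvg
    mkfs.ext4 /dev/buildvg/build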

Is my intuition on software-RAID0'ing SSDs correct, for this particular goal?

[*] BONUS QUESTION: There is a RAID controller on the motherboard, but my understanding is that it is just 'fake RAID', supplying volume management functions within the BIOS option ROM, but otherwise just software RAID. So I don't use it. But would a true hardware RAID controller even be helpful here? It is clear I can peg my cores quite easily on this machine; I just can't sustain it. I believe SSDs will mostly solve that stamina problem, and I find myself wondering if even a real hardware RAID controller can improve on that.
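
(For what it's worth, a couple of commands along these lines should confirm whether the onboard controller is firmware/'fake' RAID; the exact output will of course depend on the chipset.)

    # With fake RAID the OS still sees a plain SATA/AHCI controller, and mdadm
    # can report any vendor firmware-RAID support (e.g. Intel IMSM).
    lspci | grep -iE 'raid|sata'
    mdadm --detail-platform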

2 Answers


Software RAID under Linux on modern hardware is fine... even with SSDs. It doesn't place a tremendous demand on your CPUs. Really.

Heck, with premium Fusion-io solid-state drives, the recommended and common deployment scheme is to use software RAID.

I wouldn't worry about this at all.

Also see: Do I need to RAID Fusion-io cards?

ewwhite
  • Thank you for your answer. I can certainly see using software RAID on SSDs if your goal is to maximize data throughput. Maximizing CPU work completed per unit of time is a slightly different goal, and theoretically speaking I suspect software RAID will work against it, even if negligibly. (I admit being ignorant of the overhead cost of software RAID under ext4. I know it has to be small relative to total CPU, but possibly of the same order of magnitude as SSD access times.) – Ryan V. Bissell Dec 12 '14 at 16:08
  • Put another way: it seems the typical access time for an SSD is 100 µs. Could the RAID part of the filesystem driver's ISR complete in substantially less time than that? For spinning metal (10 ms), certainly. – Ryan V. Bissell Dec 12 '14 at 16:12

Even though I have accepted @ewwhite's answer above, I wanted to come back and report a slightly conflicting answer that I just discovered elsewhere on the web, one that is based on empirical data:

Our results from the testing showed a 16% increase in reads while utilizing (2) SSDs in a RAID 0, with a 2% decrease in write performance. The performance gains from the reads is substantial enough to warrant utilizing the RAID 0 for most purposes, but if you're running an application that performs more writes than reads you may benefit more from using the data disks stand-alone instead of going with the RAID 0 option.

From http://www.rackspace.com/knowledge_center/article/configuring-a-software-raid-on-a-linux-general-purpose-cloud-server

(My RAID is reading more than writing, so @ewwhite's answer is still appropriate for my needs.)
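
(For anyone who wants to reproduce that kind of comparison on their own drives: something like the following fio runs, pointed first at a file on a standalone SSD and then at a file on the RAID0 volume, should do it. The paths, block sizes, and queue depths here are arbitrary placeholders.)

    # Run each test once against a standalone SSD and once against the RAID0
    # mount point, then compare throughput for reads vs. writes.
    fio --name=seqread --filename=/mnt/test/fio.dat --rw=read --bs=1M --size=4G \
        --ioengine=libaio --iodepth=32 --direct=1
    fio --name=randwrite --filename=/mnt/test/fio.dat --rw=randwrite --bs=4k --size=4G \
        --ioengine=libaio --iodepth=32 --direct=1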