
I am planning to set up a RAID array for scratch-space use in a computational server (16 cores, 128 GB RAM). The users will routinely be creating large (500 GB) MySQL InnoDB databases and storing them temporarily on the scratch space. The databases are filled with data from a cluster which may have up to 1000 MySQL clients connected to a database at once. The RAID controller is a PERC H710 integrated controller with a 512 MB non-volatile cache.

Since the storage is temporary, I am planning to use RAID 0 for read/write performance. The remaining question is whether to use 8 x 7,200 RPM disks or 4 x 15,000 RPM disks. One typical use pattern is that once a database is created, there will be very few writes to it. There will be a lot of reads for analysis, so the 15K seek time would help here; however, I do not know how the RPM improvement stacks up against the RAID 0 striping speed-up from the extra disks.

Ignoring drive capacity as a factor, which setup would be preferable, 8 x 7200 RPM drives or 4 x 15000 RPM drives? I apologize if this type of question does not have a clear answer.

Edit: I have not looked into how much the RAID controller will limit the effective throughput based on the number of disks in the array yet.

  • 8x7200 will have roughly [the same average rotational latency](http://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics#Rotational_latency) as 4x15k (see the rough calculation after these comments). However rotational latency isn't the only variable in seek time, plus there's a whole bunch of other factors affecting overall performance. –  Mar 22 '14 at 19:18
  • With mechanical drives the motto is: Spindles, spindles, spindles to get more performance. Your controller won't be the bottleneck here. I agree with @Jack Douglas that in this case, the performance will be approximately on par. – vanthome Mar 22 '14 at 19:28
  • 3
    This site is for professionals, we'd never recommend R0 or putting DBs on 7.2k disks, if you want to play fast and loose with your customer's data then this is the place for you sorry. – Chopper3 Mar 22 '14 at 19:28
  • Thanks for the input, Jack and vanthome. Chopper3 - our use-case is for temporary storage (1-2 days), _as I said above_. Any data that gets lost due to a dead disk can be regenerated in 1-2 days. Most of the generated data is not useful - any useful data we generate gets safely backed up in triplicate. Our file servers are "professionally" managed (RAID 6 with hot spares, offsite backups + off-state backups). A solution should always fit the problem. Besides, based on the very useful remarks I did get, this does indeed seem to be the place for me to ask this question /end snark. – billyshaneguy Mar 24 '14 at 22:34
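
As a quick illustration of the rotational-latency point in the first comment, here is a minimal sketch. The only formula used is the standard half-revolution average (0.5 × 60 / RPM); dividing by the spindle count is a deliberately crude model of an array serving a deep queue of requests:

```python
def rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency of a single disk, in milliseconds."""
    return 0.5 * 60.0 / rpm * 1000.0

for disks, rpm in ((8, 7200), (4, 15000)):
    per_disk = rotational_latency_ms(rpm)
    # Crude model: a full queue of requests spread evenly across spindles.
    per_array = per_disk / disks
    print(f"{disks} x {rpm:>5} RPM: {per_disk:.2f} ms per disk, "
          f"~{per_array:.2f} ms effective under a full queue")

# 8 x  7200 RPM: 4.17 ms per disk, ~0.52 ms effective
# 4 x 15000 RPM: 2.00 ms per disk, ~0.50 ms effective
```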

2 Answers


There's a lot to address here, spanning the design itself as well as pricing and the attributes of the related technologies.

Let's assume the reason you're choosing between 8 x 7,200 RPM nearline disks and 4 x 15k enterprise disks is cost. Let's also assume that you're talking about 2.5" small form-factor disks...

I rarely buy 15k disks these days because if latency and random I/O performance are paramount, I go to SSD-based solutions. Your capacity needs aren't tremendous, so just use 6 or 8 10k RPM enterprise disks. They have a better performance and capacity profile than the 7,200 RPM disks and are a better value than the 15k enterprise disks. Right now, 600GB and 900GB 10k SAS 2.5" disks are around the same price as 1TB 7,200 RPM 2.5" drives.

How much usable storage space do you actually need? In the 2.5" disk world, capacities are:

  • 7,200 RPM - 500GB, 1TB
  • 10,000 RPM - 72GB, 146GB, 300GB, 450GB, 600GB, 900GB, 1.2TB
  • 15,000 RPM - 72GB, 146GB, 300GB

But there's the academic side of this question. If the read/write profile is sequential, the 8 x 7,200 RPM drives win on throughput because of spindle count. If it's random, it's more complicated. The edge would still go towards the 8 slower disks, but not by much.
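
As a back-of-the-envelope illustration of that trade-off, here is a minimal sketch. The per-disk throughput and IOPS figures are assumed ballpark values for 2.5" drives of that era, not measurements of any specific model, and controller limits and striping overhead are ignored:

```python
# Rough comparison of the two candidate arrays under RAID 0.
# Per-disk figures are assumptions, not specs for any particular drive.
drives = {
    "8 x 7,200 RPM":  dict(count=8, seq_mb_s=115, random_iops=90),
    "4 x 15,000 RPM": dict(count=4, seq_mb_s=195, random_iops=175),
}

for name, d in drives.items():
    # RAID 0: both sequential throughput and random IOPS scale with spindle count.
    print(f"{name}: ~{d['count'] * d['seq_mb_s']} MB/s sequential, "
          f"~{d['count'] * d['random_iops']} random IOPS")

# 8 x 7,200 RPM:  ~920 MB/s sequential, ~720 random IOPS
# 4 x 15,000 RPM: ~780 MB/s sequential, ~700 random IOPS
```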

If your working set of data fits within 1 Terabyte and is definitely scratch space, I'd just get a 960GB PCIe SSD (or two) and be done.

ewwhite
  • We would love to go the SSD route but we are locked into dealing with a provider who does not sell affordable SSD drives (see the reply to symcbean). I will try to see whether I can negotiate with them. Otherwise, thanks for your answer regarding the 8 disk array and suggestion for the 10K disks. That has given us some more options to consider. – billyshaneguy Mar 24 '14 at 22:23
  • What exactly is the vendor constraint? Does the vendor at least offer 10k disks? – ewwhite Mar 24 '14 at 22:27

> Since the storage is temporary, I am planning to use RAID 0 for read/write performance

You are wrong.

Mirroring isn't just about availability. It's also about reducing latency. If you're only doing sequential access on a single table, then mirroring is just going to slow down the writes. But with multiple users and/or multiple tables/indexes and/or random reads, mirroring will improve performance.
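
To put rough numbers on that, here is a minimal sketch under the simple assumption that random reads can be served by either copy of a mirror while every write must hit both. The per-disk IOPS figure is an assumed ballpark value:

```python
def array_iops(level: str, disks: int, per_disk_iops: int = 130):
    """Very rough random-IOPS estimate for an array; returns (reads, writes)."""
    if level == "raid0":
        return disks * per_disk_iops, disks * per_disk_iops
    if level == "raid10":
        # Reads can be served by either copy; writes must hit both mirrors.
        return disks * per_disk_iops, (disks // 2) * per_disk_iops
    raise ValueError(level)

for level in ("raid0", "raid10"):
    reads, writes = array_iops(level, disks=8)
    print(f"{level}: ~{reads} random read IOPS, ~{writes} random write IOPS")

# raid0:  ~1040 read, ~1040 write
# raid10: ~1040 read,  ~520 write  (mostly-read workloads lose little)
```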

If performance is the primary objective here then, like ewwhite says, why aren't you looking at SSDs?

There's more to the story than rotation speed and capacity. For a long time the vendors of "Enterprise" drives have justified a price differential based on reliability as well as performance, but there's a growing body of evidence that the reliability gap isn't real. On the other hand, enterprise drives do tend to behave better in failure modes - a commodity drive will keep retrying for a long time to commit data to the disk, which can play havoc with your MTTR. Hence using enterprise drives in an array can give better availability for the array as a whole.


The price differential has to be a consideration. IME, Enterprise drives are around 4 times the cost of basic drives but typically only offer double the performance.
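
Putting that rule of thumb into numbers (a trivial sketch; the dollar amounts are placeholders, only the 4x-cost / 2x-performance ratios from the paragraph above matter):

```python
# Placeholder prices; only the ratios are meaningful.
basic      = {"price": 100, "perf": 1.0}
enterprise = {"price": 400, "perf": 2.0}

for name, d in (("basic", basic), ("enterprise", enterprise)):
    print(f"{name:>10}: {d['perf'] / d['price']:.4f} performance per dollar")

# basic:      0.0100 per dollar
# enterprise: 0.0050 per dollar (half the performance per dollar)
```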

Since you don't seem to be bothered about availability, I'd recommend going with the cheaper drives - but do mirror them for performance.

symcbean
  • We would certainly have multiple processes (1000+) writing at once, so the information about the mirroring is very useful. The main reason for not looking at SSDs was the price point - our budget is ~$1000 USD, which is perfect for ewwhite's suggested drives, but we can only buy from approved vendors and they charge $2,625 USD for a 400 GB SSD. It looks like we are stuck with SAS for the moment. Many thanks for your help. I would vote up but I do not have the rep yet. – billyshaneguy Mar 24 '14 at 22:13