6

A provider (data center) recommended I go with 1 TB SSDs in software RAID 1 over hardware RAID 10 with mechanical drives.

Their quote:

Typically SSDs are more reliable than RAID cards, and since you have fewer parts, there are fewer points of failure. There won't be much CPU load since RAID 1 is extremely simple storage.

How true is that, and is software RAID 1 even ideal when running virtual machines? They say so.

Some more details: I plan to run Xen/Xen-HVM/KVM; in other words, Linux will be the host, and I want a setup where the guests can run anything from Windows to Linux and can compile their own kernels.

What I want to accomplish: to be able to quickly recognize a drive failure and get a replacement swapped in with little to no downtime or performance impact.

Jason

5 Answers

10

It depends on the drives, the disk controller, the type of SSD, the RAID implementation, the Operating System(s) involved, the server, monitoring ability, whether you have out-of-band access to the server, etc.

Edit: you'll be on Linux + KVM.

  • Envision a drive failure in a hardware RAID setup that takes out one disk. You receive an alert and have the drive hot-swapped. Easy.

  • Imagine a software RAID SSD failure that goes undetected (no explicit monitoring) and requires downtime, or turns into a more involved process to remediate.

  • Nothing precludes you from using SSDs with hardware RAID, correct?

But it all depends...

I would push for SSD with hardware RAID if you need SSD performance. I wouldn't necessarily want to boot off of software RAID, but that's your choice. For virtualization, you'll probably have a mix of random read/write activity, and hardware RAID's caching can be helpful. If this is a datacenter, you may not have to worry about sudden power loss, though.
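
If you do go the Linux software RAID route, the "undetected failure" scenario above is largely a monitoring problem. A minimal sketch, assuming mdadm and a mail-capable host (the mail address, config path and array name are placeholders):

```
# /etc/mdadm/mdadm.conf (Debian/Ubuntu path; RHEL uses /etc/mdadm.conf)
# MAILADDR tells mdadm's monitor where to send failure/degraded-array alerts.
MAILADDR ops@example.com        # placeholder address

# Or run the monitor against all arrays, checking every 60 seconds,
# detaching into the background as a daemon.
mdadm --monitor --scan --delay=60 --daemonise

# Quick manual checks:
cat /proc/mdstat                # overall array state
mdadm --detail /dev/md0         # per-array detail (md0 is a placeholder)
```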

ewwhite
  • In my experience, which admittedly is almost entirely on free software, software RAID is a *lot* easier to detect failure on than hardware RAID. The former has a highly-standardised interface (`mdadm` and friends); the latter requires a specialised chunk of code specific to each RAID card, which is not usually distributed with the OS, is hidden on the manufacturer's website, and is often not available under any kind of friendly licence. Your mileage may vary. – MadHatter Apr 29 '14 at 13:36
  • @MadHatter Or you look at the drive LED... And the drive replacement scenarios are entirely different. – ewwhite Apr 29 '14 at 13:38
  • That works brilliantly for both hardware and software RAID, provided you're regularly within a few metres of the kit. I'm not knocking your main answer (+1 from me!), by the way, only noting that under some OSes the "*hardware RAID fails, notifies, replaced, easy; software RAID fails, you don't notice, downtime to replace*" isn't necessarily true. The replacement scenarios depend on the hot-swappability of the hardware; in either case, hardware that supports it will be your friend! – MadHatter Apr 29 '14 at 13:38
  • Let's split the difference: your experience is going to dictate what is best. If you're MadHatter, software. If you are ewwhite, hardware. What matters to the OP is who is going to do the monitoring and who is going to do the replacing. Use whatever makes those processes most reliable. – longneck Apr 29 '14 at 13:46
  • For a virtualization solution, software RAID is typically a non-starter. And that's excluding advanced volume managers/RAID solutions like ZFS. – ewwhite Apr 29 '14 at 13:47
  • +1 for "I wouldn't necessarily want to boot off of software RAID". If you don't configure the GRUB options just right, you can end up with an unbootable system if the boot disk fails. – Ausmith1 Apr 29 '14 at 20:24
  • Yeah, [things like this](http://serverfault.com/questions/592137/system-wont-boot-from-degraded-raid6) can happen. – ewwhite Apr 30 '14 at 02:43
3

In RAID 10 any one of your drives can fail and the array will survive, the same as RAID 1. While RAID 10 can survive four of the six "two drives failed at once" scenarios (with four drives there are six possible two-drive failure pairs, and only the two that take out both members of the same mirror are fatal), the main reason to use RAID 10 with four drives instead of RAID 1 with two is performance rather than extra reliability, and the SSDs will give you a greater performance jump.

Early SSDs had reliability issues, but most properly run tests I've seen suggest that those days are long gone: they tend to be no more likely to fail than spinning-metal drives, overall reliability has increased, and wear-levelling tricks are getting very intelligent.

"is software RAID 1 even ideal when running virtual machines?"

I'm assuming you are running the RAID array on the host, in which case, unless you have a specific load pattern in your VMs (one that would be a problem on direct physical hardware too), the difference between software RAID and hardware RAID is not going to depend on the use of VMs. If you are running RAID inside the VMs, you are likely doing something wrong (unless the VMs are for learning or testing RAID management, of course).

The key advantages of hardware RAID are:

  • Potential speed boost due to multiplexed writes: software RAID 1 will likely write to each drive in turn, whereas with hardware RAID 1 the OS writes just once and the controller writes to both drives in parallel. In theory this can double your peak bulk transfer rate (though in reality the difference will likely be far smaller), but it makes little or no difference to random writes (where, with spinning metal, the main bottleneck is head movement, and with SSDs, the main bottleneck is having to write larger blocks even for small writes, plus the block-clearing time if no pre-erased blocks are ready).
  • Safety through a battery-backed (or solid-state) cache (though this is only found on higher-spec controllers), allowing caching to be done safely on the controller: in sudden power-loss situations the controller can retain written blocks that haven't hit the drives yet and flush them when power returns.
  • Hot-swap is more likely to be supported (though your DC's kit may support hot-swap more generally, so it may be available for software RAID too).

The key advantage of good software RAID (i.e. Linux's mdadm-managed arrays) is:

  • Your array is never locked to a given controller (or worse, to specific versions of a given controller), meaning your arrays can be moved to new kit if all the other hardware fails but the drives survive. I've used this to save a file server whose motherboard died: we just transplanted the drives into a new box and everything came back up with no manual intervention. (We did verify the drives against a recent backup and replace them ASAP, in case the death was a power problem that had affected but not immediately killed the drives, but the easy transplant meant greatly reduced downtime outside maintenance windows.) This is less of an issue if your DC is well stocked with spare parts immediately to hand, of course.
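
As a rough illustration of that portability (assuming mdadm; the device and array names below are placeholders), reassembling the array on replacement hardware is typically just:

```
# After moving the drives to the new machine:
# scan all block devices for md superblocks and assemble any arrays found.
mdadm --assemble --scan

# Confirm the array came up with all members present.
cat /proc/mdstat
mdadm --detail /dev/md0    # md0 is a placeholder array name
```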

On SSD Reliability & Performance:

SSDs over-provision space for two reasons: it leaves plenty of blocks free to be remapped if a block goes bad (traditional drives do this too), and it avoids the write-performance hole (except for huge write-heavy loads) even where TRIM is not used, because the extra blocks can cycle through the wear-levelling pool along with all the others (and the controller can pre-wipe them ready for next use at its leisure). Consumer-grade drives only really over-provision enough for the remapping use and a small amount of performance protection, so it is useful to manually under-allocate (partitioning only 200 GiB of a 240 GB drive, for instance), which has a similar effect. See reports like this one for details (that report was released by a controller manufacturer but reads as a general description of the matter rather than a sales pitch; you'll no doubt find manufacturer-neutral reports on the same subject if you look for them). Enterprise-grade drives tend to over-provision by much larger amounts (for both of the above reasons: reliability and performance).
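
If you want to try that manual under-allocation, a minimal sketch (the device name and sizes are illustrative only; adjust for your drive) is simply to leave part of the SSD unpartitioned:

```
# Leave ~40 GB of a 240 GB SSD unpartitioned as extra over-provisioning space.
# /dev/sdb is a placeholder; double-check the device before running this.
parted /dev/sdb --script mklabel gpt
parted /dev/sdb --script mkpart primary 1MiB 200GiB
```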

David Spillett
1

Speed vs. reliability, IMO.

Most RAID controllers do NOT fully support SSDs, or they only support a specific brand of SSD (see the Dell PERC 6xx series). Also, friends don't let friends use software RAID... unless it's their home gaming system.

(HW RAID + SSDs in RAID 1) vs. (HW RAID + mechanical disks in RAID 10)

The speed difference between SSDs (when fully supported by the RAID controller) and HDDs is like comparing formatting a floppy disk vs. formatting a USB stick: one takes 3 minutes, the other takes 3 seconds. So if you need that kind of speed, go with the SSDs... and make sure you have a good backup. If not, use mechanical disks, and have a good backup. ;-)

1

Which solution did you go with? Yes, SSDs are fast, and they give you a real boost in performance if you use them for a specific purpose, e.g. hosting a database server. I support a number of servers running SSDs in Linux software RAID 1. They all work OK except one: on that server, the RAID repeatedly reports a disk failure for one of the SSDs (randomly, not always the same disk). So far I have been unable to identify why. Also, consider how the host OS will see these two SSDs, because there could be an issue with replacing a disk (you might not be able to hot-swap it). Can you hot-swap a disk in software RAID if the disk is also used for the OS?
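
For reference, a disk replacement in Linux software RAID usually looks roughly like the sketch below (all device and array names are placeholders; if the disk also carries the OS, the partition table and bootloader must be restored on the new disk as well):

```
# Mark the failed member and pull it out of the array.
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# After physically swapping the disk (hot-swap if the backplane allows it):
sgdisk --replicate=/dev/sdb /dev/sda   # copy partition layout from the healthy disk
sgdisk -G /dev/sdb                     # give the new disk unique GUIDs
mdadm /dev/md0 --add /dev/sdb1         # re-add; the array rebuilds in the background

# If /boot lives on this array, reinstall the bootloader on the new disk too.
grub-install /dev/sdb
```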

On the other hand, old-school network storage with an enclosure, a good RAID controller and a large number of disks (in RAID 10) gives you peace of mind. Hot swap of a failed disk is a must for production servers.

Whatever you do, remember to keep regular backups on separate hardware. It has been said many times before: "RAID is not a replacement for backup".

Vojkan
0

Have you looked at ZFS on Linux?

The cloud provider Joyent uses KVM on a custom OpenSolaris kernel with ZFS underneath. You could run your Linux host with an industrial-strength filesystem (ZFS) and software RAID and not have to use all SSDs for speed.
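
A minimal sketch of that setup with ZFS on Linux (pool, dataset and device names here are just examples, not anything Joyent uses):

```
# Two-way mirror, ZFS's equivalent of RAID 1. Prefer /dev/disk/by-id paths
# in production so the pool survives device renaming.
zpool create tank mirror /dev/sda /dev/sdb

# Health check; a failed or degraded member shows up here.
zpool status tank

# A dataset for VM images, with lightweight compression.
zfs create -o compression=lz4 tank/vmimages
```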

Ausmith1