29

What are the pros and cons of consumer SSDs vs. fast 10-15k spinning drives in a server environment? We cannot use enterprise SSDs in our case as they are prohibitively expensive. Here are some notes about our particular use case:

  • Hypervisor with 5-10 VMs max. No individual VM will be crazy I/O-intensive.
  • Internal RAID 10, no SAN/NAS...

I know that enterprise SSDs:

  1. are rated for longer lifespans
  2. and perform more consistently over long periods

than consumer SSDs... but does that mean consumer SSDs are completely unsuitable for a server environment, or will they still perform better than fast spinning drives?

Since we're protected via RAID/backup, I'm more concerned with performance than lifespan (as long as lifespan isn't expected to be crazy low).

David Budiac
  • 515
  • 1
  • 6
  • 11
  • 1
    Please provide specifics on the makes/models of hardware involved. And operating systems... and hypervisors... Maybe even what the VMs will be doing. More details!! – ewwhite Jul 20 '15 at 23:43
  • @ewwhite Dell rack servers. Likely an R430 or R730 with a PERC H730 RAID controller. Also likely a HyperV server hosting mostly Windows Server Standard... *might* use VMware over HyperV. Still considering. Initially VMs will be: domain controller, DNS, WSUS, deployment services. May add internal web server as well. – David Budiac Jul 20 '15 at 23:54
  • 1
    And how much capacity do you require? – ewwhite Jul 21 '15 at 00:04
  • @ewwhite 2TB usable at minimum – David Budiac Jul 21 '15 at 00:14
  • http://superuser.com/questions/834521/is-there-still-a-reason-to-choose-a-10-000-rpm-hard-drive-over-an-ssd/834531#834531 worth a read. It's about 10K *consumer* drives but many of the points are still relevant here. – Journeyman Geek Jul 21 '15 at 01:01
  • We had issues with a Dell server and Samsung Pro SSDs - the RAID controller did not recognize them – Greg Jul 21 '15 at 02:29
  • In the general case, SSDs all the way. I can't wait until we reach the day where they stop making mechanical HDDs altogether. The time for that technology has passed. SSDs, even consumer-level models, outperform "fast" HDDs on all counts, typically by at least an order of magnitude. In your specific case, on your specific hardware, it's essentially just a question of compatibility (and to a lesser extent, cost). There's literally no other reason to waste money buying _new_ HDDs. – aroth Jul 21 '15 at 14:44
  • Overall SSDs offer a lot of advantages, but this article is a very interesting example of when SSDs in a server environment can lead to problems: https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/ –  Jul 22 '15 at 17:50
  • @Gugges Yikes. Though in the end, the article mentions that it was a bug in the Linux kernel, not the SSDs – David Budiac Jul 22 '15 at 21:21
  • Consider Storage Spaces on Windows, ZFS on Linux. Performance is good compared to hardware RAID, and the arrays are easily recoverable using different hardware. – Arthur Kay Aug 15 '15 at 03:46

9 Answers

22

Note: This answer is specific to the server components described in the OP's comment.

  • Compatibility is going to dictate everything here.
  • Dell PERC array controllers are LSI devices. So anything that works on an LSI controller should be okay.
  • Your ability to monitor the health of your RAID array is paramount. Since this is Dell, ensure you have the appropriate agents, alarms and monitoring in place to report on errors from your PERC controller (a minimal polling sketch follows this list).
  • Don't use RAID5. We don't do that anymore in the sysadmin world.
  • Keep a cold spare handy.
  • You don't necessarily have to go to a consumer disk. There are enterprise SSD drives available at all price points. I urge people to buy SAS SSDs instead of SATA wherever possible.
  • In addition, you can probably find better pricing on the officially supported equipment as well (nobody pays retail).
  • Don't listen to voodoo about rotating SSD drives out to try to outsmart the RAID controller or its wear-leveling algorithms. The use case you've described won't have a significant impact on the life of the disks.
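As an illustration of the monitoring point above, here is a minimal sketch that polls physical-disk health through Dell OpenManage's omreport CLI. It assumes OMSA is installed and that controller 0 is the PERC in question; the exact status/state strings can vary by firmware, so treat it as a starting point rather than a drop-in monitor.

```python
#!/usr/bin/env python3
"""Sketch: flag any physical disk behind a PERC that is not healthy.
Assumes Dell OpenManage Server Administrator (omreport) is installed
and that controller 0 is the PERC in question."""
import subprocess

def unhealthy_pdisks(controller=0):
    out = subprocess.run(
        ["omreport", "storage", "pdisk", f"controller={controller}"],
        capture_output=True, text=True, check=True,
    ).stdout

    problems, disk_id, status = [], None, None
    for line in out.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "ID":
            disk_id = value
        elif key == "Status":
            status = value
        elif key == "State":
            # "Online"/"Ready" are the normal states; anything else is worth an alert
            if status != "Ok" or value not in ("Online", "Ready"):
                problems.append((disk_id, status, value))
    return problems

if __name__ == "__main__":
    for disk_id, status, state in unhealthy_pdisks():
        print(f"ALERT: pdisk {disk_id}: status={status}, state={state}")
```

Wired into cron or your monitoring system, something like this catches a failed or predictive-failure disk even if the OpenManage alerting itself is misconfigured.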

Also see: Are SSD drives as reliable as mechanical drives (2013)?

ewwhite
  • 194,921
  • 91
  • 434
  • 799
  • 2
    I've casually heard this before... not to use RAID5 anymore. Mainly because of reliability? And what do you use in its place? RAID6? RAID10? – David Budiac Jul 21 '15 at 00:07
  • 2
    RAID 1+0. See: [*Dell: "RAID 5 is no longer recommended for any business critical information on any drive type"*](https://www.reddit.com/r/sysadmin/comments/ydi6i/dell_raid_5_is_no_longer_recommended_for_any/) – ewwhite Jul 21 '15 at 00:10
  • @ewwhite One URE every 200M sectors? If you do a full format, that would be on average >1 URE right out of the box on a 1TB drive! – user253751 Jul 21 '15 at 08:57
  • 1
    **+1** Good answer. I really wouldn't consider buying consumer-grade SSDs for use on a PERC H700/H710/H730. Just Google for "PERC H730 uncertified drives"; a lot of people have tried that before and ended up with problems. At least buy cheap entry-level SSDs like the already mentioned Intel S3500. – s1lv3r Jul 21 '15 at 10:07
  • 1
    Good answer, but it would be nice if you would add some reasoning behind your suggestions (e.g. that link you posted in the comments). Why compatibility is important is obvious, but why are you favoring SAS over SATA? – Sebb Jul 21 '15 at 10:08
  • 1
    @Sebb [Already wrote about it.](http://serverfault.com/questions/507521/are-ssd-drives-as-reliable-as-mechanical-drives-2013/507536#507536) – ewwhite Jul 21 '15 at 10:09
  • 1
    @s1lv3r good point. [Several](http://en.community.dell.com/support-forums/servers/f/906/t/19618791) [sources](http://lists.us.dell.com/pipermail/linux-poweredge/2015-April/049719.html) mention that while Dell allows 3rd party drives, they're not certified and OpenManage will still give warnings. Though I can't seem to find a list of certified SSDs supported by Dell or the H730 :/ – David Budiac Jul 21 '15 at 19:05
  • @DavidBudiac I don't think you'll find such a list - if it existed, nobody would buy vendor-supplied disks. ;-) Best I could find for you - some random guy on a forum, stating the S3500 works for him: ["I have confirmed that the Intel S3500 works with the PERC H730 ..."](http://community.spiceworks.com/topic/848332-lsi-controllers-in-poweredge-servers). ... - also as ewwhite suggested, the card is LSI rebranded - IMHO it's a MegaRAID 9361 (LSI 3108) - so everything that works there should work with the PERC (the cards can even be cross-flashed if you are feeling brave enough). – s1lv3r Jul 21 '15 at 22:42
  • FWIW, I spoke with a Dell rep. While they don't have an official list of SSDs that are compatible w/ the H730, he did say that OpenManage will not complain about any of the drives the Dell sells on their website. I just bought a handful of Samsung 845DC Evo's. Will report back w/ their compatibility. – David Budiac Jul 28 '15 at 19:59
  • regarding SAS vs. SATA: Are you really saying you recommend enterprise SSDs (regardless of connectivity) over consumer ones? Or do you really think SAS has a critical feature? – Dan Pritts Aug 20 '15 at 18:49
  • I recommend SAS or PCIe SSDs when I can. I just don't like SATA because it doesn't fit my use cases. – ewwhite Aug 20 '15 at 18:50
8

Yes, the SSDs will be way faster than the SAS drives. For sequential throughput, a good RAID of SAS drives might do pretty well, but for random access, the SSDs will blow them out of the water, which can result in a very noticeable performance difference.

Depending on the particular SAS drives and the particular SSD drives, the SSDs may have a better unrecoverable read error rate by up to a factor of 10.

Some tips if you do use consumer SSD drives:

  • Know your write workload so you can estimate how often you'll have to replace the drives, since they have a finite amount of write endurance (see the rough calculation after this list)
  • If you can spare the space, overprovision the drives to make them more like enterprise ones
  • Check out articles comparing the performance and write endurance characteristics of SSDs in the same class and pick the one best suited to your needs
  • Personally I'd get SSDs with a 5-year warranty because I believe the manufacturer is going to provide better quality as a result. I know this isn't a hard and fast rule, just personal belief.
  • There are low-end consumer SSDs and higher-end ones - sometimes labeled something like "Pro" - you might want to look for ones in that class
  • This goes for enterprise drives too, but be sure you're monitoring the MWI (media wear indicator) so you know when to replace the drives
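For the write-workload point above, a back-of-envelope calculation is usually enough to decide whether consumer endurance ratings are acceptable. This is just a sketch; the TBW rating, daily write volume and write-amplification factor below are placeholders you'd replace with your drive's spec-sheet figure and your own measurements (e.g. from SMART host-write counters or perfmon).

```python
# Rough endurance estimate: time until a drive reaches its rated TBW
# (terabytes written) under your measured write workload.
# All three inputs are placeholders -- substitute real numbers.

rated_tbw = 150            # drive's rated endurance in TB written (from the spec sheet)
daily_host_writes_gb = 60  # average host writes per day, measured on your workload
write_amplification = 2.0  # assumed factor; parity RAID and small random writes push this up

nand_writes_tb_per_day = daily_host_writes_gb * write_amplification / 1024
years_to_rated_endurance = rated_tbw / nand_writes_tb_per_day / 365

print(f"Estimated time to rated endurance: {years_to_rated_endurance:.1f} years")
```

Actual wear can then be tracked against this estimate via SMART; the relevant attribute differs by vendor (e.g. Media_Wearout_Indicator on Intel drives, Wear_Leveling_Count on Samsung).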
sa289
  • 1,308
  • 2
  • 17
  • 42
7

Consumer-grade SSDs will work fine in many servers and use cases.

They are way, way faster than SAS disks. I'd suggest the reason to get enterprise disks over consumer disks is not the speed, it's the read-write cycles and better engineering - for example, supercaps are present in some enterprise SSDs where the consumer-grade versions do not have them - so if you lose power to the server, your data is less likely to be killed.

You need to be aware that RAID is not backup - if you are going to RAID a couple of SSDs that's fine, but get different brands of SSDs, or at least different models, so they have different performance characteristics. WHEN SSDs DIE, THEY ARE WAY MORE LIKELY TO DO SO WITHOUT WARNING AND WITH NO ABILITY TO PULL DATA OFF - on the flip side, they are 10x as reliable as regular hard disks.

Look into the Samsung 850 series disks - at least for half the array - they are/were prosumer and offer good bang for buck, and are touted as being more reliable than 2D NAND. (They use 3D NAND.)

Also, as someone else mentioned, don't do RAID5. Drives hold too much for it to work reliably - and back up your data.

davidgo
  • 5,964
  • 2
  • 21
  • 38
  • Just to add, the 850 *Pro* is the one to go for. The standard 850 uses TLC – Journeyman Geek Jul 21 '15 at 00:58
  • @JourneymanGeek - I think the 850 EVO and 850 Pro both use 3D NAND - it is the 840 series that doesn't. This is backed up by Samsung's site - http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/ssd850evo/overview.html - I'm a lot less certain, but I think the 850 Pro has supercaps and better engineering, but the memory is very similar if not identical. – davidgo Jul 21 '15 at 01:35
  • @davidgo That's why you buy Intel ;). Unless things have changed, the Intel SSDs will stop accepting writes when they fail and remain readable so that data can be copied off. http://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte – DetlevCM Jul 21 '15 at 09:06
  • 3
    @DetlevCM: Remain readable _until the first reboot_ at which they intentionally brick themselves. That in itself automatically disqualifies them for any serious use. – MSalters Jul 21 '15 at 09:43
  • @MSalters It's not supposed to.... (it is supposed to remain readable - just not writeable) – DetlevCM Jul 21 '15 at 14:44
  • @DetlevCM The report you linked to (which I was familiar with and is quite good) confirms this behaviour of the Intel drive disappearing on reboot on the "More Casualties" page. The Samsung drive outperformed the Intel - and that was the older 840 series - the 850 series is supposed to have a much longer lifespan again! I do like Intel drives as well though. – davidgo Jul 21 '15 at 19:05
  • @davidgo Non-Pros are TLC (and the 850s use 3D); EVOs IIRC use RAM to cache a bit more aggressively and may have some SLC or MLC cache internally for better performance. The Pro is MLC. – Journeyman Geek Jul 21 '15 at 22:58
7

Even consumer-grade SSDs are much faster than the fastest 15k HDDs, so from a performance standpoint they will be fine (if you use the right disks and overprovision them), but you have to pick them carefully, especially because of how they interact with hardware RAID controllers...

  1. First, check whether affordable, entry-level enterprise-grade drives (such as the Intel S3500/S3600, Micron M500DC and Micron M510DC) are within your reach. If so, you can skip the whole consumer-grade lottery.
  2. Check whether your RAID cards support 3rd-party disks. For example, earlier DELL firmware for H700/H710/H710p cards refused to initialize non-Dell-rebranded disks. A subsequent update initialized such disks, but marked the array "degraded". Only relatively recent (end of 2013) firmware updates corrected that precarious situation.
  3. Keep your disk's private cache enabled. Some RAID cards will forcibly disable the disk's private cache. This kills performance for consumer-level SSDs, as they make heavy use of their private DRAM cache both to cache the indirection table and to mask the high latency involved in erasing/programming MLC NAND. For example, an otherwise very fast Crucial M550 240GB drive writes at an incredibly slow 5 MB/s when its internal cache is disabled.
  4. If possible, strongly favor disks with FULL power-loss protection. This puts you squarely in the enterprise camp, but, as stated above, there are relatively cheap disks in this camp.
  5. If no fully power-loss-protected SSDs are on your shopping list, at least use disks with partial power-loss protection for data at rest. Some excellent drives with such protection are the Crucial/Micron M500/M550 and the newer M600. Micron even has an interesting document on how/why to overprovision its M600 drives for use in virtualization environments. Anyway, remember that with drives lacking full power-loss protection, a small possibility of losing/corrupting your data remains. How small? It depends on your RAID controller's behavior (for example, whether it issues a final ATA flush command after transferring data to a cache-enabled disk) and on the disk's firmware, so it is not possible to give you a detailed answer. What I can say is that in all my tests, PERC RAID cards seem to always flush the disk's private cache (if it is enabled).
  6. Strongly overprovision your consumer drives, reserving at least 25-30% of capacity (a rough sizing sketch follows this list).
  7. Do not use second-class consumer drives. Even good consumer drives have their problems, and going with a lower-tier consumer disk is asking for trouble.
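To make point 6 concrete, here is a rough sizing sketch. The drive size and reserve fraction are examples only; for the reserved space to actually work as spare area, the drive should be secure-erased (or fully trimmed) before the smaller partition or RAID virtual disk is created, so the controller knows those LBAs are free.

```python
# Rough overprovisioning sizing: how big to make the partition or RAID
# virtual disk so that 25-30% of each consumer SSD stays unpartitioned
# as extra spare area. The numbers below are examples, not recommendations.

drive_size_gb = 1000     # advertised capacity of each SSD
reserve_fraction = 0.28  # target reserved capacity, per point 6

usable_gb = drive_size_gb * (1 - reserve_fraction)
reserved_gb = drive_size_gb - usable_gb

print(f"Create the partition / virtual disk at ~{usable_gb:.0f} GB")
print(f"Leave ~{reserved_gb:.0f} GB unpartitioned as extra overprovisioning")
```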
shodanshok
  • 44,038
  • 6
  • 98
  • 162
6

If you are using them for writes, to avoid data corruption in the event of power failure you need to make sure that you only consider models with a supercap, e.g. the Intel S3500 or Samsung 845DC Pro.

Otherwise consumer SSDs are more suited to caching.

JamesRyan
  • 8,138
  • 2
  • 24
  • 36
  • I upvoted this even though I disagree. Certainly supercaps etc are a good idea for an SSD - hence the upvote - but it implies that consumer SSDs are unreliable - I don't believe this is correct - in fact I assert they are 10 times as reliable as spinning hard drives. Also, hard drives don't have supercaps or equivalent - and indeed are more vulnerable to power outages. The thing is that modern filesystems have journals to mitigate the risk of loss (and there are certain speedups you should not use on a drive unless it has a supercap / battery backup) – davidgo Jul 21 '15 at 19:13
  • 3
    @davidgo because HDs don't lie about caching sync writes whereas a lot of consumer SSDs do – JamesRyan Jul 21 '15 at 21:16
  • This, this, 1000 times this. Consumer SSDs will _lie to the RAID controller_, making data loss possible even in highly-redundant RAID designs. If you use SSDs in the enterprise, you want the onboard capacitor. – Joel Coel Jul 22 '15 at 01:10
5

The reason to go with enterprise-grade gear is reliability more than speed. Most consumer SSDs are MLC, with the lower-end stuff being TLC (MLC stores 2 bits per cell, TLC stores 3, and both are less performant and less reliable than SLC). At some point, manufacturers may also drop the onboard RAM cache to save costs as NAND cells get cheaper. An enterprise SSD also has greater redundancy built in, with more spare NAND chips.

TLC is newer, slower, theoretically less reliable, and has a lower MTBF. You'd want to go for MLC drives.

As for reliability, it's a mixed bag. You have resistance to physical head crashes, sure, but controllers can die. Drive endurance has improved significantly.

Consider a few things - all drives die. If it's important, it absolutely needs to be backed up. Consider this to be nearline storage, and factor in unreliability.

If you're looking at endurance, a modern, high-end consumer SSD (like the Samsung 850 Pro) has pretty decent endurance. The 850 Pro is rated for 150-300 TB of writes (compared to 73 TB for the older model, and 7300 to 14600 TB for the newer models). You might be able to trade off space for NAND endurance by playing with spare space. Enterprise SSDs come with more spare space, so if an SSD cell or chip wears out the drive can adjust.

Many consumer drives won't let you read data once their write endurance is exhausted. One big brand does it, but I can't remember which.

Edit: Recently, a 'Linux kernel bug' with Samsung SSDs was reported. In general, enterprise-grade hard drives are boring, reliable old tech; consumer hard drives, I guess, slightly less so. Some of the bugs are being shaken out, and there are changes going on, like NVMe becoming more common. Be prepared to test your SSDs before committing anything critical to them. This seems to be a unique edge case, but it could be you!

Journeyman Geek
  • 6,969
  • 3
  • 31
  • 49
5

The performance inconsistency of consumer SSDs can cause problems with some RAID controllers; the spikes in I/O latency are exacerbated when using a RAID controller, as it often will not be passing TRIM through (I don't know of any controller that does). Enterprise drives are designed around consistent performance even without TRIM, so they typically play well with RAID controllers.
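One way to gauge this before committing drives to the array is to run a long sustained random-write test against a candidate drive and watch whether the latency percentiles blow up once the fresh-out-of-box state is exhausted. A minimal (and destructive) sketch using fio - the device path, runtime and queue depth are placeholders, not recommendations:

```python
# Sketch: sustained 4k random writes with fio to approximate life behind a
# RAID controller (no filesystem, no TRIM). WARNING: this destroys any data
# on the target device -- only point it at a drive you can wipe.
import subprocess

DEVICE = "/dev/sdX"  # the SSD under test, NOT a disk holding data you need

subprocess.run([
    "fio",
    "--name=steady-state-randwrite",
    f"--filename={DEVICE}",
    "--rw=randwrite",
    "--bs=4k",
    "--iodepth=32",
    "--ioengine=libaio",
    "--direct=1",
    "--time_based",
    "--runtime=1800",      # long enough to push past the fresh-out-of-box state
    "--group_reporting",
], check=True)
```

A drive whose completion-latency percentiles (clat p99/p99.9 in fio's output) stay in a reasonable range after half an hour of this is far less likely to trigger controller timeouts than one whose worst-case latency climbs into hundreds of milliseconds.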

If you do not need the high endurance, there are lower-end enterprise SSDs designed around high-read, low-write cycles. The Intel S3500 and Samsung 845DC both offer cheap but RAID-controller-compatible options.

However, if you are using Dell/HP RAID controllers you have to be careful about compatibility. In my experience, HP is the worst when it comes to non-HP drives on their controllers and will sometimes not present any monitoring info about the drives.

user300497
  • 51
  • 2
0

Just a bit of info, adding to the confusion:

If you are planning to deploy S2D (Storage Spaces Direct) - now or some time in the future - you can NOT, I repeat NOT, use consumer SSDs!

MS decided (wisely enough) that S2D should always ensure that every bit is written correctly AND SECURELY before the next is sent. So S2D will only use a disk's onboard cache if it is fully protected against power loss (read: full PLP). If so, the disk - regardless of type - is as fast as its cache, at least until the cache is exhausted.

BUT, if you are using consumer SSDs (no PLP), S2D will by design write through the cache and wait for the data to be confirmed as written directly to the individual NAND circuits. This by design results in write latency being measured in seconds as opposed to microseconds, even at relatively low loads!

I have seen a lot of discussion on the subject, but never seen anyone actually find a workaround. One could argue that dual PSUs and a UPS would provide sufficient protection, at least for non-critical workloads, especially if they are replicated. So in specific use cases it would be useful to be able to "cheat" S2D into using an on-disk cache without PLP. But that decision to overrule basic data integrity is NOT up for discussion - it is PLP or no S2D, period!

I learned this the hard way in a really overdimensioned 4-node cluster (256 cores, 1.5 TB RAM, 16x 4 TB Samsung QVO 860, 20 relatively small Hyper-V VMs), where performance started out acceptable. When replication was set up, performance went from poor to really bad. The VMs went from somewhat slow to completely nonresponsive, eventually ending with the whole pool crashing beyond repair. Studying the logs revealed a bunch of errors - all related to write latency, with values sometimes beyond 15 SECONDS...!

We suspected network errors or just bottlenecks (2x 10 Gbit without RDMA), but no matter what we did to tweak performance (we even tried 4x 10 Gbit with RDMA), we ended up with the same result. So I studied more and stumbled upon an article explaining why you should NOT use consumer SSDs with S2D. Being cheap (and having bought two sets of 16x 4 TB consumer disks!), I studied some more, trying to bypass this by-design obstacle. I tried a lot of different solutions. With no luck...

So I ended up buying 16x 1 TB real datacenter SSDs (Kingston DC500M, the cheapest PLP disks I could find) for testing. And sure enough, all problems disappeared and HCI is suddenly as fast, robust and versatile as claimed. Damn!

Now the same setup is running twice the load with the original network configuration, half as many cores and half as much RAM, but write latency rarely exceeds 200 microseconds. Furthermore, all VMs are responsive as h..., users are reporting a sublime experience and we have no more errors in backup or synchronization or anywhere else, for that matter.

The only difference is that the disks are now 16x 4 TB Kingston DC500M.

So take this hard-learned lesson as advice: do NOT use disks without PLP in HCI...!

HSV
  • 1
-1

If it matters, RAID 1. I would rather have two cheap consumer SSDs in RAID 1 than the best enterprise SSD. The pair should wear at approximately the same rate, but other than wear, they are extremely unlikely to fail at the same time. You should have enough RAM to drastically limit paging so that you can put your system and programs on a hard drive and then put your database(s) on the SSD pair. Since hard drives are cheap, you can afford to RAID 1 those, too. Outside of a fire, that setup will protect your data and provide excellent performance. Then, you can back up to the cloud and call it a day.

  • 5
    Enterprise SSDs have an onboard capacitor to guard against sudden power loss. Consumer SSDs not only lack this, but will also _lie to RAID controllers_ about having correctly flushed volatile buffers, making them vulnerable to data loss even in highly-redundant RAID configurations. – Joel Coel Jul 22 '15 at 01:13