54

There are plenty of resources available online discussing SSDs in RAID configurations - however, most of them date back a few years, and the SSD ecosystem moves very fast - right now we're expecting Intel's "Optane" product release later this year, which will change everything... again.

I'll preface my question by affirming there is a qualitative difference between consumer-grade SSDs (e.g. Intel 535) and datacenter-grade SSDs (e.g. Intel DC S3700).

My primary concern relates to TRIM support in RAID scenarios. To my understanding, despite it being over 6 years since SSDs were introduced in consumer-grade computers and 4 years since NVMe became commercially available, modern RAID controllers still do not support issuing TRIM commands to attached SSDs - with the exception of Intel's RAID controllers in RAID-0 mode.

I'm surprised that TRIM support is not present in RAID-1 mode; given the way the drives mirror each other, it seems straightforward. But I digress.

I note that if you want fault tolerance with disks (both HDD and SSD), you would use them in a RAID configuration - but as the SSDs would be without TRIM, they would suffer write amplification, which causes extra wear, which in turn would make the SSDs fail prematurely. This is an unfortunate irony: a system designed to protect against drive failure might end up directly causing it.

So:

  • Is TRIM support necessary for modern (2015-2016 era) SSDs?
    • Is there any difference in the need for TRIM support between SATA, SATA-Express, and NVMe-based SSDs?
  • Often drives are advertised as having improved built-in garbage-collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?
  • A lot of articles and discussion from earlier years concerns SLC vs MLC flash, and says SLC is preferable due to its much longer lifespan - however, it seems all SSDs today (regardless of where they sit on the consumer-to-enterprise spectrum) are MLC - is this distinction of any relevance anymore?
    • And what about TLC flash?
  • Enterprise SSDs tend to have much higher endurance / write limits (often measured in how many times you can completely overwrite the drive per day, throughout the drive's expected 5-year lifespan) - if their write-cycle limit is very high (e.g. 100 complete writes per day), does this mean that they don't need TRIM at all because those limits are so high, or - the opposite - are those limits only attainable by using TRIM?
Dai
  • 6
    While I can't answer your question, I think it should be considered in light of the fact that our industry is trying its best to kill off proprietary RAID. The public cloud providers all use SSDs now on compute and storage services; surely they solved this with software, erasure coding, etc. Cloud computing innovations have exposed things like hardware RAID, the Cisco IOS and proprietary storage area networks as pointless commodities fermenting at the top of the food chain and actually hampering innovation. Hardware RAID can't be sold at scale (to AWS, Azure, CERN), so.... – Sum1sAdmin May 13 '16 at 10:14
  • @Sum1sAdmin *the public cloud providers all use SSDs now on compute and storage services* Are you saying that AWS or Backblaze store data only on SSDs? – A.L May 13 '16 at 13:39
  • @A.L well no, I'm only pointing out that they offer SSDs for block, file, object and ephemeral storage – Sum1sAdmin May 13 '16 at 14:44
  • I completely disagree with the "qualitative difference" between commercial and consumer. I can assure you there is no difference in manufacturing of NAND. There are certainly different manufacturers (Samsung vs Intel) but neither has a special consumer manufacturing process. There are certainly feature differences, but not quality differences. – Jim B May 13 '16 at 16:47
  • Speaking of Optane, we are almost halfway through 2016 and it's nowhere in sight... Anyone know what the deal is? – Jeff Meden May 13 '16 at 16:55
  • One question per question, please. This is _way_ too broad. – Lightness Races in Orbit May 13 '16 at 22:59
  • @JeffMeden This month it was revealed Intel has Optane-based SSDs using the NVMe interface (over current U.2 and PCI-Express connectors) that are drop-in replacements for existing SSD storage; they also have DDR-DIMM interfaces for Optane non-volatile storage, but that requires special OS support. – Dai May 14 '16 at 00:51
  • 1
    Trim is still very useful, despite what people have said here. Without trim, we cannot perform cleanup on blocks intelligently. Thus, when we need to re-write an area that still has data on it after a previous deletion, we must perform a clear cycle (slow) and then a write cycle (not as slow). The use of software RAID / LVM, modern filesystems, and making sure TRIM is enabled as an unmap command in these systems is still good for performance. The drive cannot rightly know what areas to clear unless we have a filesystem inform the drive explicitly which areas to do this to after file deletion. – Spooler Sep 28 '16 at 01:11
  • Supporting TRIM on parity-using RAID levels is likely tricky - it will break the parity, since you can no longer expect to read back the same data after an area has been trimmed... (If it returns predictable values when read, like all zeroes, it would be possible to recalculate the parity, but this adds extra overhead, which might negate the gains of TRIM) – Gert van den Berg Mar 11 '17 at 10:53

4 Answers

26

Let's try to answer one question at a time:

  • Is TRIM support necessary for modern (2015-2016 era) SSDs?

Short answer: in most cases, no. Long answer: if you reserve sufficient spare space (~20%), even consumer-grade drives usually have quite good performance consistency (but you need to avoid the drives which, instead, choke on sustained writes). Enterprise-grade drives are even better, both because they have more spare space by default and because their controller/firmware combo is optimized for continuous use of the drive. For example, take a look at the S3700 drive you referenced: even without trimming, it has very good write consistency.

  • Often drives are advertised as having improved built-in garbage collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?

The drive's garbage collector does its magic inside the drive sandbox - it does not know anything about the outside environment. This means that it is (mostly) unaffected by the RAID level of the array. That said, some RAID levels (basically, the parity-based ones) can sometimes (and in some specific implementations) increase the write amplification factor, which in turn means more work for the GC routines.

  • A lot of articles and discussion from earlier years concerns SLC vs MLC flash, and says SLC is preferable due to its much longer lifespan; however, it seems all SSDs (regardless of where they sit on the consumer-to-enterprise spectrum) are MLC these days - is this distinction of relevance anymore?

SLC drives have basically disappeared from the enterprise, being relegated mainly to military and some industrial tasks. The enterprise market is now divided into three grades:

  • HET-MLC/eMLC flash uses the better-binned MLC chips, certified to sustain at least 25,000-30,000 rewrite cycles;
  • 3D MLC chips are rated at about 5,000-10,000 rewrite cycles;
  • normal planar MLC and 3D TLC chips are rated at about 3,000 rewrite cycles.

In reality, any of the above flash types should provide you with plenty of total write capacity and, in fact, you can find enterprise drives with all of the above flash types.
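To put those cycle ratings in perspective, here is a back-of-envelope sketch (the function name, the 400 GB capacity and the write-amplification factor are illustrative assumptions of mine, not vendor specifications):

```python
# Rough endurance estimate: total host writes before the rated
# program/erase cycles run out. All numbers are illustrative.
def total_write_capacity_tb(capacity_gb, rated_cycles, write_amplification=1.0):
    """Total host writes (in TB) a drive can absorb, given its rated
    rewrite cycles and an assumed write-amplification factor."""
    return capacity_gb * rated_cycles / write_amplification / 1000

# A 400 GB drive with 3000-cycle flash and a pessimistic WA factor of 2:
print(total_write_capacity_tb(400, 3000, write_amplification=2.0))  # 600.0 (TB)
```

Even with a pessimistic write-amplification factor, 3,000-cycle flash still yields hundreds of terabytes of host writes.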

The real differentiators between enterprise and consumer drives are:

  • the controller/firmware combo, with enterprise drives much less likely to die due to an unexpected controller bug;
  • the power-protected write cache, extremely important to prevent corruption of the Flash Translation Layer (FTL), which is stored on the flash itself.

Enterprise-grade drives are better mostly due to their controllers and power capacitors, rather than due to better flash.

  • Enterprise SSDs tend to have much higher endurance / write limits (often measured in how many times you can completely overwrite the drive per day, throughout the drive's expected 5-year lifespan); does this obviate any concerns over write amplification caused by not running TRIM?

As stated above, enterprise-grade drives have much higher default spare space (~20%) which, in turn, drastically lowers the need for regular TRIMs.
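As an aside, a drive-writes-per-day (DWPD) rating converts directly into a total-bytes-written budget; a minimal sketch (function name and the 10 DWPD / 400 GB example are mine, purely illustrative):

```python
def tbw_from_dwpd(dwpd, capacity_gb, warranty_years=5):
    """Total Terabytes Written implied by a Drive-Writes-Per-Day rating
    over the drive's warranty period."""
    return dwpd * capacity_gb * 365 * warranty_years / 1000

# A 400 GB enterprise drive rated at 10 DWPD for 5 years:
print(tbw_from_dwpd(10, 400))  # 7300.0 (TB)
```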

Anyway, as a side note, please consider that some software RAIDs support TRIM (someone said Linux MD RAID?).

shodanshok
  • Only 35000 write cycles?! That doesn't sound like very many. I guess it makes sense if the increased drive replacements are cheaper than buying the same capacity in SLC. – user253751 May 13 '16 at 21:52
  • 1
    ~30000 minimum guaranteed rewrite cycles is nothing bad: with the increased capacity brought by the switch to 2-bit-per-cell NAND, MLC drives are much cheaper than SLC ones while maintaining a similar endurance rating. Moreover, the days when 50-nm-class SLC cells were rated at >100000 rewrite cycles are probably gone: even enterprise drives have 34/25 nm (or smaller) class flash, with intrinsically lower endurance (which affects SLC drives also). – shodanshok May 13 '16 at 22:21
  • @shodanshok Your conclusion is "Use enterprise-grade (SAS) SSDs attached to a normal RAID controller and don't worry about it" - and that I won't see a performance hit in the 5 years a drive is warranted for? And that the performance problems documented by other users only affect consumer-grade drives? – Dai May 14 '16 at 00:55
  • @Dai for "enterprise-grade" drives I do not mean only SAS SSDs, rather also selected SATA SSDs can be considered "enterprise-grade". And yes, enterprise drives have very good performance consistency, even in steady state (ie: completely full). For an example of such drives, see [here](http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/3). Even some consumer drives, when coupled with generous overprovision, can be quite consistent. See [here](http://www.anandtech.com/show/9451/the-2tb-samsung-850-pro-evo-ssd-review/2) for an example. – shodanshok May 14 '16 at 10:03
  • can you explain what "better binned" means? taking the best-tested chips? – tedder42 May 19 '16 at 04:43
  • 1
    Binning is the process through which silicon chips are examined and grouped based on their quality. So, enterprise MLC NAND chips are basically the "better made", better-tested chips. – shodanshok May 19 '16 at 06:38
  • @shodanshok You are completely misunderstanding the function TRIM provides. Garbage collection does **not** know when sectors are *deleted*. This is the problem. Garbage collection consolidates known empty sectors in a writeable block. But if a block is deleted, GC still thinks it isn't empty so it can't reorganise that block. Now if a block contains some sectors it thinks has writes (but they were actually deleted) and some empty and it writes to that block, it must read the block first, erase it then writeback the original deleted data plus your new data in the sectors it knows are empty. – Shiv Apr 19 '18 at 05:07
  • TRIM provides the function to the firmware to tell the controller those sectors were erased so you do not need the read/erase/write cycle. Instead you can just write (as TRIM allows the drive to do the erase async after the delete occurred). GC *cannot* handle this scenario. – Shiv Apr 19 '18 at 05:09
  • @Shiv: yes, it can - albeit partially. What GC does is to scan the NAND mapping tables to find LBAs which were overwritten and relocated to other physical blocks, then it clears (ie: trims) the old blocks. It is the reason why sequential writes restore a healthy amount of SSD performance, even without issuing any trim from the OS. It is an indirect approach which works quite well with whichever filesystem you use. However, not all disks have (had?) an effective GC implementation. – shodanshok Apr 19 '18 at 08:09
  • @Shiv: from the very article you [posted below:](https://arstechnica.com/gadgets/2015/04/ask-ars-my-ssd-does-garbage-collection-so-i-dont-need-trim-right/) "TRIM isn’t magical, and you don’t have to have it. Modern SSDs with garbage collection will work fine without it" – shodanshok Apr 19 '18 at 08:14
  • @shodanshok Not having TRIM doesn't *break* a drive. But it will certainly degrade in performance and cause write amplification that GC won't correct. In short, if you *can* enable TRIM, it is more optimal than *not* enabling it as it will certainly have benefit that no other measure can cover. – Shiv Apr 19 '18 at 22:01
  • @Shiv TRIM is "nothing more" than an explicit hint for the garbage collector to recycle the freed blocks. SSDs often process TRIM immediately, but it is not required by the SATA specification (ie: a controller can recycle the trimmed blocks during its garbage collection). Anyway, nobody claimed that TRIM is useless. However, with modern drives and sufficient overprovisioning, a good SSD will run near its maximum speed, thanks to GC. – shodanshok Apr 20 '18 at 21:22
  • @shodanshok without the explicit hint, the GC doesn't know the blocks are freed and will never process as empty. *That's* the point. Over-provisioning will probably be effective for home users as they won't be rewriting large portions of data on their drive every day. For anyone with large write quantities, this is a very different scenario and where TRIM will be a lot more effective/noticeable. – Shiv Apr 21 '18 at 15:55
  • @Shiv as I already explained, when new data are written the controller remaps the affected LBAs onto new physical blocks, *freeing* the old ones. So, a TRIM-less controller frees old blocks during data rewrite. Old controllers, with few blocks to spare, were slow in that process. TRIM is useful to free blocks *in advance*, basically treating free space as dynamic overprovisioning. Did you see the Anandtech articles where sequential writes significantly restored performance *without* TRIM? So, with a modern controller and sufficient spare area, TRIM is not strictly needed for good performance. – shodanshok Apr 21 '18 at 16:44
  • @Shiv I strongly suggest you to read [this IBM paper](https://www.google.it/url?sa=t&source=web&rct=j&url=http://www-01.ibm.com/support/docview.wss%3Fuid%3Dtss1wp102489%26aid%3D1&ved=2ahUKEwjNmbus6MvaAhWEzaQKHXEgDqIQFjAAegQIBxAB&usg=AOvVaw30grIqEsagJtkxgXF2yg6K) regarding GC. – shodanshok Apr 21 '18 at 16:48
  • @shodanshok "sufficient" spare area is the key phrase there. If you have heavy write load, there isn't such thing. Maybe for a home user who games and uses office, it is't a realistic problem but for someone with heavy read/write usage, it is a far bigger consideration and risk. Again, the overprovisioning just delays the problem, the magnitude of the delay dependent on load. – Shiv Apr 24 '18 at 18:56
9

TRIM isn't something I ever worry about when using SSDs on modern RAID controllers. The SSDs have improved, hardware RAID controller features have been optimized for these workloads, and endurance reporting is usually in place.

TRIM is for lower end SATA drives. For SAS SSDs, we have SCSI unmap, and perhaps that's the reason I don't encounter TRIM needs...

But the other commenter is correct. Software-Defined Storage (SDS) is changing how we use SSDs. In SDS solutions, RAID controllers are irrelevant. And things like TRIM tend to be less important because SSDs are filling specified roles. I think of Nimble storage read cache or the ZFS L2ARC and ZIL... They all meet specific needs and the software is leveraging the resources more intelligently.

ewwhite
  • 3
    UNMAP and TRIM do exactly the same thing. – Michael Hampton May 13 '16 at 16:38
  • 2
    Trim/unmap is always required - without it, you'd have to completely rely on internal garbage collection – Jim B May 13 '16 at 16:50
  • 3
    Internal garbage collection is no substitute for TRIM. There is no firmware function that can replace what TRIM does. It is a little alarming so many answers here don't understand what TRIM actually does and why it is needed. Refer to articles such as this https://arstechnica.com/gadgets/2015/04/ask-ars-my-ssd-does-garbage-collection-so-i-dont-need-trim-right/ – Shiv Apr 19 '18 at 05:04
1

RAID levels with SSD

An answer above suggests that RAID levels with parity, like RAID 5, increase write amplification. There is really more than one way to interpret that: the impact on one drive or the impact on the set of drives.

Compared to no redundancy, RAID 5 does add writes to the set, as it adds parity. But compared to a RAID 0 array of (n-1) drives, the per-drive impact of a RAID 5 array with n drives is nil: each of the n drives receives just as many writes. RAID 5 adds 1/(n-1) extra writes to the set. RAID 1 and RAID 10, however, add 100% extra writes to the set, because everything written to one SSD is written to its mirror.

So, in terms of writes to a RAID 5 set vs. a RAID 10 set with the same number of drives, the SSDs in the RAID 5 set will receive fewer writes. And that stays true even if you increase the number of SSDs in the RAID 10 set to equalize usable capacity.
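The arithmetic above can be sketched as follows (a simplified model of my own: full-stripe writes only, ignoring parity read-modify-write overhead):

```python
def writes_to_set(host_writes, level, n_drives):
    """Total data written across the whole array per unit of host writes.
    Simplified model: full-stripe writes, no read-modify-write overhead."""
    if level == "raid0":
        return host_writes
    if level in ("raid1", "raid10"):
        return host_writes * 2                          # every byte is mirrored
    if level == "raid5":
        return host_writes * n_drives / (n_drives - 1)  # one parity strip per stripe
    raise ValueError(f"unknown level: {level}")

# 1 TB of host writes against 4-drive arrays:
for level in ("raid0", "raid5", "raid10"):
    print(level, writes_to_set(1.0, level, 4))
# RAID 5 spreads ~1.33 TB over the set, while RAID 10 writes a full 2 TB.
```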

Keith J
0

shodanshok touched on the real answer here. If you reserve extra space ("over-provision"), your SSD's endurance and write-performance consistency will both improve over time, and the lack of TRIM support becomes mostly irrelevant. Reserving that extra space can be as simple as partitioning a new SSD to less than its full capacity. Most in-drive controllers treat never-used space the same as reserved space, which significantly reduces write amplification. For a boot and OS drive, 10% reserved space is probably enough. For drives that are rewritten often, increase that space.
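As a rough sketch of the effect (the 7% factory-spare figure and the example sizes are my assumptions; actual factory overprovisioning varies by model):

```python
def effective_spare_pct(raw_gb, partitioned_gb, factory_spare_pct=7.0):
    """Approximate total spare area when partitioning below full capacity.
    Assumes the controller treats never-written LBAs as spare, which holds
    for most modern drives when starting from a new (or secure-erased) SSD."""
    factory_spare_gb = raw_gb * factory_spare_pct / 100
    user_spare_gb = raw_gb - partitioned_gb
    return 100 * (factory_spare_gb + user_spare_gb) / raw_gb

# Partitioning a 512 GB drive down to 460 GB more than doubles its spare area:
print(round(effective_spare_pct(512, 460), 1))  # 17.2
```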

Keith J