Why do SSD sectors have limited write endurance?

57

17

I often see people mention that SSD sectors have a limited number of writes before they go bad, especially when compared to classic (rotating disc) hard drives where most of those fail due to mechanical failure, not sectors going bad. I am curious as to why that is.

I am looking for a technical yet consumer-oriented explanation, i.e. the exact component that fails and why frequent writes affect the quality of that component, but explained in such a way that it does not require an extreme amount of knowledge about SSDs.

Nzall

Posted 2016-08-01T09:36:54.010

Reputation: 2 585

1

I believe this would be an interesting read for you: http://techreport.com/review/24841/introducing-the-ssd-endurance-experiment

– MustSeeMelons – 2016-08-01T12:07:07.057

4This rests on the precept that there are things you can use forever and never wear down – random – 2016-08-02T13:00:53.593

1Don't forget the current economy. While physical degradation is a fact, it is most certainly a fact very often defined at the blueprint stage, with major factors such as cost and planned obsolescence. – helena4 – 2016-08-03T09:02:14.713

helena4, there is considerable competition to make longer-life flash devices, and it's a difficult enough problem, that I very much doubt that planned obsolescence is any significant factor. However, I'm reasonably sure you're correct that cost is a major factor. There are well-known methods for increasing the write endurance of flash memory, within limits, but it reduces the density so increases the cost per bit. – Eric Smith – 2016-08-04T03:08:17.173

Current flash devices are pretty good (compared to what they used to be like). I haven't seen one fail on a sector-by-sector basis for quite some time, possibly thanks to wear leveling etc., so the technology is improving. Most flash I've seen die recently has been, presumably, the charge pump (https://en.wikipedia.org/wiki/Charge_pump), which makes the device stop completely and suddenly! The MTBF of most flash drives is not much worse than that of rotary HDDs, so it might not be worth worrying about too much; take sensible precautions and keep a backup routine.

– Michael Stimson – 2016-08-04T05:21:11.067

@random "This rests on the precept that there are things you can use forever and never wear down " Doesn't it just rest on the precept that SSDs fail after a much smaller number of writes than the technology that they're replacing? – David Richerby – 2016-08-04T08:20:18.380

The question is "why is there a limited amount of writes?" but doesn't say why there should be an unlimited number of writes like some other media, of which there are none. They are all limited @dav – random – 2016-08-04T12:24:05.387

1@random Only if you insist on interpreting the question in a vacuum and reading "limited" as meaning "having some finite bound" rather than "quite small." For example, if somebody says they "have limited time to contribute to Stack Exchange", I'm pretty sure you'd understand that they mean that they have a small amount of time compared to other users of the site and you wouldn't respond, "I know that: your available time is limited by the fact that you won't live forever." – David Richerby – 2016-08-04T12:37:06.560

@random I clarified what I meant. I did not say that hard drives are usable forever, but especially compared to SSDs, they tend to last a lot longer, disregarding mechanical failure. – Nzall – 2016-08-04T13:11:11.530

@helena4: I'm of the opinion that this isn't planned obsolescence more than it is cost. Drives using TLC NAND are becoming common because they're cheap; minimizing cost per GB trumps endurance because flash is still very expensive compared to mechanical HDD storage and most consumers don't approach the endurance limits for even 15nm planar TLC NAND (even at less than 1000 P/E cycles per block). Remember that endurance is strongly tied to long-term reliability and data loss events aren't good for the manufacturer's bottom line. – bwDraco – 2016-08-06T03:09:15.713

@EricSmith, and bwDraco: You are mistaken on principle and I'll tell you why. The problem with storage life has always been predictability. If the life is limited and you get an OK warning that you need to move before the ship sinks, people will not complain (and, not surprisingly, that's the direction we are moving in). Storage devices are not exempt from common business practices, as much as we might wish. – helena4 – 2016-08-08T09:25:37.700

Answers

82

Copied from "Why Flash Wears Out and How to Make it Last Longer":

NAND flash stores the information by controlling the amount of electrons in a region called a “floating gate”. These electrons change the conductive properties of the memory cell (the gate voltage needed to turn the cell on and off), which in turn is used to store one or more bits of data in the cell. This is why the ability of the floating gate to hold a charge is critical to the cell’s ability to reliably store data.

Write and Erase Processes Cause Wear

When written to and erased during the normal course of use, the oxide layer separating the floating gate from the substrate degrades, reducing its ability to hold a charge for an extended period of time. Each solid-state storage device can sustain a finite amount of degradation before it becomes unreliable, meaning it may still function but not consistently. The number of writes and erasures (P/E cycles) a NAND device can sustain while still maintaining a consistent, predictable output, defines its endurance.

Kinnectus

Posted 2016-08-01T09:36:54.010

Reputation: 9 411

8

The limitation of flash write cycles is not specific to NAND-type but is true for flash memory in general. E.g. https://en.wikipedia.org/wiki/Flash_memory#Write_endurance

– JDługosz – 2016-08-01T16:16:42.010

1@JDługosz: Flash memory in general has limited write cycles, but the actual mechanism causing the limitation varies with technology. – Ben Voigt – 2016-08-01T22:27:53.107

4The link I posted describes the NOR as being “floating gate” as well. It seems that the actual flash cell is the same, and NAND just refers to the way they are connected in series (thus resembling a NAND gate). The addressing logic and multiplexing details are irrelevant to the wear mechanics of the flash proper. – JDługosz – 2016-08-02T01:15:02.993

2Indeed -- all flash stores information as charge in a floating gate, that is basically the definition of what flash is; there are other kinds of Electronically Erasable Programmable Read Only Memory than flash, and they have different methods of degradation, but flash is defined as an EEPROM that stores information in a floating gate charge. NAND vs NOR defines the mechanism for how the data is read or written, not how it is stored. – Jules – 2016-08-02T07:44:13.480

10

At simplest, the physics is that you are forcing electrons through a (very thin) insulator by applying a high voltage. Occasionally this will cause bonds between atoms to break and re-form in different arrangements, which will degrade the insulation. Eventually the memory cell becomes leaky or shorts out and it can then no longer reliably store data. The wiki is interesting: https://en.wikipedia.org/wiki/Flash_memory#Memory_wear. It is possible to do an erase-and-repair cycle on a relatively large chunk of the chip by heating (annealing) it.

– nigel222 – 2016-08-02T16:54:10.553

64

Imagine a piece of regular paper and pencil. Now feel free to write and erase as many times as you please in one spot on the paper. How long does it take before you make it through the paper?

SSDs and USB flash drives have this basic concept but at the electron level.

MonkeyZeus

Posted 2016-08-01T09:36:54.010

Reputation: 7 101

35I like the analogy, but this answer could use some facts to explain what is actually happening. – GolezTrol – 2016-08-01T21:07:56.643

11It doesn't help that the same analogy is used for DRAM, which has many orders of magnitude higher limit on write cycles. – Ben Voigt – 2016-08-01T22:31:13.607

28

@BenVoigt Ok: DRAM is pencil + rubber eraser, flash is ink + ink eraser. The ink is more permanent, at the cost of the removal causing more damage. (Hey, that actually works pretty well for an analogy...)

– Bob – 2016-08-02T04:38:05.303

1Perhaps another similar analogy would be a classic chalkboard. If you write and wipe text on the same small spot, it will eventually become an indecipherable mess despite technically being still there (unlike a paper that breaks down). – Juha Untinen – 2016-08-02T10:22:12.350

The chalkboard analogy is pretty good. You erase by wiping the chalk off the black paint, but every time you wipe you slightly damage the paint you are wiping. Eventually the paint wears through, and that area of the board becomes first unclear and then completely useless as a place to write. – nigel222 – 2016-08-02T16:41:55.783

8OK, great. I'm imagining a piece of paper and a pencil. But a flash memory is nothing like a piece of paper and a pencil, so how does that help? You might as well say, "Imagine your car. If you drive it enough, the engine will stop working." Simply giving another example of something that breaks after being used many times doesn't explain why this particular system breaks after being used many times. – David Richerby – 2016-08-03T00:30:28.237

@DavidRicherby you're putting "material" onto a "medium", and then want to remove it, but it's not possible to seamlessly remove it without disturbing the medium. repeat too many times, and you will have damaged the medium to the point that it is no longer functional. – Dave Cousineau – 2016-08-03T06:56:16.773

5@Sahuagin But why is it like that? Why isn't it like a water bottle which I can fill and empty as many times as I want without any measurable erosion of the bottle? That's the problem with this analogy: it asks me to believe that a memory is like some other system but the only link between the two systems is the claim that the analogy works. – David Richerby – 2016-08-03T10:25:10.377

@JuhaUntinen A classic chalkboard is made of slate; a good quality modern chalkboard is made of enamelled steel. Either one is so much harder than either the chalk or the eraser that there will be essentially no damage to the chalkboard after any number of writes and erasures. So that's not going to work as an analogy. – David Richerby – 2016-08-03T10:29:20.647

@DavidRicherby: Actually my meaning was exactly that. The SSD/chalkboard is still physically there, but if you write and erase enough times, you can no longer see what the old data was and any new data will be impossible to use. In both the SSD and the chalkboard, you simply write to a different location to bypass this. – Juha Untinen – 2016-08-03T10:36:15.507

"Why isn't it like a water bottle which I can fill and empty as many times as I want without any measurable erosion of the bottle?" The difference is in scale. Water will do negligible damage to the bottle over a few thousand refills (well, to be honest, I never tried that :)), but the SSD will suffer much greater damage from each rewrite cycle. For the (plastic) water bottle scenario, you could think of acid instead of water. Electricity is acid to electronic components. The voltage or current (depending) is the pH ratio, if you will. – Juha Untinen – 2016-08-03T10:41:08.700

2@Juha But chalkboards don't work like that. The whole point of a chalkboard, as distinct from a wall, is that you can write and erase and write again and the new writing will be legible. And I appreciate the difference in scale between a memory cell and a water bottle. My question there was somewhat rhetorical, to make the point that you can't just claim that something is an analogy without explaining why it's an analogy and how the analogy helps to understand the real situation. The answer here does neither of those things. – David Richerby – 2016-08-03T11:11:16.377

This comment section is FUBAR. Seriously, a question about SSD's and you're talking about chalkboards. – g3mini – 2016-08-03T11:39:25.063

@DavidRicherby "Chalkboards don't work like that" tell that to the chalk... – corsiKa – 2016-08-04T22:54:27.370

@Bob - A chalkboard/whiteboard would be a better analogy for DRAM than pencil/paper. DRAM can withstand erasure/rewrite an effectively unlimited number of times, so it's less a question of slow-wearing vs. fast wearing and more a question of non-wearing vs. wearing, as the underlying hardware architecture is completely different between DRAM and Flash NAND. – aroth – 2016-08-08T05:27:10.593

25

The problem is that the NAND flash substrate suffers degradation on each erase. The erase process involves hitting the flash cell with a relatively large charge of electrical energy, which causes the semiconductor layer on the chip itself to degrade slightly.

In the long run, this damage increases the bit-error rate. At first the errors can be corrected in software, but eventually the error correction code routines in the flash controller can't keep up with them and the flash cell becomes unreliable.
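To make that tipping point a bit more concrete, here is a toy model in Python. It is only a sketch: the exponential growth of raw bit errors, the constants, and the function names are all assumptions chosen for illustration, not figures from any real controller or datasheet.

```python
# Toy model: raw bit errors per page grow with erase count; ECC fixes up to a limit.
# All constants below are illustrative assumptions, not real device parameters.

ECC_CORRECTABLE_BITS = 40      # assumed max bit errors the ECC can fix per page
BASE_ERRORS = 1.0              # assumed raw bit errors per page on a fresh block
GROWTH_PER_1000_CYCLES = 1.8   # assumed multiplicative growth per 1000 erases


def expected_raw_errors(erase_cycles: int) -> float:
    """Expected raw bit errors per page after a number of erase cycles."""
    return BASE_ERRORS * GROWTH_PER_1000_CYCLES ** (erase_cycles / 1000)


def page_is_reliable(erase_cycles: int) -> bool:
    """A page is usable while ECC can still correct the expected errors."""
    return expected_raw_errors(erase_cycles) <= ECC_CORRECTABLE_BITS


def cycles_until_unreliable(step: int = 100) -> int:
    """First erase count (in multiples of `step`) where ECC no longer keeps up."""
    cycles = 0
    while page_is_reliable(cycles):
        cycles += step
    return cycles


if __name__ == "__main__":
    print("ECC overwhelmed after about", cycles_until_unreliable(), "erase cycles")
```

The point of the sketch is only the shape of the behaviour: the page works fine for a long time because the errors are silently corrected, and then crosses a threshold where correction is no longer possible.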

jcbermu

Posted 2016-08-01T09:36:54.010

Reputation: 15 868

1

The limitation of flash write cycles is not specific to NAND-type but is true for flash memory in general. E.g. https://en.wikipedia.org/wiki/Flash_memory#Write_endurance

– JDługosz – 2016-08-01T16:16:50.103

@JDługosz - while this is true, the fact that NOR flash can be erased & rewritten on a per-word rather than per-block basis means that the degradation will be slower in many cases, so it is qualitatively different, even if the mechanism is the same. – Jules – 2016-08-02T07:46:50.213

It's an important point that it's erase cycles that cause wear, and not write cycles. It's possible to take advantage of this to write several times to a region before erasing if you know your changes are cumulative (e.g. a bitmap of 'in-use' sectors can accumulate many writes before it needs to be reset). – Toby Speight – 2016-08-02T10:07:33.493

Example: the Empeg (later Rio) car MP3 player stores settings in a fixed-length slot; many of these fit in an erase block. When reading, it just picks up the latest one that has a valid checksum. The block only needs to be erased when every slot within the erase-block has been used, rather than every time the settings are written. – Toby Speight – 2016-08-02T10:09:57.697
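The slot scheme described in the comment above could be sketched roughly like this in Python. This is not the Empeg's actual code: the slot size, the number of slots, the use of CRC-32 as the checksum, and the `SettingsBlock` class are all assumptions made up for the illustration. The erase block is simulated with a `bytearray` in which erased flash reads back as `0xFF`.

```python
import zlib

SLOT_SIZE = 16                 # assumed: 12 payload bytes + 4 checksum bytes
PAYLOAD_SIZE = SLOT_SIZE - 4
SLOTS_PER_BLOCK = 8            # assumed number of slots in one erase block
ERASED = 0xFF                  # erased flash reads back as all ones


class SettingsBlock:
    """Hypothetical append-only settings store inside a single erase block."""

    def __init__(self) -> None:
        self.data = bytearray([ERASED] * (SLOT_SIZE * SLOTS_PER_BLOCK))
        self.erase_count = 0

    def _slot(self, i: int) -> bytes:
        return bytes(self.data[i * SLOT_SIZE:(i + 1) * SLOT_SIZE])

    def _is_empty(self, i: int) -> bool:
        return all(b == ERASED for b in self._slot(i))

    def write(self, payload: bytes) -> None:
        if len(payload) != PAYLOAD_SIZE:
            raise ValueError("payload must be exactly PAYLOAD_SIZE bytes")
        record = payload + zlib.crc32(payload).to_bytes(4, "little")
        free = next((i for i in range(SLOTS_PER_BLOCK) if self._is_empty(i)), None)
        if free is None:
            # Every slot is used: only now do we pay for an erase.
            self.data[:] = bytes([ERASED] * len(self.data))
            self.erase_count += 1
            free = 0
        self.data[free * SLOT_SIZE:(free + 1) * SLOT_SIZE] = record

    def read_latest(self):
        """Return the newest payload with a valid checksum, or None."""
        for i in reversed(range(SLOTS_PER_BLOCK)):
            slot = self._slot(i)
            payload, crc = slot[:PAYLOAD_SIZE], slot[PAYLOAD_SIZE:]
            if not self._is_empty(i) and zlib.crc32(payload).to_bytes(4, "little") == crc:
                return payload
        return None
```

With these assumed sizes, eight consecutive settings writes cost no erase at all, and the ninth triggers a single erase, so the block wears roughly eight times more slowly than it would if it were erased on every save.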

11

My answer is taken from people with more knowledge than me!

SSDs use what is called flash memory. A physical process occurs when data is written to a cell (electrons move in and out.) When this happens it erodes the physical structure. This process is pretty much like water erosion; eventually it's too much and the wall gives way. When this happens the cell is rendered useless.

Another way is that these electrons can get "stuck," making it harder for the cell to be read correctly. The analogy for this is a lot of people talking at the same time, and it's hard to hear anyone. You may pick out one voice, but it may be the wrong one!

SSDs try to spread the load evenly across their in-use cells so that they wear down evenly. Eventually a cell will die and be marked as unavailable. SSDs have an area of "overprovisioned cells," i.e. spare cells (think substitutes in sport). When a cell dies, one of these is used instead. Eventually all these extra cells are used up as well, and the SSD will slowly become unreadable.
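For the curious, the idea of spreading writes and swapping in spares can be sketched in a few lines of Python. This is a heavily simplified toy, not how a real controller works: the `ToySSD` class, its parameters, and the rule "always write to the least-worn block" are assumptions for illustration, and real drives manage wear per erase block with far more bookkeeping.

```python
class ToySSD:
    """Toy model of wear leveling with over-provisioned spare blocks."""

    def __init__(self, active_blocks: int, spare_blocks: int, endurance: int) -> None:
        self.endurance = endurance                 # erases a block survives (assumed)
        self.erase_counts = [0] * active_blocks    # blocks currently in use
        self.spares = spare_blocks                 # over-provisioned blocks left

    def write_block(self) -> bool:
        """Erase and rewrite one block; return False once no usable block is left."""
        if not self.erase_counts:
            return False
        # Wear leveling: always pick the least-worn block.
        i = min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)
        self.erase_counts[i] += 1
        if self.erase_counts[i] >= self.endurance:
            if self.spares > 0:
                # Block is worn out: retire it and bring in a fresh spare.
                self.spares -= 1
                self.erase_counts[i] = 0
            else:
                # No substitutes left: the drive shrinks.
                del self.erase_counts[i]
        return True

    def total_writes_until_dead(self) -> int:
        writes = 0
        while self.write_block():
            writes += 1
        return writes


if __name__ == "__main__":
    print(ToySSD(active_blocks=4, spare_blocks=2, endurance=10).total_writes_until_dead())
```

In this toy every physical block, spare or not, ends up absorbing its full share of erases, which is exactly what wear leveling is trying to achieve: no single cell dies early while its neighbours sit unused.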

Hopefully that was a consumer friendly answer!

Edit: Source Here

Lister

Posted 2016-08-01T09:36:54.010

Reputation: 1 185

10

Nearly all consumer SSDs use a memory technology called NAND flash memory. The write endurance limit is due to the way flash memory works.

Put simply, flash memory operates by storing electrons inside an insulating barrier. Reading a flash memory cell involves checking its charge level, so to retain stored data, the electron charge must remain stable over time. To increase storage density and reduce cost, most SSDs use flash memory that distinguishes between not just two possible charge levels (one bit per cell, SLC), but four (two bits per cell, MLC), eight (three bits per cell, TLC), or even 16 (four bits per cell, QLC).

Writing to flash memory requires driving an elevated voltage to move electrons through the insulator, a process which gradually wears it down. As the insulation wears down, the cell is less able to keep its electron charge stable, eventually causing the cell to fail to retain data. With TLC and particularly QLC NAND, the cells are particularly sensitive to this charge drifting due to the need to distinguish among more levels to store multiple bits of data.
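A quick back-of-the-envelope sketch in Python shows why more bits per cell leaves less room for charge drift. The number of levels is simply 2 to the power of the bits per cell; the 6 V voltage window and the assumption that levels are evenly spaced are invented for illustration and do not describe any particular chip.

```python
VOLTAGE_WINDOW = 6.0  # assumed usable threshold-voltage range in volts (illustrative)

CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}  # bits stored per cell


def charge_levels(bits_per_cell: int) -> int:
    """Number of distinct charge levels a cell must distinguish."""
    return 2 ** bits_per_cell


def spacing_between_levels(bits_per_cell: int) -> float:
    """Gap between adjacent levels if they were spread evenly over the window."""
    return VOLTAGE_WINDOW / (charge_levels(bits_per_cell) - 1)


for name, bits in CELL_TYPES.items():
    print(f"{name}: {charge_levels(bits)} levels, "
          f"{spacing_between_levels(bits):.2f} V between levels")
```

Under these assumptions the gap between neighbouring levels is 15 times smaller for QLC than for SLC, so the same amount of leaked charge that an SLC cell shrugs off can flip the value read from a QLC cell.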

To further increase storage density and reduce cost, the process used to manufacture flash memory has been scaled down dramatically, to as small as 15nm today—and smaller cells wear down faster. For planar NAND flash (not 3D NAND), this means that while SLC NAND can last tens or even hundreds of thousands of write cycles, MLC NAND is typically good for only about 3,000 cycles and TLC a mere 750 to 1,500 cycles.

3D NAND, which stacks NAND cells one on top of another, can achieve higher storage density without having to shrink the cells as small, which enables higher write endurance. While Samsung has gone back to a 40nm process for its 3D NAND, other flash memory manufacturers such as Micron have decided to use small processes anyway (though not quite as small as planar NAND) to deliver maximum storage density and minimum cost. Typical endurance ratings for 3D TLC NAND are about 2,000 to 3,000 cycles, but can be higher in enterprise-class devices. 3D QLC NAND is typically rated for about 1,000 cycles.

An emerging memory technology called 3D XPoint, developed by Intel and Micron, uses a completely different approach to storing data which is not subject to the endurance limitations of flash memory. 3D XPoint is also vastly faster than flash memory, fast enough to potentially replace DRAM as system memory. Intel will sell devices using 3D XPoint technology under the Optane brand, while Micron will market 3D XPoint devices under the QuantX brand. Consumer SSDs with this technology may hit the market as soon as 2017, although it is my belief that for cost reasons, 3D NAND (primarily of the TLC variety) will be the dominant form of mass storage for the next several years.

bwDraco

Posted 2016-08-01T09:36:54.010

Reputation: 41 701

5

A flash cell stores static electricity. It's exactly the same kind of charge that you can store on an inflated balloon: you place a few extra electrons on it.

What's special about static electricity is that it stays in place. Normally in electronics, everything is connected to everything else in some way with conductors, and even if there's a large resistor between a balloon and ground then the charge will vanish pretty quickly. The reason that a balloon stays charged is that air is actually an insulator: it has infinite resistivity.

Normally, that is. Since all matter consists of electrons and atomic cores, you can make anything a conductor: just apply enough energy, and some of the electrons will shake loose and be (for a short while) free to move closer to the balloon, or further from it. This actually happens in air with static electricity: we know this process as lightning!

I don't have to emphasise that lightning is a rather violent process. These electrons are a crucial part of the chemical structure of matter. In the case of air, lightning leaves a bit of the oxygen and nitrogen transformed to ozone and nitrogen dioxide. Only because the air keeps moving and mingling, and those substances eventually react back to oxygen and nitrogen, is there no “persistent harm” done, and the air is still an insulator.

Not so in the case of a flash cell: here, the insulator must be far more compact. This is only feasible with solid-state oxide layers. Sturdy stuff, but it too isn't impervious to the effects of forcing some charge through the non-conductive material. And that's what eventually wrecks a flash cell, if you change its state too often.

By contrast, a DRAM cell doesn't have proper insulators in it. That's why it needs to be periodically refreshed, many times a second, to not lose information; however, because it's all just ordinary conductive charge transports, nothing much bad usually happens if you change the state of a RAM cell. Therefore, RAM endures many more read/write cycles than flash does.


Or, for a positive charge, you remove some electrons from the molecule bonds. You need to take so few that this doesn't affect the chemical structure in a detectable way.

These static charges are actually tiny. Even the smallest watch battery that lasts for years supplies enough charge every second to charge hundreds of balloons! It just doesn't have nearly enough voltage to punch through any noteworthy potential barrier.

At least, all matter on earth... let's not complicate things by going to neutron stars.

leftaroundabout

Posted 2016-08-01T09:36:54.010

Reputation: 342

1

Less technical, and an answer to what I believe OP means by "I often see people mention that SSDs have a limited amount of writes in their sectors before they go bad, especially compared to classic rotating disk hard drives, where most drives fail due to mechanical failure, not sectors going bad."
I'll interpret the OP question as, "Since SSDs fail far more often than spinning rust, how can using one give a reasonable reliability?"

There are two types of reliability and failure. One is that the device fails completely due to age, quality, abuse, etc. The other is that it develops sector errors due to lots of read/write cycles.

Sector errors happen on all media. The drive controller (SSD or spinning) will re-map a failing sector's data to a new sector. If the sector has failed completely, then it may still be remapped, but the data is lost. In an SSD the sector is large and often fails completely.

SSDs can have one or both types of reliability. Read/write cycle issues can be helped with:
A larger drive - If you have a small drive and use it for an OS like Windows, then it will get a lot of read/write cycles. The same OS on a much, much larger capacity drive will have fewer cycles per sector. So, even a drive with "only" a few thousand cycles might not be a problem if each sector isn't erased frequently.
Balancing data - SSDs will move data from frequently used sectors to less frequently used ones. Think about the OS again, and updates, vs. a photo you took and just want to keep. At some point the SSD might swap the physical locations of the photo and an OS file to balance out the cycles.
Compression - compressing data takes less space, thus less writing.
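To put rough numbers on the drive-size point, here is a small Python estimate. It is a sketch under stated assumptions: perfect wear leveling, a made-up write amplification factor of 2 (the controller writing more to flash than the host asked for), and example capacities and workloads that are not measurements of any real drive. The function name `years_until_worn` is hypothetical.

```python
def years_until_worn(capacity_gb: float, rated_cycles: int,
                     host_writes_gb_per_day: float,
                     write_amplification: float = 2.0) -> float:
    """Years until every block has used its rated erase cycles.

    Assumes ideal wear leveling, so writes are spread over the whole drive.
    """
    total_writable_gb = capacity_gb * rated_cycles
    flash_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_writable_gb / flash_writes_gb_per_day / 365


if __name__ == "__main__":
    for capacity in (120, 1000):
        years = years_until_worn(capacity, rated_cycles=1000, host_writes_gb_per_day=20)
        print(f"{capacity} GB drive: about {years:.1f} years")
```

The estimate scales linearly with capacity: with the same rated cycles and the same daily workload, a drive eight times larger takes eight times as long to use up its erase cycles.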

Then there is quality of components. Getting the cheapest SSD or USB drive you can find might work for a while, but a quality one made for enterprise use will last a lot longer, not just in erase cycles but in total use.

As drives get larger and larger (like 100-1000GB), erase cycles become less of an issue even though the cells can sustain fewer writes. Some drives will use DRAM as a cache to help lower write cycles. Some will use a high-quality segment of the SSD for cache and lower quality flash for low cost and large size.

Modern good-quality consumer SSDs can last a good long time in a consumer machine. I have some 5+ years old that still work. I also have a couple of cheap, new ones that failed after a few months. Sometimes it is just (bad) luck.

MikeP

Posted 2016-08-01T09:36:54.010

Reputation: 121

A couple of minor points to consider clarifying: 1) Sector size in 3rd paragraph: in either media, it can be a very small area of actual failure. The drive works in fixed-size units so no matter how small the failure is, it still locks and maps based on the smallest unit it deals with. 2) Number of cycles vs. drive size in 4th paragraph: The number of cycles is the same regardless of drive size. You're talking about the potential need to reuse blocks more if the amount of data is large relative to the size of the drive. (cont'd) – fixer1234 – 2016-08-04T21:31:41.370

In general, your answer focuses more on how the limited writes are dealt with and how significant the issue is than the actual question of what causes the limited number of writes. – fixer1234 – 2016-08-04T21:32:11.150