Blerg!
Darn, I tried to stuff this into a comment, but the formatting wasn’t sufficient, so I had to resort to putting it in an answer.
Statistics
The reason you have not seen it is that the odds of you seeing it are low and, more importantly, you were not looking. The odds of you noticing a memory error are calculated from the odds of:
- a cosmic ray hitting the Earth
- the ray hitting at your location
- the ray not getting obstructed or absorbed by anything else
- the ray hitting your computer
- the ray hitting your RAM
- the ray flipping a bit in the RAM
- the bit being in a block of currently allocated memory
- the used memory being either:
  - tested by a program like memtest86+
    - tested at just the right moment to detect the error (e.g., in the microseconds between the program writing the memory and reading it back to compare)
  - allocated to a block of executable code, in which case also:
    - the changed bit modifies the code enough to have a drastic effect
    - the drastic effect causes it to crash
    - the program crashes visibly instead of simply disappearing
    - the program being something that you notice and care about
    - you don’t simply discard it as a buggy program
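Put another way, every item in the list has to happen, so the overall odds are roughly the product of the odds of each step, each taken as conditional on the steps before it (and ignoring the either/or branch for simplicity). A minimal sketch of that arithmetic; the function name `chance_of_noticing` is made up for illustration and no real-world odds are implied:

```python
import math


def chance_of_noticing(step_odds):
    """Multiply the odds of every step in the chain.

    Each entry is the probability (0.0 to 1.0) that one step happens,
    given that all of the earlier steps already happened.
    """
    for p in step_odds:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    # Every step must occur, so the chained odds are the product.
    return math.prod(step_odds)
```

For instance, three steps with placeholder odds of 1-in-10 each, `chance_of_noticing([0.1, 0.1, 0.1])`, come out to about 1 in 1,000, and every extra step in the list shrinks the result further.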
Of course, this applies to transient, intermittent errors like those from cosmic rays and interference from other electronics. If the RAM module is actually defective, then you almost certainly will see problems at some point (though even then, it is conceivable that you might not see an error if you never use up all the physical RAM at any given time and the defect happens to be small and localized entirely in a part that you never use).
The odds of a transient error can indeed be surprisingly high, so you probably have seen memory errors over the years and simply did not notice them because of two of the above list items: the error landing in executable code, and you discarding the crash as just a buggy program.
Examples
If the changed bit happens to fall in a piece of data, then you may not even notice it because it could easily get drowned out.
For example, if a bit got flipped in a block of text data, then you might notice that “The end.” turned into “Tje end.”, but instead of noticing that the “h” had been replaced by a “j” because a single bit had gotten flipped (feel free to confirm if you like), you would more likely just assume that your finger hit the wrong key, because the two keys happen to be right next to each other, and just fix the error.
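For anyone who does want to confirm it, here is a small sketch that compares the ASCII codes of the two letters; the helper name `flipped_bits` is just for illustration:

```python
def flipped_bits(a, b):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = ord(a) ^ ord(b)  # XOR leaves a 1 wherever the two codes disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


print(format(ord("h"), "08b"))  # 01101000
print(format(ord("j"), "08b"))  # 01101010
print(flipped_bits("h", "j"))   # [1]
```

Only one bit position differs (the one worth 2), so a single flip is enough to turn one letter into the other.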
Worse, if the flipped bit happened to be part of a picture, audio, or video file, you may not notice anything at all. If it just happened to be in just the right place, then it might cause a noticeable change like the width or height of the picture being wrong, or a slight popping sound in the song or a bit of corruption in the video causing a momentary blockiness during decoding. However, given the sheer size of media files, the chances of a single bit being in just the right location are extremely low. It is much more likely that it will slightly change the color of a single pixel (e.g., dark red to slightly darker red) and you would probably never notice. It might change a single peak of the song’s waveform so that it has a slightly lower amplitude and you would likely never notice. It might change a single pixel in a single frame of the video and you probably could not notice.
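To put a rough number on the single-pixel case, here is a minimal sketch that assumes an uncompressed 24-bit RGB pixel (compressed formats can behave differently) and uses a placeholder dark red of (139, 0, 0) picked purely for illustration:

```python
def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of an 8-bit channel value."""
    return (value ^ (1 << bit)) & 0xFF


dark_red = (139, 0, 0)                    # placeholder colour, not from any real file
low = (flip_bit(dark_red[0], 0), 0, 0)    # lowest bit of the red channel flipped
high = (flip_bit(dark_red[0], 7), 0, 0)   # highest bit of the red channel flipped

print(low)   # (138, 0, 0)
print(high)  # (11, 0, 0)
```

A flip in a low-order bit changes red by one step out of 255, which is the "slightly darker red" case. A flip in the high-order bit is a much bigger jump, but that is still one odd pixel among millions.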
Caveat
The terrifying fact is that this sort of undetected, transient error can indeed creep in and go unnoticed. That is why I have been really concerned about using flash media for backups: sometimes it gets corrupted, and if you don’t notice, the corruption could sneak into your backup and become permanent. Moreover, testing for corruption can be difficult because changes are expected, so you would have to manually examine every single change, which for binary files would be a nightmare.
Take away
I suppose the bright side, if there is one, is that, as I said in the list, the change has to land in a part of data that is actually important. For most people, the odds of it landing in a piece of important, irreplaceable data that is to be saved tend to be really low.
You can use a program like memtest to check your RAM for defects. If it passes muster, then you only have to worry about the “one-in-a-billion chances” (I’ll leave the exact calculation to someone else if desired) of a bit of important data getting corrupted. Otherwise, a bit of “bit-rot” here or there will usually not do much other than perhaps crash a program and cause you to swear at the devs (though even then, if it doesn’t do it again…).
First, if there were a 95% failure rate on memory, the industry would go out of business; the source you quoted is simply wrong – Ramhound – 2013-11-07T22:29:09.140
@Ramhound second…? – Synetech – 2013-11-07T22:29:42.730
@Synetech just saying, if memory had a 95% error rate it wouldn't be used. The paper is also 4 years old and only looks at DDR and DDR2, and because it was written in 2009 it's basically inaccurate, since that's more or less 4 decades in technology time (yes, technology time is 10 times faster than normal time). As to the final question, it's simple: the paper and blog post (shocker) aren't 100% accurate – Ramhound – 2013-11-07T22:33:35.087
How do you know you've seen no memory errors? You've never had a crash or freeze you weren't able to explain? – David Schwartz – 2013-11-07T22:37:22.717
@DavidSchwartz if you're asking whether I personally had a crash I couldn't explain, that would be a negative. I strive to understand every crash that happens; I normally have a general idea of the reason – Ramhound – 2013-11-07T22:40:32.567
@Ramhound Then it sounds like there's a good chance you misdiagnosed some crashes or freezes that were due to memory errors. Or you just got absurdly lucky and never had a memory error hit a vital spot. – David Schwartz – 2013-11-07T22:42:07.750
@Ramhound, you said “First…”, then nothing more. Was there a second point? – Synetech – 2013-11-07T22:53:44.300
Memory errors and memory failure are 2 different things. A flipped or missing bit here or there is generally OK in most applications that don't require ultra high fidelity / precision... As in most applications... – Austin T French – 2013-11-07T23:37:06.600