How do desktop, laptop and handheld personal computers deal with occasional RAM data corruption?

1

It is no secret that spontaneous corruption of stored data (such as a bit flip) is infrequent but still quite possible during normal SDRAM operation (including DDR 1/2/3/4 SDRAM).

Servers and heavy-duty workstations use the ECC mechanism built into their SDRAM modules to address the issue, so that a data word read is the data word previously written to that particular address.

But what about desktop/laptop PCs and hand-held gadgets using non-ECC RAM modules? How can a developer be sure that a variable contains exactly the value written to it?

Ivan

Posted 2012-02-18T21:07:27.187

Reputation: 6 173

1

Not an answer, but here is a study Google put out in conjunction with the University of Toronto that you might be interested in, showing the memory errors Google found in their entire fleet of machines over a span of 2.5 years.

– Scott Chamberlain – 2012-02-18T22:04:12.477

Answers

0

I don't think there are any mechanisms to prevent errors in non-ECC systems, either at the system level or the OS level. I don't believe Windows refreshes the RAM content on a regular basis; the hardware is expected to keep the values in memory.

  • A quick memory check is done in the BIOS when starting the system, so some major errors can be detected there.

  • Some files, such as installers, may perform CRC checks. But those errors may come from the storage media rather than memory.

  • If a random error occurs in a memory location that will not be used, or will be overwritten, there is no problem. I suspect a very large percentage of random, non-repeated errors happen in that space.

  • Corruption in an application's memory space may generate an exception rather than a crash, which may or may not be handled gracefully. However, that would be mere luck: there are no exceptions for "memory errors" per se, but the error could occur in a place that happens to be covered by an exception handler. Otherwise the application will either carry on with the corrupted data, or crash if the corruption causes an invalid operation.
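To illustrate the CRC bullet above, here is a minimal Python sketch of how a file or installer payload might carry its own checksum. The function names `attach_crc` and `verify_crc` and the "payload followed by 4 CRC bytes" layout are made up for this example; real installers use their own formats. Note that a mismatch only tells you the bytes changed somewhere between writing and checking, not whether RAM or the storage media was at fault.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_crc(blob: bytes) -> bytes:
    """Return the payload if the trailing CRC-32 matches, else raise ValueError."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain a CRC")
    payload, stored = blob[:-4], int.from_bytes(blob[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC mismatch: data changed since the checksum was computed")
    return payload
```

A single flipped bit anywhere in the blob makes `verify_crc` raise, because CRC-32 detects all single-bit errors.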

Other than that, applications/OS will crash and written files will be corrupted.

Applications, if they wish, can implement checking mechanisms by performing operations twice or more. Memory checkers and third-party file copying tools are such examples.
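As a rough sketch of the "perform the operation more than once" idea, the Python snippet below runs a function several times and accepts the majority result. The helper name `run_with_vote` and the default of three runs are assumptions for illustration only, not something any particular tool is known to do.

```python
from collections import Counter


def run_with_vote(fn, *args, runs=3):
    """Call fn(*args) `runs` times and return the result seen in a strict majority.

    Results must be hashable. Raises RuntimeError when no result wins
    more than half of the runs.
    """
    results = [fn(*args) for _ in range(runs)]
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= runs:
        raise RuntimeError("no majority: results disagree")
    return value
```

For example, `run_with_vote(sum, (1, 2, 3))` calls `sum` three times and returns the agreed value. This only guards against a transient error during one of the runs; if the input itself was already corrupted in memory, every run will agree on the same wrong answer.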

mtone

Posted 2012-02-18T21:07:27.187

Reputation: 11 230

"A quick memory check is done in the BIOS when starting the system, so some major errors can be detected there" - as an experienced PC hardware diagnostics engineer (testing memory modules since the age of 286 CPUs) I can assure you that POST RAM test done by BIOS is almost totally useless. – Ivan – 2012-02-19T00:29:41.363

0

Yep, generally the system will perform a RAM test on startup and attempt to quarantine any bad pages.

Otherwise, if a bad page is somehow discovered during operation (not too likely on most consumer boxes, since they usually have no parity checking, but possible on "big iron"), then, if the page is "virtual" and has not been marked "changed", the system will probably "remove" the page, mark the physical page bad, and page it back in from disk.

If the page is "dirty" (changed), however, the error will be reported to the application as an exception of some sort. If the bad page is some sort of system page then the system will crash.

Of course, absent parity checking or some other mechanism to discover an error, the bad data is simply used, and whatever happens happens. This is the most likely case on consumer hardware.
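The cases above can be summed up as a small decision function. This is only a Python model of the description in this answer, not real operating system code: the `Page` fields, the `Action` names, and the order in which the checks are made (system page first) are all assumptions for the sake of illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    USE_BAD_DATA = auto()          # error is never noticed
    RELOAD_FROM_DISK = auto()      # drop the page, mark the frame bad, page in again
    RAISE_TO_APPLICATION = auto()  # report an exception to the owning process
    SYSTEM_CRASH = auto()


@dataclass
class Page:
    error_detected: bool  # stays False on hardware without parity/ECC
    is_system: bool       # page belongs to the OS itself
    is_dirty: bool        # page was changed since it was read from disk


def handle_page(page: Page) -> Action:
    """Pick the outcome described in the answer for a page with a bad bit."""
    if not page.error_detected:
        return Action.USE_BAD_DATA
    if page.is_system:
        return Action.SYSTEM_CRASH
    if page.is_dirty:
        return Action.RAISE_TO_APPLICATION
    return Action.RELOAD_FROM_DISK
```

On typical consumer hardware `error_detected` is never set, so every call ends in `Action.USE_BAD_DATA`.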

Daniel R Hicks

Posted 2012-02-18T21:07:27.187

Reputation: 5 783

0

They do not handle such situations at all. The boot-up test addresses the last bit of memory, and that is all a normal BIOS does. Linux, FreeBSD, etc. have a fixup that allows you to exclude memory addresses from being accessed by the system. Unless you do so, the system will most likely crash when a kernel internal integrity check finds a "bit flip" (signal 11, SIGSEGV, in Unix).

ZaB

Posted 2012-02-18T21:07:27.187

Reputation: 2 365