What is more likely: storage bit rot (e.g. a URE) or RAM bit rot?


The background to the question is that on most computer systems data can become corrupted in several places. The question asks about the relative risks connected with two such places of data rot/data corruption:

  1. in main memory, especially in laptops (no ECC memory available)
  2. on the storage medium, e.g. a magnetic hard disk or an SSD, where the corruption might even be silent (that is, not a URE)

The idea behind the question is that data will most often have to pass through both places of potential corruption: it is held in memory (see 1) and stored on and retrieved from mass storage (see 2).

Is there a way to relate which of the two forms of corruption (via bit rot) is more likely?

The figures I have found are not easy for me to relate to one another, and answering the question would consist of doing exactly that.

Figures for each form of bit rot

  1. memory/RAM bit flipping: I read "25,000 to 70,000 errors per billion device hours per megabit" via Wikipedia/Google. Another source plainly states that one bit-flip error occurs every 3 days (72 hours).
  2. mass storage: for unrecoverable read errors on consumer hard disks I read that a "URE is expected for one in 10^14 bits", from https://en.wikipedia.org/wiki/RAID#URE
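The two RAM figures can be compared directly by converting the Wikipedia rate into flips per hour for a given amount of RAM. A minimal sketch, assuming 8 GB of RAM (the RAM size is my assumption; it is not stated above):

```python
# Convert "25,000 to 70,000 errors per 10^9 device-hours per Mbit"
# into flips per hour, ASSUMING 8 GB of RAM (not stated in the question).

mbit = 8 * 1024 * 8                 # 8 GB expressed in megabits = 65536 Mbit

low  = 25000e-9 * mbit              # flips per hour, low end of the range
high = 70000e-9 * mbit              # flips per hour, high end of the range

print(f"Wikipedia figure: {low:.2f} to {high:.2f} flips/hour")   # ~1.64 to ~4.59
print(f"Other source    : {1/72:.4f} flips/hour")                # ~0.0139
```

Notably, for 8 GB the Wikipedia range works out to roughly 1.6 to 4.6 flips per hour, which is about two orders of magnitude more pessimistic than the one-flip-per-72-hours source, so the published figures already disagree substantially.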

Now to me these figures show that one can easily imagine cases in which either of the two bit-rot risks is high:

For 1) one could put data in RAM and wait very, very long, so as to make corruption almost certain. For 2) one could write and read so much data to storage that corruption there becomes almost certain, too.

The question hence asks about the standard scenario of writing data (e.g. photos to disk, backups). In this case I expect the RAM to be the lesser of the two evils (since for a backup, the time the data spends in RAM is small relative to the amount of storage involved).

I have even prepared an estimate, so that answering this question could consist of simply confirming or correcting the following reasoning:

Scenario: I back up 1 TB from one disk to another. The assumption is that both memory and storage are involved. For the memory, the bit-rot estimate is proportional to the time needed, and the time needed is determined by the write speed of the mass storage (1 MB/s for USB 1.1, 100 MB/s for a SATA HDD, and 500 MB/s for a SATA SSD are reasonable values; 50 MB/s is chosen).

The values in the estimate are:

  • 1 TB = 10^12 bytes = 8 × 10^12 bits
  • time needed at 50 MB/s = 1 TB / (50 MB/s) = 20,000 seconds ≈ 5.5 hours
  • storage bit rot for 1 TB at a rate of 1 URE per 10^14 bits = 8 × 10^12 / 10^14 = 0.08 expected UREs
  • memory bit rot estimated with (1 bit flip per 72 hours) = 5.5 / 72 ≈ 0.076 expected flips
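The estimate can be reproduced with a short script; this is only a sketch that plugs in the rates quoted earlier, using the one-flip-per-72-hours figure for the RAM:

```python
# Expected corruption events while backing up 1 TB at 50 MB/s,
# using the rates quoted in the question.

tb_bytes = 10**12                    # 1 TB
tb_bits  = tb_bytes * 8              # = 8e12 bits
speed    = 50 * 10**6                # 50 MB/s in bytes/s
hours    = tb_bytes / speed / 3600   # transfer time in hours

ure_rate = 1 / 10**14                # 1 URE per 1e14 bits read
ram_rate = 1 / 72                    # 1 bit flip per 72 hours

storage_errors = tb_bits * ure_rate  # expected UREs for the whole transfer
ram_errors     = hours * ram_rate    # expected RAM flips during the transfer

print(f"transfer time : {hours:.2f} h")        # ~5.56 h
print(f"storage UREs  : {storage_errors:.3f}") # ~0.080
print(f"RAM bit flips : {ram_errors:.3f}")     # ~0.077
```

On these figures the two expected error counts come out within a few percent of each other, so which side "wins" rests on a thin margin and, above all, on which published RAM figure one trusts.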

My assumption then is that if the memory is not broken to start with (i.e. memtest86 does not report your RAM as defective) and only the 5.5 hours are considered, then the larger share of the corruption risk is introduced by the mass storage.

Consequently, mass-storage bit rot is more likely (in a backup situation) than non-ECC-memory bit rot!

I look forward to a confirming or correcting answer that can shed some light on the relative risks of bit rot in RAM versus mass storage.

humanityANDpeace

Posted 2015-09-07T08:02:32.497

Reputation: 642

It is more likely you should compare any copied/moved data with some comparison routine, as that covers RAM issues, transport issues, and storage at that moment. Then keep 2 backups (that were compared) for critical data. Now compare between the 2 backups :-) Any of the above can happen, but it does not seem to happen when everything is going along perfectly; it only happens (and then all of it too :-) when things go bad. If I compare how perfectly everything works normally to how bad things can get, the real problems are listed in megs and gigs and terabytes, not bits. – Psycogeek – 2015-09-07T11:39:01.060

@Psycogeek I appreciate your advice, which hints at ways to avoid the need to answer the question. I am not ignorant of that advice, yet I asked the question purposefully, and a way to avoid the question in the first place is not the best way to answer it. Does the suggested estimate seem reasonable to you? – humanityANDpeace – 2015-09-07T12:32:59.600

I just am not getting the same results as the wiki that you're getting the math from. But then again I am not in an industrial setting pulling a small city's worth of power; I use a good regulated UPS, everything is cool, and I am behind a stucco (concrete) and chicken-wire cage. Someone could bump my (treated like glass) hard disks, surge my power, and change my stats 100 times over in an instant; for the most part, though, fully tested working stuff does not get these kinds of numbers of failures in 2010 and beyond. Do the compares of whole terabytes and see. – Psycogeek – 2015-09-07T13:34:31.040

No answers