I am building a high-spec workstation on an X99 chipset, and I found that 64GB of DDR4 (non-ECC) RAM is quite affordable.

This got me wondering, because data integrity is important in my workloads. Specifically, I want to ask what the expected frequency of memory errors and corruption might be, and at what sort of capacity level ECC memory starts to make a lot of sense.

I am balancing several variables here:

  • system stability/data integrity/data corruption rate (affected not only by the type of RAM, but also by its quantity/density)
  • cost
  • speed

Different ways to configure things would include:

  1. No ECC: a high-end i7 CPU, overclocked, with somewhat faster RAM as well. This is cheaper.

  2. Xeon CPU: no overclocking allowed, but ECC RAM is supported, many more RAM options (registered/buffered) are available, and much higher capacities are possible. More expensive.

This is somewhat related to this question, but I wanted to ask more specifically how I should balance these factors, because sometimes having more speed for significantly less cost, with slightly weaker data integrity guarantees, can still be a win, especially in a situation where it's not really clear which side of the server/workstation line we're on.

There is also the fact that with a non-ECC "consumer" type system you can "screen" the modules for stability by memtesting them, which can be a significant time investment. The cost of this downtime and the effort involved should also be factored in.
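
To illustrate what I mean by "screening": here is a toy pattern test in Python. It is only a sketch of the idea and no substitute for an actual bootable memtest86 run, which exercises the physical address space far more thoroughly.

    import sys

    # Toy user-space memory pattern test: allocate a large buffer, write known
    # bit patterns into it, then read them back and compare.  This only
    # illustrates the idea of "screening" RAM; a bootable memtest86 pass covers
    # far more of the physical address space and many more access patterns.
    TOTAL = 1 * 1024 ** 3                 # size of the buffer to exercise (1 GiB)
    CHUNK = 64 * 1024 ** 2                # fill/verify it in 64 MiB chunks
    PATTERNS = (0x00, 0xFF, 0xAA, 0x55)   # all-zeros, all-ones, alternating bits

    def pattern_test(total=TOTAL, chunk=CHUNK):
        buf = bytearray(total)
        for pattern in PATTERNS:
            fill = bytes([pattern]) * chunk
            for offset in range(0, total, chunk):
                buf[offset:offset + chunk] = fill          # write the pattern
            for offset in range(0, total, chunk):
                if buf[offset:offset + chunk] != fill:     # read back and compare
                    sys.exit(f"mismatch near offset {offset:#x}, "
                             f"pattern {pattern:#04x}")
        print(f"no mismatches across {total // 1024 ** 2} MiB")

    if __name__ == "__main__":
        pattern_test()
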

Steven Lu
  • fwiw, anecdotally, I have an X99 workstation with an i7-5820K and 64GB of non-ECC DDR4 which I use extensively all day, with a heap of virtual machines loaded, and have had no issues at all. – Mark Henderson Feb 04 '16 at 03:34
  • @MarkHenderson Yes I have just started to build a similar machine, 5820K as well, and with 4x16GB sticks I leave open the possibility of eventually loading 128GB into it. I do wish to have the ability to run many VMs, and I also have some computational applications which can make use of multiple tens of GB of RAM. I am just worried that 128GB will lead to issues of a statistical or cosmic nature and maybe it would be worthwhile to have ECC. – Steven Lu Feb 04 '16 at 03:36

1 Answer

I wanted to leave this note as a comment, but I think my concern has been sufficiently (say, 95%) alleviated, so I'm answering my own question.

The comments above show some anecdotes where people find a very low rate of data errors, a rate that is certainly low enough for me to tolerate, as I do not require some kind of large-number-of-nines uptime guarantee. In any case, the most important data must always be checksummed and stored in multiple independent copies anyway.
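
For what it's worth, this is the kind of checksumming I have in mind, as a minimal Python sketch (the file names are hypothetical, and a standard tool such as sha256sum does the same job):

    import hashlib
    from pathlib import Path

    # Minimal sketch: store a SHA-256 digest next to each important file so
    # that silent corruption (bad RAM, bad disk, bad transfer) can at least be
    # detected when the independent copies are compared later on.

    def sha256sum(path: Path, chunk_size: int = 1024 * 1024) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            while block := f.read(chunk_size):
                digest.update(block)
        return digest.hexdigest()

    def verify(path: Path, expected_digest: str) -> bool:
        return sha256sum(path) == expected_digest

    if __name__ == "__main__":
        data_file = Path("dataset.bin")               # hypothetical file name
        digest = sha256sum(data_file)
        data_file.with_suffix(".sha256").write_text(digest + "\n")
        print("stored checksum:", digest)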

Looking at this concern further, I think I may well be asking a question that is essentially unknowable, the sort of information that would require data gathering and testing on a scale that has never actually been accomplished. Considering that the majority of people do NOT run memtest86 for at least 3 days straight, as I intend to do, and the majority of people still never encounter errors, worrying about this any further is extremely likely to be just a waste of my time.

Steven Lu