17

I have updated this post because I replaced the processor, but the core of my question (and, unfortunately, the results as well) remains the same.


I built my first FreeNAS box and wanted to use ECC RAM since I want to store critical data. Because I am on a budget, I wanted to go for the most affordable solution that would still support ECC RAM.

After doing some research, I found out that I need a motherboard, memory and a CPU that all support ECC. My motherboard of choice is the "Gigabyte X150M-Pro ECC", which has the C232 chipset, DDR4 and an LGA1151 socket.

I also bought a kit of two DIMMs made by Kingston with the model number "KVR21E15S8K2/8" (spec sheet). Gigabyte published a list of tested memory modules and my modules seem to be supported with working ECC (list of supported modules).

RAM label

Since I am on a budget I needed an affordable Skylake CPU that supports ECC. According to Intel the Celeron G3900 does support ECC, so I went with that one.

After building the computer, I wanted to verify that my system is indeed running with ECC memory and entered the motherboard's BIOS. From various internet sites, I found out that some motherboards have a special section which should tell if ECC is working, but my motherboard doesn't seem to have that. I checked all menus and I couldn't find a similar section.

After doing some more research, I found a post over on the Unix & Linux Stack Exchange, which didn't solve my problem. I tried the latest memtest86+, which, from what I could tell, doesn't even show the value "ECC". I tried the older 4.20 version that Puget Systems used, which showed "ECC: off". However, after reading the previously mentioned post, I doubt that it tells the truth (maybe that's why the feature was removed?). Both versions also failed to read out the correct speed and latency of the DIMMs, which adds to my doubts about memtest86+.

memtest86+ screenshot

Another popular way to find out whether ECC is working is to issue the dmidecode -t memory command and read out the Total Width and Data Width. My results were 128 bits and 64 bits respectively. One part of the output showed details about the memory array, which had a key-value pair of Error Correction Type: Single-bit ECC.

I was expecting 72 bits for the Total Width, so I thought it might be related to dual channel and moved the memory modules into two adjacent slots which should prevent dual channel, but the result was the same. Here is the full output of dmidecode -t memory.
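For anyone repeating the width check, here is a minimal Python sketch of how the two values could be compared automatically. The function names `memory_widths` and `classify` are hypothetical, and the 8-extra-bits rule is only the usual rule of thumb (72 total, 64 data). A 128/64 result like the one described above stays inconclusive under this rule.

```python
import re


def memory_widths(dmidecode_text):
    """Return (total_width, data_width) in bits for each memory device record.

    Records that report "Unknown" widths (e.g. empty slots) are skipped.
    """
    widths = []
    # dmidecode separates records with blank lines.
    for block in re.split(r"\n[ \t]*\n", dmidecode_text):
        total = re.search(r"^\s*Total Width:\s*(\d+) bits", block, re.MULTILINE)
        data = re.search(r"^\s*Data Width:\s*(\d+) bits", block, re.MULTILINE)
        if total and data:
            widths.append((int(total.group(1)), int(data.group(1))))
    return widths


def classify(total, data):
    """Rule-of-thumb reading of one (total, data) pair; an assumption, not proof."""
    if total == data:
        return "no check bits reported"
    if total - data == 8:
        return "consistent with ECC"
    return "inconclusive"
```

The text to parse could come from something like `subprocess.run(["dmidecode", "-t", "17"], capture_output=True, text=True).stdout`, run with root privileges. Even a "consistent with ECC" result only describes what the firmware tables claim, not whether error correction is actually active.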

I even tried out the interesting C-program that Puget systems published, but the result was 0, indicating no ECC support.

Now I am starting to doubt that the data on Intel's own website is correct and my CPU doesn't actually support ECC. Both the memory and the motherboard are specifically branded with "ECC", so I can rule out those.

Is it possible that the BIOS version needs an update (currently there is none) to enable ECC or is ECC actually already working and I was just not able to verify it? Or is my choice of CPU wrong, if I want to run ECC memory and Intel's website is wrong/misleading?

If my CPU turns out to be the wrong choice, what would be the next best choice for a "budget ECC CPU"?

UPDATE: I saw some new indication that my system actually might be running with ECC enabled and the dmidecode tool just reports weird data. Over at the FreeNAS forum the user Dusan is using server grade hardware (SuperMicro MB, Xeon CPU, Kingston DIMM) and has a similar output with 128 Bits. But he wrote that he is not sure himself, if it actually works.

UPDATE 2: As yagmoth555 mentioned in his answer to this question, it seems that my motherboard only supports ECC with Xeon processors, though I thought that note was a relic from previous manuals that got copied over. I guess that means that I need to look into a Xeon processor.. :-/


UPDATE 3: I bought a Xeon E3-1220v5 now which of course supports ECC and should meet the requirement from the manual. I ran all the tests again to check for ECC functionality and the results are basically identical:

ecc_check and dmidecode

From the comments at Puget Systems, it also seems that the ecc_check.c program doesn't work on Xeon and Core i7 processors.. :-/

I looked at memtest86+ some more this time, and I am fairly certain that it doesn't support DDR4 or the C232 chipset at all, since it reports not only the wrong speed and timings but also DDR3 instead of the installed DDR4. It did detect the processor just fine, but I still got the same end result with both versions of memtest86+:

memtest86+ v5.01

Version 4.20 doesn't even detect my processor properly..

memtest86+ v4.20

Any ideas on how else I can test for ECC are very much appreciated.

comfreak
  • Well, if your machine did not support ECC, it would not start :) – Orphans Oct 20 '16 at 19:01
  • 1
    @Orphans Before I found the motherboard, I saw some cheaper ones that claimed "ECC support" on other chipsets like Z170 etc. Turns out that it just means the board can run (not crash) with ECC memory but will effectively not use it. So I am wondering if my case is one like that? – comfreak Oct 20 '16 at 19:04
  • If ECC is active, you usually see it in the POST output. Can you press ESC during boot to see the boot screen? – yagmoth555 Oct 20 '16 at 22:31
  • Try also memtest from memtest86.com – citrin Oct 20 '16 at 22:33
  • @comfreak hmm, you are right. – Orphans Oct 21 '16 at 06:04
  • @yagmoth555 by default it only shows the Gigabyte logo screen and ESC doesn't do anything. When I turn that off, I don't get any POST information, just the American Megatrends logo and one line that says "Press DEL or ESC to enter setup" – comfreak Oct 21 '16 at 08:51
  • @citrin as you can read in my post, I already tried that ;-) – comfreak Oct 21 '16 at 08:51

5 Answers

9

Today I found out that PassMark offers a commercial version of memtest86 (without the +), along with a free edition that thankfully includes ECC checks.

In addition it also supports DDR4 and all the other features of memtest86+.

My results seem to be positive for ECC support, so I will call this done, even though I was hoping to get the same result with "traditional" tools like dmidecode.

memtest86 result


If someone stumbles upon this post at a later point in time and needs further validation and tests, they also offer a paid version that supports ECC error injection for actually testing the ECC capabilities.

comfreak
4

Edited: Bad news from your motherboard manual:

(screenshot of the note in the motherboard manual about ECC support)


I see you run BSD/Linux, so run this inside the OS (it is available for FreeNAS):

dmidecode -t 17

You should get output like this:

# dmidecode 2.12
SMBIOS 2.5 present.

Handle 0x1100, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x1000
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: 1
    Locator: DIMM1
    Bank Locator: Not Specified
    Type: DDR2
    Type Detail: Synchronous
    Speed: 667 MHz
    Manufacturer: AD00000000000000
    Serial Number: 00002062
    Asset Tag: 010839
    Part Number: HYMP125P72CP8-Y5
    Rank: 2

The Total Width: 72 bits is the part you are looking for.

On a Windows system you can run:

wmic MEMORYCHIP get DataWidth,TotalWidth

//ECC Memory
DataWidth  TotalWidth
64         72

//Non-ECC Memory
DataWidth  TotalWidth
64         64

The answers for FreeBSD and Windows were taken from there.

yagmoth555
  • That's basically what I already tried with `dmidecode -t memory` and my result was 128 bits `Total` and 64 bits `Data` – comfreak Oct 21 '16 at 12:33
  • @comfreak dmidecode -t 17 return what ? – yagmoth555 Oct 21 '16 at 12:59
  • I basically get the same output like `dmidecode -t memory`: http://slexy.org/view/s2JimvAzl6 – comfreak Oct 21 '16 at 13:01
  • @comfreak Well, 128 makes no sense to me. Usually it's 64 for non-ECC or 64b + 8b (72) for ECC (mirror?? 64+64?). But searching for 'dmidecode total width 128' lists a lot of FreeNAS posts... a lot of them seem to try to detect ECC with the ./ecc_check.py Python script – yagmoth555 Oct 21 '16 at 13:10
  • My assumption was that it might be related to dual channel but I even get the same result when I put the two modules into two separate channels, meaning dual channel should not work. – comfreak Oct 21 '16 at 13:13
  • like I wrote in my original post, I tried the second latest version which is also the version I see posted everywhere that should indicate ECC support. It reports `ECC: off` *but* also reports the wrong frequency and latency of the DIMMs (see my post) – comfreak Oct 21 '16 at 13:22
  • Yeah, removed my comment, seen your sentence after. – yagmoth555 Oct 21 '16 at 13:31
  • @comfreak edited my answer, but bad news from Gigabyte :| – yagmoth555 Oct 21 '16 at 13:42
  • I did see that note in the manual, but on other parts of the box, it just says that a CPU with ECC support is required, so I thought that it might have been a copy-paste from an older manual. But I guess that means that I will need to get a Xeon processor :-/ Edit: I added an update to my question that mentions your findings. – comfreak Oct 21 '16 at 13:47
  • UPDATE: I updated my post since I bought a Xeon CPU now, in case you want to check it out. – comfreak Oct 24 '16 at 11:54
4

Using a Ryzen 7 processor, none of the mentioned tools worked for me either. However, with a recent enough Linux kernel, the tools in edac-utils (edac-ctl and edac-util) can read out the ECC status as well as things like the number of corrected errors. The kernel log (dmesg) will also contain lines with "EDAC", which should give some information as well. This can be tested further by overclocking the RAM and checking that errors are reported (if you go high enough); that is about as much proof as you can get that it really works. However, even if these tools fail or do not work at all, that only means that reading the ECC status information is not supported. There seems to be no 100% reliable way to prove that ECC is NOT working...
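The corrected and uncorrected error counters can also be read directly from sysfs once an EDAC driver is loaded. Below is a minimal Python sketch under that assumption. The function name `read_edac_counts` is hypothetical, and the default path is the usual location of the kernel's EDAC memory controller entries, so it may be missing on systems without a loaded driver.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mcN": {"corrected": int, "uncorrected": int}} per memory controller.

    An empty dict means no EDAC memory controller was found under `root`,
    which does NOT prove that ECC is inactive.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = {"corrected": ce, "uncorrected": ue}
    return counts
```

Calling `read_edac_counts()` before and after a stress run (for example with overclocked RAM) and comparing the "corrected" values is one way to see whether errors are being counted.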

user415177
  • 1
    Did you try PassMark's memtest86? (The one I mentioned in my answer) – comfreak May 13 '17 at 11:35
  • 1
    While your output may vary, to check for EDAC information in dmesg you can run `dmesg | grep EDAC` (you may need to run this with root privileges). For example, on a Ryzen-based system with ECC memory installed and enabled in the BIOS (if applicable), you may see a line which looks like `amd64: Node 0: DRAM ECC enabled.` – Joe Feb 26 '20 at 14:58
  • Confirmed in Fedora 32 on a Supermicro LGA2011v3 board with a Xeon E5-2603v3 processor (literally the cheapest Xeon with ECC at the time) and 4x8GB ECC DDR4 DIMMs installed that the following works: `sudo dnf -y install edac-utils`, then run `edac-ctl --status` to see if the drivers are loaded. If they aren't, you probably don't have ECC supported and available in the Linux kernel. If they are, then run `edac-util -vv` and it will read out all your ECC errors that have been corrected since boot. I would imagine this would work with Ryzen processors (zen2 architecture and up) as well. – Nick Jul 29 '20 at 02:22
  • Next time I take that system down, I will also try the commercial memtest86 binary to see if its results align with `edac-util`. I should also add that the answer regarding `dmidecode -t memory` also aligns with my findings, in that total width is 72 bits and data width is 64 bits, which is what you'd expect from an ECC module. – Nick Jul 29 '20 at 02:30
1

I have found dmidecode results to be hit or miss, with dmidecode often reporting board "capabilities" having ECC even if non-ECC memory is installed. Similarly, edac-utils also often shows ambiguous results with "no DIMM info":

root@richie:~# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
edac-util: No errors to report.

However, the output of the lshw utility always seems to indicate if ECC is configured and working correctly, even on fringe platforms like the LGA1155 i3-2100 (which is one of few desktop Intel CPUs that does support ECC if all requirements are met):

root@richie:~# sudo lshw -class memory|grep ecc
           capabilities: ecc
           configuration: errordetection=multi-bit-ecc
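The same check can be scripted. Here is a minimal Python sketch that pulls the `errordetection=` values out of lshw's text output; the function names `error_detection_modes` and `reports_ecc` are hypothetical, and the only value taken from the output above is `multi-bit-ecc`. Any other value is simply passed through as lshw printed it.

```python
import re


def error_detection_modes(lshw_text):
    """Return every errordetection=<value> found in `lshw -class memory` output."""
    return re.findall(r"errordetection=([\w-]+)", lshw_text)


def reports_ecc(lshw_text):
    """True if at least one reported error detection mode mentions ECC."""
    return any("ecc" in mode.lower() for mode in error_detection_modes(lshw_text))
```

The input could be captured with something like `subprocess.run(["lshw", "-class", "memory"], capture_output=True, text=True).stdout`, run as root. A missing `errordetection=` entry yields an empty list and `False`.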
0

Among non-server motherboards and chipsets, only specific AMD motherboards (such as ASRock's), on any AMD chipset, offer ECC.

For Intel, they only make ECC available on server Xeon chipsets. Intel disables ECC on their desktop chipsets.

d hee
  • 1
    That might be true, but it doesn't answer the actual question that was asked here. – comfreak Sep 25 '18 at 17:07
  • It applies to the OP's question, as he is running a non-Xeon Intel chip. The answer is that he cannot check. – d hee Sep 25 '18 at 18:24
  • Just the last sentence of your answer is incorrect, since the C232 chipset for example is a "desktop chipset" and does support ECC. Apart from that, the question is more general, as in how to check, like if you don't know whether it's supported or not. – comfreak Sep 25 '18 at 18:59