
I have a SuperMicro server in production with 8x 16GB dual-rank DDR3 ECC registered sticks installed (128GB). One day we had a burst of ECC errors on one stick that caused issues for a few minutes, plus the odd error every now and then on a different stick that doesn't seem to cause any issues. We figured we'd swap both out.

Using dmidecode, I determined the model in use to make sure I ordered the exact same brand/model (Micron 36JSF2G72PZ-1G6E1). After installation, I booted into the OS and when I checked the memory, it reported only 80GB (5x 16GB), but dmidecode still showed 8x 16GB sticks. One thing I did notice was that one of the sticks we put in had a slightly different model number, 36JSF2G72PZ-1G6D1, which seems to be a 1.5V module while the others are 1.35V. Figuring this was the cause of the problem, we pulled that one out and put one of the problematic sticks back in (unfortunately without knowing which of the two it was). This time I checked the BIOS before booting up, and it said 48GB.

Since it was 2am and every time we tried this we had to pull the server completely out of the rack, change a DIMM and put it back into the rack to test (for a number of reasons - don't ask), we decided to put the original sticks back into the server and leave it the way it was. However, it still shows 80GB in the BIOS and OS (CentOS 7), while dmidecode still shows all eight 16GB DIMMs. This is the way we've left it. After checking historical monitoring graphs, I now see that the OS on this server was only ever reporting 112GB (i.e. 7x 16GB).

We have two other servers in production that are the same model with the exact same spec and memory modules, installed in the same way, and all have the same BIOS, which is the latest made available by SuperMicro. Both of these servers show the full 128GB.

I'm just trying to get a handle on what could be the problem. I can't understand why dmidecode can see memory that the BIOS can't.

I'm leaning towards more faulty memory causing the issue, and I do have access to another 8 sticks of the same memory in a decommissioned server that I could use for a full swap. But before I organise another disruptive early-hours maintenance window that's an hour's drive away for my support guy, and just apply more guesswork, I was wondering whether there's anything else I can do to troubleshoot this, or whether there's any other likely cause of the issue.

Here's the dmidecode output for fun:

# dmidecode 3.1
Getting SMBIOS data from sysfs.
SMBIOS 2.6 present.

Handle 0x0030, DMI type 16, 15 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 144 GB
    Error Information Handle: Not Provided
    Number Of Devices: 9

Handle 0x0032, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM1C
    Bank Locator: BANK0
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer00
    Serial Number: SerNum00
    Asset Tag: AssetTagNum0
    Part Number: ModulePartNumber00
    Rank: Unknown

Handle 0x0034, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM1B
    Bank Locator: BANK1
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: D6CA7CD7
    Asset Tag: AssetTagNum1
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x0036, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM1A
    Bank Locator: BANK2
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: B3CA7CD7
    Asset Tag: AssetTagNum2
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x0038, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM2C
    Bank Locator: BANK3
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer03
    Serial Number: SerNum03
    Asset Tag: AssetTagNum3
    Part Number: ModulePartNumber03
    Rank: Unknown

Handle 0x003A, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM2B
    Bank Locator: BANK4
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: 35CB7CD7
    Asset Tag: AssetTagNum4
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x003C, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM2A
    Bank Locator: BANK5
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: CCCA7CD7
    Asset Tag: AssetTagNum5
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x003E, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM3C
    Bank Locator: BANK6
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer06
    Serial Number: SerNum06
    Asset Tag: AssetTagNum6
    Part Number: ModulePartNumber06
    Rank: Unknown

Handle 0x0040, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM3B
    Bank Locator: BANK7
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer07
    Serial Number: SerNum07
    Asset Tag: AssetTagNum7
    Part Number: ModulePartNumber07
    Rank: Unknown

Handle 0x0042, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0030
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P1_DIMM3A
    Bank Locator: BANK8
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer08
    Serial Number: SerNum08
    Asset Tag: AssetTagNum8
    Part Number: ModulePartNumber08
    Rank: Unknown

Handle 0x0044, DMI type 16, 15 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 144 GB
    Error Information Handle: Not Provided
    Number Of Devices: 9

Handle 0x0046, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM1C
    Bank Locator: BANK9
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer09
    Serial Number: SerNum09
    Asset Tag: AssetTagNum9
    Part Number: ModulePartNumber09
    Rank: Unknown

Handle 0x0048, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM1B
    Bank Locator: BANK10
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: D1CA7CD7
    Asset Tag: AssetTagNum10
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x004A, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM1A
    Bank Locator: BANK11
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: 34CB7CD7
    Asset Tag: AssetTagNum11
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x004C, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM2C
    Bank Locator: BANK12
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer12
    Serial Number: SerNum12
    Asset Tag: AssetTagNum12
    Part Number: ModulePartNumber12
    Rank: Unknown

Handle 0x004E, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM2B
    Bank Locator: BANK13
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: 9ECA7CD7
    Asset Tag: AssetTagNum13
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x0050, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM2A
    Bank Locator: BANK14
    Type: DDR3
    Type Detail: Other
    Speed: 1333 MT/s
    Manufacturer: Micron        
    Serial Number: 9FCA7CD7
    Asset Tag: AssetTagNum14
    Part Number: 36JSF2G72PZ-1G6E1 
    Rank: Unknown

Handle 0x0052, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM3C
    Bank Locator: BANK15
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer15
    Serial Number: SerNum15
    Asset Tag: AssetTagNum15
    Part Number: ModulePartNumber15
    Rank: Unknown

Handle 0x0054, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM3B
    Bank Locator: BANK16
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer16
    Serial Number: SerNum16
    Asset Tag: AssetTagNum16
    Part Number: ModulePartNumber16
    Rank: Unknown

Handle 0x0056, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0044
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: No Module Installed
    Form Factor: DIMM
    Set: None
    Locator: P2_DIMM3A
    Bank Locator: BANK17
    Type: DDR3
    Type Detail: Other
    Speed: Unknown
    Manufacturer: Manufacturer17
    Serial Number: SerNum17
    Asset Tag: AssetTagNum17
    Part Number: ModulePartNumber17
    Rank: Unknown

Handle 0x0058, DMI type 16, 15 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: Flash Memory
    Error Correction Type: None
    Maximum Capacity: 4 MB
    Error Information Handle: Not Provided
    Number Of Devices: 1

Handle 0x005A, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x0058
    Error Information Handle: Not Provided
    Total Width: 8 bits
    Data Width: 8 bits
    Size: 4096 kB
    Form Factor: Other
    Set: None
    Locator: BIOS
    Bank Locator: ROM0
    Type: Flash
    Type Detail: Non-Volatile
    Speed: 33 MT/s
    Manufacturer: ATMEL       
    Serial Number:  
    Asset Tag:  
    Part Number: 26DF321             
    Rank: Unknown
  • `dmidecode` shows whatever the BIOS has put in the DMI area. What ends up there depends entirely on the BIOS function that enumerates hardware and fills the table; it probably just copies in the readings from every RAM SPD chip. That, obviously, includes readings from all the modules, including those that were not made available to the system. It is probably not the same function that actually tests how much memory is usable and has access to the DRAM controller configuration details. In short: only SuperMicro could know why that is. – Nikita Kipriyanov Sep 22 '22 at 10:01
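To put numbers on the mismatch described above, a minimal Python sketch can total what dmidecode lists per slot and compare it with what the kernel reports. The function names `installed_dimms` and `mem_total_mb` and the embedded `SAMPLE` text are illustrative assumptions, not part of any tool; the sample is abbreviated from two records of the dump above. The sketch only parses text, so it cannot say why a module was left unmapped, only which slots the DMI table claims are populated.

```python
import re

def installed_dimms(dmidecode_text):
    """Return [(locator, size_mb)] for every populated slot in dmidecode text."""
    dimms = []
    # dmidecode separates records with a blank line.
    for block in re.split(r"\n[ \t]*\n", dmidecode_text):
        if "Memory Device" not in block:
            continue
        # Empty slots read "No Module Installed"; the BIOS flash chip is
        # sized in kB, so neither matches the MB/GB pattern.
        size = re.search(r"^[ \t]*Size:[ \t]*(\d+)[ \t]*(MB|GB)[ \t]*$",
                         block, re.MULTILINE)
        # Anchoring at line start skips the "Bank Locator:" line.
        locator = re.search(r"^[ \t]*Locator:[ \t]*(\S+)", block, re.MULTILINE)
        if size and locator:
            factor = 1024 if size.group(2) == "GB" else 1
            dimms.append((locator.group(1), int(size.group(1)) * factor))
    return dimms

def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo text in MB, or None if absent."""
    match = re.search(r"^MemTotal:\s*(\d+)\s*kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) // 1024 if match else None

# Hypothetical sample, abbreviated from two records of the dump above.
SAMPLE = """Handle 0x0032, DMI type 17, 28 bytes
Memory Device
    Size: No Module Installed
    Locator: P1_DIMM1C
    Bank Locator: BANK0

Handle 0x0034, DMI type 17, 28 bytes
Memory Device
    Size: 16384 MB
    Locator: P1_DIMM1B
    Bank Locator: BANK1
"""

if __name__ == "__main__":
    dimms = installed_dimms(SAMPLE)
    for locator, size_mb in dimms:
        print(f"{locator}: {size_mb} MB")
    print("Total listed by DMI:", sum(size for _, size in dimms), "MB")
```

On the real server, the same functions could be fed the output of `dmidecode -t memory` (run as root) and the contents of `/proc/meminfo`. `MemTotal` is normally somewhat below the installed total because the kernel reserves some memory, so only a gap of whole 16GB modules would point at DIMMs that the BIOS listed but did not map.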
