
We have a server in a colocation centre (a real physical server, not a VM), running 64-bit Debian (uname -r: 3.16.0-4-amd64).

/proc/meminfo reports about 4 GiB of total memory:

$ head -n 1 /proc/meminfo
MemTotal:        4051692 kB
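To put that figure in GiB, here is a small Python sketch (the function name `parse_meminfo` is just for illustration). It assumes the usual `/proc/meminfo` layout of `Key: value kB`, where the kernel's "kB" actually means KiB (1024 bytes). The sample is the single line quoted above; on a live system you would pass the whole file's contents instead.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Return a mapping like {'MemTotal': 4051692} (values as printed, usually kB)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()           # e.g. ['4051692', 'kB']
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields

sample = "MemTotal:        4051692 kB"
mem_kb = parse_meminfo(sample)["MemTotal"]
mem_gib = mem_kb / 1024 / 1024         # /proc/meminfo "kB" is KiB
print(f"MemTotal = {mem_kb} kB = {mem_gib:.2f} GiB")
```

So the kernel sees roughly 3.86 GiB, i.e. "about 4 GiB" once kernel and firmware reservations are accounted for.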

free reports the same (I only looked at the total column; I'm not talking about used, free, shared, buffers, cached):

$ free -k
             total       used       free     shared    buffers     cached
Mem:       4051692    3867356     184336     220908      63948    1203596
-/+ buffers/cache:    2599812    1451880
Swap:     15728208     652540   15075668

And so does dmesg | grep Memory:

$ dmesg | grep Memory
[    0.000000] Memory: 4034240K/4185236K available (5287K kernel code, 949K rwdata, 1836K rodata, 1208K init, 840K bss, 150996K reserved)
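As far as I can tell from this (3.16-era) message format, the first number is the memory free at that point in boot, the second is the physical RAM the kernel found, and "reserved" is what the kernel set aside. The Python sketch below pulls the three numbers out of the quoted line and checks that, for this particular line, they add up. The takeaway, under that reading, is that the kernel was only handed about 4 GiB at boot, so it is not the kernel that is hiding the other 4 GiB.

```python
import re

# The dmesg line quoted in the question.
line = ("[    0.000000] Memory: 4034240K/4185236K available (5287K kernel code, "
        "949K rwdata, 1836K rodata, 1208K init, 840K bss, 150996K reserved)")

m = re.search(r"Memory: (\d+)K/(\d+)K available.*?(\d+)K reserved", line)
available_k, total_k, reserved_k = (int(g) for g in m.groups())

# For this line the reserved amount is exactly the gap between the two figures.
assert total_k - available_k == reserved_k   # 4185236 - 4034240 = 150996

print(f"kernel found {total_k / 1024**2:.2f} GiB of RAM, "
      f"{reserved_k}K reserved at boot")
```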

But dmidecode reports 4 * 2 GiB = 8 GiB of RAM, if I understand it correctly:

$ sudo dmidecode --type memory

# dmidecode 2.12
SMBIOS 2.6 present.

Handle 0x0008, DMI type 5, 24 bytes
Memory Controller Information
    Error Detecting Method: 64-bit ECC
    Error Correcting Capabilities:
        Single-bit Error Correcting
    Supported Interleave: One-way Interleave
    Current Interleave: One-way Interleave
    Maximum Memory Module Size: 4096 MB
    Maximum Total Memory Size: 16384 MB
    Supported Speeds:
        Other
    Supported Memory Types:
        DIMM
        SDRAM
    Memory Module Voltage: 3.3 V
    Associated Memory Slots: 4
        0x0009
        0x000A
        0x000B
        0x000C
    Enabled Error Correcting Capabilities:
        Single-bit Error Correcting

Handle 0x0009, DMI type 6, 12 bytes
Memory Module Information
    Socket Designation: DIMM1A
    Bank Connections: 0 1
    Current Speed: Unknown
    Type: DIMM SDRAM
    Installed Size: 2048 MB (Single-bank Connection)
    Enabled Size: 2048 MB (Single-bank Connection)
    Error Status: OK

Handle 0x000A, DMI type 6, 12 bytes
Memory Module Information
    Socket Designation: DIMM1B
    Bank Connections: 2 3
    Current Speed: Unknown
    Type: DIMM SDRAM
    Installed Size: 2048 MB (Single-bank Connection)
    Enabled Size: 2048 MB (Single-bank Connection)
    Error Status: OK

Handle 0x000B, DMI type 6, 12 bytes
Memory Module Information
    Socket Designation: DIMM2A
    Bank Connections: 4 5
    Current Speed: Unknown
    Type: DIMM SDRAM
    Installed Size: 2048 MB (Single-bank Connection)
    Enabled Size: 2048 MB (Single-bank Connection)
    Error Status: OK

Handle 0x000C, DMI type 6, 12 bytes
Memory Module Information
    Socket Designation: DIMM2B
    Bank Connections: 6 7
    Current Speed: Unknown
    Type: DIMM SDRAM
    Installed Size: 2048 MB (Single-bank Connection)
    Enabled Size: 2048 MB (Single-bank Connection)
    Error Status: OK

Handle 0x002A, DMI type 16, 15 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Single-bit ECC
    Maximum Capacity: 16 GB
    Error Information Handle: Not Provided
    Number Of Devices: 4

Handle 0x002C, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x002A
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM1A
    Bank Locator: BANK0
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1333 MHz
    Manufacturer: Micron        
    Serial Number: 501C6FDC
    Asset Tag: AssetTagNum0
    Part Number: 9JSF25672AZ-1G4D1 
    Rank: Unknown

Handle 0x002E, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x002A
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM1B
    Bank Locator: BANK1
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1333 MHz
    Manufacturer: Micron        
    Serial Number: 2A1C6FDC
    Asset Tag: AssetTagNum1
    Part Number: 9JSF25672AZ-1G4D1 
    Rank: Unknown

Handle 0x0030, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x002A
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM2A
    Bank Locator: BANK2
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1333 MHz
    Manufacturer: Micron        
    Serial Number: 511C6FDC
    Asset Tag: AssetTagNum2
    Part Number: 9JSF25672AZ-1G4D1 
    Rank: Unknown

Handle 0x0032, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x002A
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: None
    Locator: DIMM2B
    Bank Locator: BANK3
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1333 MHz
    Manufacturer: Micron        
    Serial Number: 4B1C6FDC
    Asset Tag: AssetTagNum3
    Part Number: 9JSF25672AZ-1G4D1 
    Rank: Unknown
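To make the comparison explicit, the Python sketch below sums the `Size:` field of every "Memory Device" (DMI type 17) record and compares that with `MemTotal`. The function name is hypothetical. The sample is an abbreviated copy of the four type 17 records above, and dmidecode's "MB" is treated as MiB. On a live system you could feed it the output of `sudo dmidecode --type memory` instead.

```python
import re

def dmi_installed_mib(dmidecode_output: str) -> int:
    """Sum the 'Size:' field of every 'Memory Device' (DMI type 17) record, in MiB."""
    total = 0
    for record in dmidecode_output.split("\n\n"):
        lines = [line.strip() for line in record.splitlines()]
        if "Memory Device" not in lines:
            continue  # skip type 5/6/16 records, which use other size fields
        for line in lines:
            m = re.fullmatch(r"Size: (\d+) (MB|GB)", line)
            if m:  # empty slots say "Size: No Module Installed" and are skipped
                total += int(m.group(1)) * (1024 if m.group(2) == "GB" else 1)
    return total

# Abbreviated copy of the four type 17 records from the question.
SAMPLE = "\n\n".join(
    f"Handle 0x{handle:04X}, DMI type 17, 28 bytes\nMemory Device\n"
    f"\tSize: 2048 MB\n\tLocator: {locator}"
    for handle, locator in [(0x2C, "DIMM1A"), (0x2E, "DIMM1B"),
                            (0x30, "DIMM2A"), (0x32, "DIMM2B")]
)

installed_mib = dmi_installed_mib(SAMPLE)   # 4 x 2048 = 8192
memtotal_kib = 4051692                      # MemTotal from /proc/meminfo
visible = memtotal_kib / (installed_mib * 1024)
print(f"DMI: {installed_mib} MiB installed; MemTotal covers {visible:.0%} of it")
```

Less than half of what the DMI tables list actually shows up in `MemTotal`, which is far more than normal kernel/firmware reservations would explain.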

What am I missing? It's a server in a colocation center, so unfortunately I can't easily see what is installed physically.

Edit: man dmidecode says "More often than not, information contained in the DMI tables is inaccurate, incomplete or simply wrong.". Maybe dmidecode simply reports wrong data?

Edit: This is not a duplicate of Why is Linux reporting “free” memory strangely? That question is about free memory, and the confusion that arises when buffers and cache take away from it. I am not concerned about free memory, only about total memory. Don't let my use of the free command fool you: I used it only to look at the total amount of memory, not the amount of free memory. If anyone still thinks this question is a duplicate, please explain to me why, because I don't understand.
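A quick arithmetic check in Python, using the exact numbers from the `free -k` output above (variable names are mine), shows that the buffers/cache adjustment only moves memory between "used" and "free" and never changes the total. It assumes the older procps `free` layout shown in the question, where the `-/+ buffers/cache` row is derived from the `Mem:` row.

```python
# Figures copied from the `free -k` output in the question (KiB).
total_kb, used_kb, free_kb = 4051692, 3867356, 184336
buffers_kb, cached_kb = 63948, 1203596

# "Mem:" row: the total is simply split into used + free.
assert used_kb + free_kb == total_kb

# "-/+ buffers/cache" row: buffers and cache are moved from "used" to "free".
used_minus_cache = used_kb - buffers_kb - cached_kb
free_plus_cache = free_kb + buffers_kb + cached_kb

# The total is unchanged, so it cannot explain the missing 4 GiB.
assert used_minus_cache + free_plus_cache == total_kb
print(used_minus_cache, free_plus_cache)
```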

Edit: output of dmidecode -t1, as requested by Lenniey:

$ sudo dmidecode -t1
# dmidecode 2.12
SMBIOS 2.6 present.

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Supermicro
        Product Name: X8SIL
        Version: 0123456789
        Serial Number: 0123456789
        UUID: 49434D53-0200-9037-2500-379025009946
        Wake-up Type: Power Switch
        SKU Number: To Be Filled By O.E.M.
        Family: To Be Filled By O.E.M.
Roel Schroeven
  • Is this really a physical server? Or a VM of some kind? – Lenniey Dec 20 '17 at 10:34
  • Yes, certainly looks like 4x2GiB DIMMs. What distribution is it running (in particular, is it a 32-bit or 64-bit kernel)? – mjturner Dec 20 '17 at 10:34
  • see canonical question https://serverfault.com/questions/449296/why-is-linux-reporting-free-memory-strangely – Sum1sAdmin Dec 20 '17 at 10:54
  • Possible duplicate of [Why is Linux reporting "free" memory strangely?](https://serverfault.com/questions/449296/why-is-linux-reporting-free-memory-strangely) – Sum1sAdmin Dec 20 '17 at 10:54
  • It's a physical server, and runs 64 bit Debian. Edited to add that information. – Roel Schroeven Dec 20 '17 at 11:08
  • Not a duplicate of [Why is Linux reporting “free” memory strangely?](https://serverfault.com/questions/449296/why-is-linux-reporting-free-memory-strangely); edit question to explain why. – Roel Schroeven Dec 20 '17 at 11:09
  • Roel, have a read of the question linked - it is a duplicate, free memory means free memory - the total is the total of free memory, look at vmstat, top etc. – Sum1sAdmin Dec 20 '17 at 11:18
  • Possible things to check are `/proc/cmdline` for a [`mem=`](https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html) boot parameter that might restrict memory, or if you have a multi CPU socket system where not all sockets are occupied; then the memory may be installed in memory banks reserved for the missing CPU, or your BIOS has been configured for [memory mirroring](https://serverfault.com/q/243373/37681). – HBruijn Dec 20 '17 at 11:27
  • @Sum1sAdmin: Sorry, I don't understand. That linked question talks about free memory, buffers, cache. That is not what I'm talking about. I'm talking about the very first number in the output of `free`, which shows total amount of memory in the system, not free memory. Or to put it in other words: `head -n 1 /proc/meminfo` gives `MemTotal: 4051692 kB`. Has nothing to do with free memory or memory consumed by buffers and cache. – Roel Schroeven Dec 20 '17 at 11:32
  • @HBruijn: Thanks. (1) There's no `mem=` boot parameter. (2) There's only one CPU socket (with a 4-core Xeon). (3) I would be surprised if the motherboard supports memory mirroring. I don't suppose there's a way to find out without physical access? – Roel Schroeven Dec 20 '17 at 12:40
  • What's the output of `dmesg | grep Memory`? 4GB, presumably. Your OS doesn't get "more" from the underlying systems, like BIOS etc. What's the output of `dmidecode -t1`? – Lenniey Dec 20 '17 at 14:07
  • @Lenniey: edited to add that information. Doesn't tell me anything useful really: neither seems to report information about installed/found memory. – Roel Schroeven Dec 20 '17 at 14:26
  • @RoelSchroeven Should have been something like: `[ 0.000000] Memory: 99183044k/101711872k available (4556k kernel code, 1058052k absent, 1470776k reserved, 7590k data, 1360k init)` – Lenniey Dec 20 '17 at 14:34
  • @Lenniey: I can see that line on some other boxes, but not on that one. I suppose the oldest contents of the kernel ring buffer are lost. I suppose I could reboot the machine, but I'd rather only do that if really necessary. – Roel Schroeven Dec 20 '17 at 14:42
  • @Lenniey: `dmesg | grep Memory` is available now (I rebooted the machine because an update installed a new kernel): `[ 0.000000] Memory: 4034240K/4185236K available (5287K kernel code, 949K rwdata, 1836K rodata, 1208K init, 840K bss, 150996K reserved)` – Roel Schroeven Dec 21 '17 at 09:59
  • `man dmidecode` says "More often than not, information contained in the DMI tables is inaccurate, incomplete or simply wrong.". Maybe dmidecode simply reports wrong data? – Roel Schroeven Dec 21 '17 at 10:01
  • @RoelSchroeven Well, I have never seen this, but I also very, very rarely check my RAM with dmidecode. – Lenniey Dec 21 '17 at 10:12
  • I'll probably go to the datacenter somewhere in January; I'll have a look at the hardware. Looks like that's the only way to know for sure. – Roel Schroeven Dec 21 '17 at 10:17
  • @RoelSchroeven While you are at it also look what the BIOS says during boot. If there are more than four memory slots on the motherboard it might be worthwhile investigating if you are supposed to leave a specific subset of them empty. Finally if this is an older board consider the possibility that you may have exceeded the amount of memory supported by the board. – kasperd Dec 21 '17 at 10:41
  • @kasperd that's why I asked for `dmidecode -t1`, the board supports 32GB ECC / 16GB unbuffered with 4 slots. Strange... – Lenniey Dec 21 '17 at 12:14
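HBruijn's first suggestion is easy to script. The Python sketch below scans a kernel command line for `mem=` and, as an extra assumption on my part, `memmap=` (both are documented kernel parameters that can limit or carve out RAM). The function name and the sample command lines in the tests are made up for illustration, not taken from this server; on a live system it reads `/proc/cmdline` if present.

```python
import os

def mem_limit_params(cmdline: str) -> list[str]:
    """Return any mem=/memmap= tokens found on a kernel command line."""
    return [tok for tok in cmdline.split() if tok.startswith(("mem=", "memmap="))]

if __name__ == "__main__":
    path = "/proc/cmdline"
    if os.path.exists(path):
        with open(path) as f:
            found = mem_limit_params(f.read())
        print(found or "no mem=/memmap= parameter on the kernel command line")
```

An empty result, as Roel reported, rules out this particular cause.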

3 Answers


The manual for that Supermicro X8SIL motherboard is available at: http://www.supermicro.com/manuals/motherboard/3420/MNL-1130.pdf

On page 32 (aka 2-10) it indicates that if Unbuffered (UDIMM) single-rank memory is used, the maximum amount of memory supported will be only 4 GB when using 1 GB DIMMs and 8 GB with 2 GB DIMMs.

With dual-rank UDIMMs, the maximum capacity would be 16 GB.

The ultimate maximum capacity of 32 GB can only be reached by using Registered (RDIMM) quad-rank memory modules, and the memory bus speed will take a hit when using them.

And a bit of googling on "Micron 9JSF25672AZ-1G4D1" brought me here: https://www.compuram.biz/memory_module/mt9jsf25672az-1g4d1/micron.htm

It seems to confirm that Micron Technology (MT) 9JSF25672AZ-1G4D1 is indeed an unbuffered single-rank memory module of size 2 GB.
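Plugging the installed modules into the limits quoted above gives a quick sanity check. This Python sketch encodes only the two single-rank UDIMM cases stated in this answer's reading of the manual; the table and function names are hypothetical, and the figures have not been checked independently against the PDF.

```python
# Limits as quoted in this answer (single-rank UDIMMs only).
# Key: (module type, rank, size of one DIMM in GB) -> max total in GB.
LIMITS_GB = {
    ("UDIMM", "single-rank", 1): 4,
    ("UDIMM", "single-rank", 2): 8,
}

def within_limit(kind: str, rank: str, dimm_gb: int, count: int):
    """Return (installed GB, board limit GB, whether installed <= limit)."""
    limit = LIMITS_GB[(kind, rank, dimm_gb)]
    installed = dimm_gb * count
    return installed, limit, installed <= limit

installed, limit, ok = within_limit("UDIMM", "single-rank", 2, 4)
print(f"installed {installed} GB, limit {limit} GB, within limit: {ok}")
```

On these figures four 2 GB single-rank UDIMMs sit exactly at the 8 GB limit, so the limit alone would not account for the kernel seeing only about 4 GiB.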

On page 34 (or 2-13) of the manual it indicates that when only 4 GB of RAM is used, a significant amount of it will be allocated to system devices and won't be usable. This might account for at least some of the missing memory with 8 GB installed too. Unfortunately the manual doesn't describe in detail the system device allocations in the 8 GB case.

telcoM

It turns out one of the memory modules is faulty, causing the system to ignore that one and the other one in the same channel.

We now have the server in our office, and we see that there really are 4 DIMM modules present, each 2 GB. I've tested the modules one-by-one, and found out that there is a faulty one.
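The arithmetic behind this explanation can be modelled in a few lines of Python. This is a hypothetical sketch, not data read from the server: it assumes the four slots form two channels of two DIMMs each and that a faulty module takes its whole channel offline, as described above. Which slots share a channel (`CHANNEL`) and which module failed are made up for illustration. With two 2 GB modules per channel, losing either channel leaves 4 GB whichever grouping is used, matching the roughly 4 GiB the kernel reported.

```python
# Hypothetical layout: four 2 GB DIMMs in two channels (grouping assumed).
DIMMS_GB = {"DIMM1A": 2, "DIMM1B": 2, "DIMM2A": 2, "DIMM2B": 2}
CHANNEL = {"DIMM1A": "A", "DIMM2A": "A", "DIMM1B": "B", "DIMM2B": "B"}

def usable_gb(faulty: set[str]) -> int:
    """Total GB left when every channel containing a faulty DIMM is dropped."""
    dead_channels = {CHANNEL[d] for d in faulty}
    return sum(size for d, size in DIMMS_GB.items()
               if CHANNEL[d] not in dead_channels)

print(usable_gb(set()))        # all modules healthy
print(usable_gb({"DIMM2B"}))   # one (hypothetical) bad module
```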

(That doesn't explain why dmidecode shows "Error Status: OK" for all four modules. I suppose that can be explained by the quote "More often than not, information contained in the DMI tables is inaccurate, incomplete or simply wrong." in the man page.)

Roel Schroeven

@roel-schroeven: The difference comes from what the various commands look at.

Commands like "free" look at the system memory, as reported by the OS (kernel).

The "dmidecode" command looks at the system hardware DMI table, as reported by the SMBIOS driver from the system BIOS - see previous question about DMI origin.

As such, dmidecode will show what hardware is installed, but not what is necessarily in use by the OS. This can be misleading as it may include, for example, expansion cards that have no OS driver, but are nevertheless present in the hardware list. Also, since it shows upgrade options, that can further confuse the situation.

The BIOS list does not always show correct information (the "inaccurate, incomplete or simply wrong" disclaimer you mention). With RAM, for example, depending on how the system is set up and the type of error correction available, the BIOS may not be able to detect extended information such as errors. In an ideal world that would result in something like "Error Status: Not Available", but it doesn't (yet).

sarlacii
  • Yes, but that still leaves two questions. (1) What can cause installed RAM not to be used? RAM doesn't need a driver. One possibility: the RAM is faulty. Maybe others, I don't know. (2) (more an observation than a question) "the BIOS will not be able to detect extended information like errors" Part of the system knows that 4 modules are installed (as evidenced by dmidecode) and part of the system knows that only 2 modules can be used (as evidenced by free and by what the BIOS displays) because there is a problem with the other 2. But apparently there is no way to retrieve that information. – Roel Schroeven Oct 03 '19 at 19:17
  • You're correct with (1): the RAM is faulty. Detecting that the slot is in use is different from it actually working. I have had RAM that was not seated correctly after the server was shipped to the data centre (too many bumps). The system detects that the slot is occupied, but does not use it. I had to remove and reseat it, and then it was fine. There could also be an electronic failure, which will prevent use, but not detection. Try removing and reseating the module; failing that, discard it. Go well! – sarlacii Oct 04 '19 at 05:41