
As the title says, one of my BL460 blades runs Red Hat, and the mcelog daemon keeps writing this message to /var/log/messages:

mcelog: Corrected memory errors on page 61a5dd000 exceed threshold 10 in 24h: 10 in 24h
mcelog: Location SOCKET:1 CHANNEL:1 DIMM:0 []
mcelog: Offlining page 61a5dd000
mcelog: Offlining page 61a5dd000 failed: Input/output error
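If you want to check in a script whether these messages always point at the same location, a small parser over the log text is enough. This is a hypothetical sketch: `parse_mcelog` and the regular expressions are names made up here, and they assume the exact wording shown above (other mcelog versions may phrase things differently). The numbers it returns are mcelog's raw SOCKET/CHANNEL/DIMM values, not the slot labels printed on the blade.

```python
import re

# Sample copied from the mcelog messages above (assumed wording; may vary by version).
SAMPLE_LOG = """\
mcelog: Corrected memory errors on page 61a5dd000 exceed threshold 10 in 24h: 10 in 24h
mcelog: Location SOCKET:1 CHANNEL:1 DIMM:0 []
mcelog: Offlining page 61a5dd000
mcelog: Offlining page 61a5dd000 failed: Input/output error
"""

# Hypothetical patterns matching the message formats shown in the question.
LOCATION_RE = re.compile(r"Location SOCKET:(\d+) CHANNEL:(\d+) DIMM:(\d+)")
THRESHOLD_RE = re.compile(
    r"Corrected memory errors on page ([0-9a-f]+) exceed threshold (\d+) in 24h"
)
OFFLINE_FAIL_RE = re.compile(r"Offlining page ([0-9a-f]+) failed: (.+)")


def parse_mcelog(text: str) -> dict:
    """Collect raw locations, over-threshold pages and failed offlines from mcelog lines."""
    result: dict = {"locations": set(), "pages": set(), "offline_failures": {}}
    for line in text.splitlines():
        if m := LOCATION_RE.search(line):
            # (socket, channel, dimm) exactly as mcelog numbers them
            result["locations"].add(tuple(int(g) for g in m.groups()))
        elif m := THRESHOLD_RE.search(line):
            result["pages"].add(m.group(1))
        elif m := OFFLINE_FAIL_RE.search(line):
            # page address -> error text reported when offlining failed
            result["offline_failures"][m.group(1)] = m.group(2)
    return result


if __name__ == "__main__":
    summary = parse_mcelog(SAMPLE_LOG)
    for socket, channel, dimm in sorted(summary["locations"]):
        print(f"mcelog location: SOCKET:{socket} CHANNEL:{channel} DIMM:{dimm}")
    for page, error in summary["offline_failures"].items():
        print(f"offlining page {page} failed: {error}")
```

To use it on the real log, pass the contents of /var/log/messages (usually readable only by root) instead of `SAMPLE_LOG`. If several different locations show up, that suggests the errors are not confined to a single module.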

I have two questions:

  1. Is this message "normal"? I mean: the system sees the errors and corrects them, so once the corrections are done, should these errors stop appearing in /var/log/messages? (Even though it means some DIMM module has errors.)

  2. I tried to locate the DIMM module, but I can't find it. I located PROC 1 on the blade and the CHANNEL 1 pair, but on the BL460 the DIMMs are labeled 1 to 6. I assumed DIMM:0 was physical DIMM 1, but after removing it the message still appears in /var/log/messages. (I then removed DIMMs 1 and 2 to check, because both are on CHANNEL 1, but nothing changed.) How can I tell which physical DIMM it is?

Thank you :)

ewwhite
drkmkzs



This is a case where you should have the HPE management agents installed. I don't use mcelog on proper HPE server equipment.

See: HP ProLiant DL380e Gen8 server - SPP use

For RHEL/CentOS, these agents and drivers handle system health monitoring and report it to the OS. That said, you can also get this information directly from the iLO.

Example output:

hpasmcli> show dimm
DIMM Configuration
------------------
Processor #:                     1
Module #:                     1
Present:                      Yes
Form Factor:                  9h
Memory Type:                  DDR3(18h)
Size:                         8192 MB
Speed:                        1866 MHz
Supports Lock Step:           No
Configured for Lock Step:     No
Status:                       Ok

Processor #:                     1
Module #:                     4
Present:                      Yes
Form Factor:                  9h
Memory Type:                  DDR3(18h)
Size:                         8192 MB
Speed:                        1866 MHz
Supports Lock Step:           No
Configured for Lock Step:     No
Status:                       Ok
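If the agents are installed, the same output can be checked from a script, for example by saving the output of `hpasmcli -s "show dimm"` to a file. The sketch below is hypothetical: `parse_show_dimm` and `unhealthy` are names made up here, and the parser assumes the plain "Key: value" layout shown above, with records separated by blank lines. It lists every module whose Status is anything other than Ok; field names and status strings may differ between agent versions.

```python
from typing import Dict, List, Tuple

# Copy of the example "show dimm" output above (layout assumed stable).
SAMPLE_OUTPUT = """\
DIMM Configuration
------------------
Processor #:                     1
Module #:                     1
Present:                      Yes
Form Factor:                  9h
Memory Type:                  DDR3(18h)
Size:                         8192 MB
Speed:                        1866 MHz
Supports Lock Step:           No
Configured for Lock Step:     No
Status:                       Ok

Processor #:                     1
Module #:                     4
Present:                      Yes
Form Factor:                  9h
Memory Type:                  DDR3(18h)
Size:                         8192 MB
Speed:                        1866 MHz
Supports Lock Step:           No
Configured for Lock Step:     No
Status:                       Ok
"""


def parse_show_dimm(text: str) -> List[Dict[str, str]]:
    """Split 'Key: value' output into one dict per DIMM record."""
    modules: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            # Header, underline and blank lines carry no field; a blank line ends a record.
            if not line.strip() and current:
                modules.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "Processor #" and current:
            # A new record started without a separating blank line.
            modules.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        modules.append(current)
    return modules


def unhealthy(modules: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (processor, module, status) for every record whose Status is not Ok."""
    return [
        (m.get("Processor #", "?"), m.get("Module #", "?"), m.get("Status", "?"))
        for m in modules
        if m.get("Status", "").lower() != "ok"
    ]


if __name__ == "__main__":
    for proc, module, status in unhealthy(parse_show_dimm(SAMPLE_OUTPUT)):
        print(f"Processor {proc}, module {module}: {status}")
```

With the sample above nothing is printed, since both modules report Ok. The advantage over mcelog's numbers is that the processor and module numbers here come from the HPE agents, which is what the answer recommends relying on.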

Or via the iLO.


ewwhite