I have a linux server that has logged the following mcelog error:
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 20
MISC 800000
TIME 1476167381 Tue Oct 11 06:29:41 2016
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCA: BUS error: 0 0 Level-3 Generic Generic Other-transaction
Request-did-not-timeout
QPI:
Intel QPI physical layer detected a QPI in-band reset but aborted
initialization
STATUS 8800004000200e0f MCGSTATUS 0
MCGCAP 7000c16 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 63
I can find reference to this error in Intel system programming docs, and monitoring code on github, but nothing explaining the cause, effect and suggested actions. I have read through the latest microcode update notes to see if it's mentioned but can't find anything.
The error might be a 'cosmic radiation-type' one-off or a 'non-event' to ignore, but can anyone elaborate with some real world System Admin-level guidance?
Thanks