/var/log/messages is full of this, occurring every second. I can see this has been going on for at least 4 days but any older logs have been purged. Maybe it's always been like this.

Jul  8 04:07:12 webbox1 kernel: ACPI Error: SMBus or IPMI write requires Buffer of length 66, found length 32 (20090903/exfield-286)
Jul  8 04:07:12 webbox1 kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PMI0._PMM] (Node ffff88087468bab0), AE_AML_BUFFER_LIMIT
Jul  8 04:07:12 webbox1 kernel: ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20090903/power_meter-342)

Can someone explain what it means, or where I should look to debug further?

Thanks

Codemonkey
  • 95% of the time this is a BIOS bug and requires a BIOS update to fix. But to be sure, you should file a bug report at kernel.org. – Michael Hampton Jul 30 '18 at 15:33
  • In that instance would it be safe to somehow silence these messages, so /var/log/messages can be useful again, rather than a multiple-MB behemoth every day? – Codemonkey Jul 30 '18 at 15:54

1 Answer

Looks like this is a common problem with some older HPE servers.

The first thing you should do is update the system BIOS/firmware.

If the updated firmware doesn't resolve the problem, you can work around it by blacklisting the ACPI power meter module so that it is no longer loaded at boot (a reboot is needed for this to take effect), e.g.:

echo "blacklist acpi_power_meter" >> /etc/modprobe.d/hwmon.conf

In pre-3.0 kernels such as that in CentOS 6, the module name was just power_meter:

echo "blacklist power_meter" >> /etc/modprobe.d/hwmon.conf
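
If you are not sure which of the two names your kernel uses, something like the following rough sketch can pick it for you. It reads `lsmod`-style output on stdin and prints the matching blacklist line. The function name `pick_blacklist_line` is made up for illustration, and the sketch assumes the module is currently loaded:

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): read lsmod-style output
# on stdin and print a "blacklist" line for whichever power meter module
# name appears in the first column. Returns non-zero if neither is found.
pick_blacklist_line() {
    awk '
        $1 == "acpi_power_meter" || $1 == "power_meter" {
            print "blacklist " $1
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Running `lsmod | pick_blacklist_line` and appending the result to /etc/modprobe.d/hwmon.conf (as root) avoids guessing the name. After the reboot, `lsmod` should no longer list the module.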

In theory it's possible to write a custom ACPI table to patch the problem yourself, but that's more than a bit complicated, and it's something the hardware vendor should have fixed anyway...

Michael Hampton
  • This is a dedicated server that I get from oneprovider.com and their management interface says "Don't change BIOS, RAID, IDRAC, ILO configuration!" - I guess I'll send them a ticket and see if they fancy doing it for me... – Codemonkey Jul 30 '18 at 16:04
  • Aha. Well, they should take care of this immediately or you should get a refund... – Michael Hampton Jul 30 '18 at 16:06
  • Thanks, I've had the box with them for a number of years now and would certainly prefer not to have to move it. I'll open a ticket, thank you. – Codemonkey Jul 30 '18 at 16:13
  • Oh, well in that case it's probably not their fault it hasn't had a BIOS update in years :) – Michael Hampton Jul 30 '18 at 16:13
  • Unfortunately a newer BIOS didn't make a difference. Are there any risks/downsides to disabling the ACPI power meter like you suggested? Thanks! – Codemonkey Aug 01 '18 at 18:26
  • @Codemonkey You won't be able to monitor the system's power usage (in watts) from within the OS. If it's a leased server, you might not care much about that though. – Michael Hampton Aug 01 '18 at 18:28
  • I care less than not at all, thanks! :) – Codemonkey Aug 01 '18 at 18:30
  • I ran that command, Michael, and have checked that the file exists and contains what it should, but my log file is still filling up with these messages. I tried a reboot too; no difference. Sorry... – Codemonkey Aug 01 '18 at 18:38
  • Hmm. You have a really old kernel; it may have been named `power_meter` and not `acpi_power_meter`. – Michael Hampton Aug 01 '18 at 18:40
  • Is it definitely coming from hwmon? Could it be coming from another service like netdata? I've just tried disabling sensors in netdata and it didn't seem to make a difference. I'll try `power_meter` now. Do I need a reboot to see the effects or is there a service I can restart instead? – Codemonkey Aug 01 '18 at 18:44
  • @Codemonkey You'll have to reboot to blacklist the module. – Michael Hampton Aug 01 '18 at 18:45
  • Perfect, `power_meter` did the trick, thanks for all your help. – Codemonkey Aug 01 '18 at 18:49