I'm monitoring Sun hardware using SNMP to gather information from the LOM card. One of the datapoints I'm monitoring is a voltage status MIB which alerts if any of the internal voltages becomes too high or low - with the thresholds being set, presumably, by Sun at the time of manufacture. These trigger with surprising frequency - is this anything to worry about?

Jon Topper
  • What model of hardware is this on? Additionally, is the LOM logging the errors in addition to your SNMP monitoring in the internal event log? – Luke Jul 06 '09 at 15:22
  • These are SunFire X2100 and X2200 machines. Nothing logged in the event log on these LOMs. – Jon Topper Jul 06 '09 at 17:26

3 Answers

We experienced the same thing. Apparently it is nothing to worry about. The voltages vary as the on-board battery backup charges and discharges.

A word about Sun LOMs: the firmware on some models has been updated monthly for years on end, and obviously they update it for a reason. If you haven't already, it might pay to download the latest and greatest.

kmarsh
  • Which on-board battery are we talking about here? – Jon Topper Jul 06 '09 at 17:27
  • Sorry, I was thinking of the Sun StorEdge. Most LOMs just have a small watch-type battery. The firmware updates are still fast and furious, and the SNMP voltage alerts are still spurious, though. – kmarsh Jul 07 '09 at 01:49

I had to manage a bunch of Sun XFire servers and ran into similar issues. When purchased, these servers had the embedded Lights Out Manager (eLOM); after upgrading to the integrated Lights Out Manager (iLOM), many of these issues were resolved and a lot of functionality was added. The naming differences are confusing, but I would definitely recommend upgrading to the latest and greatest firmware for the LOM and BIOS if you can afford the downtime.

Chris

There are multiple thresholds within the system, so it may be worth investigating which thresholds you're seeing exceeded. Looking at a status dump from an X4200, I see upper/lower_noncritical_threshold, upper/lower_critical_threshold, and upper/lower_nonrecov_threshold. Given that the LOM isn't logging the error, I'd suspect your query is showing readings that hit only the noncritical limits. For transients into that range, it's probably not anything to worry about.
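To make the distinction between the three threshold pairs concrete, here is a minimal Python sketch of how a monitoring check might grade a reading. The field names mirror the threshold names in the status dump; the `classify` function, the at-or-beyond comparison, and the 12 V figures are all hypothetical illustrations, not values or logic taken from any Sun firmware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """One sensor's limits; None means the sensor does not define that limit."""
    lower_nonrecov: Optional[float] = None
    lower_critical: Optional[float] = None
    lower_noncritical: Optional[float] = None
    upper_noncritical: Optional[float] = None
    upper_critical: Optional[float] = None
    upper_nonrecov: Optional[float] = None


def _beyond(reading: float, lower: Optional[float], upper: Optional[float]) -> bool:
    """True if the reading is at or past either limit of a pair (assumed semantics)."""
    if lower is not None and reading <= lower:
        return True
    if upper is not None and reading >= upper:
        return True
    return False


def classify(reading: float, t: Thresholds) -> str:
    """Return the most severe band the reading falls into."""
    # Check the widest (most severe) limits first so they take precedence.
    if _beyond(reading, t.lower_nonrecov, t.upper_nonrecov):
        return "nonrecoverable"
    if _beyond(reading, t.lower_critical, t.upper_critical):
        return "critical"
    if _beyond(reading, t.lower_noncritical, t.upper_noncritical):
        return "noncritical"
    return "ok"


# Made-up limits for a nominal 12 V rail, for illustration only.
EXAMPLE_12V = Thresholds(
    lower_nonrecov=10.2, lower_critical=10.8, lower_noncritical=11.4,
    upper_noncritical=12.6, upper_critical=13.2, upper_nonrecov=13.8,
)

if __name__ == "__main__":
    for volts in (12.0, 12.7, 10.5, 14.0):
        print(volts, classify(volts, EXAMPLE_12V))
```

With something like this in the monitoring layer, you could page only on "critical" or worse and merely record "noncritical" excursions, which should cut down the alert noise from brief transients.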

I'll also second the other answers about firmware updates. There have been significant ILOM and BIOS updates since the X4x00s came out, and I'd suspect the X2x00s have updates as well.

Jeremy M