
Until recently, my Intel RAID controller (SROMBSASMR) logged "BBU disabled: changing WB logical drives to WT" once a month, followed about 2 1/2 hours later by "Battery relearn complete".

For a little over 2 weeks now, "BBU disabled" has also been appearing outside of this monthly cycle, in a steady pattern every 2 or 3 days*.

I'm wondering what this means. Should I replace the battery? Is the controller about to fail?

For the record: I do know what the BBU disabled and relearn messages in themselves mean.

*To be precise: 3 occurrences spaced 2 and 3 days apart, with this cycle in turn repeating every 8 days. I expect the next occurrence tomorrow in the early afternoon, at roughly 2 PM.

Erwin Blonk

2 Answers


Doesn't sound real healthy. The monthly battery re-learnings are probably OK (although a server slaughtering performance at inopportune moments isn't a real win) but if it's doing it more often, that suggests that the battery is getting flaky.

womble
  • That is where my thoughts are going too. The server and all its hardware are a little over a year old and have been in production for 9 months. I have no idea about the lifespan of the battery, and it could have been suboptimal to begin with. The thing is that this server cannot, for practical reasons, easily be scheduled for downtime. Replacing a battery just to be sure isn't an option, so I want to be as sure as possible that this is it. – Erwin Blonk Aug 02 '12 at 12:22
  • @EBV2010: what do you prefer: planned downtime with no data loss (but a remote possibility that the problem is not fixed), or unplanned downtime with potential data loss? – Mat Aug 02 '12 at 12:52
  • @Mat: It's only a *risk* of unplanned downtime, but a *guaranteed* planned downtime. – womble Aug 02 '12 at 12:57
  • @Mat it also is not a question of downtime with data loss but only of reduced I/O performance due to disabled write-back caches. – the-wabbit Aug 02 '12 at 13:02
  • @syneticon-dj: I disagree with that. If those checks come closer and closer and the controller doesn't completely disable the write cache, you could be in for surprises. – Mat Aug 02 '12 at 13:04

Well, the official Intel RAID Smart Battery AXXRSBBU3 technical product specification says “Intel recommends replacing the battery yearly”, so the battery going bad after a year is possible (especially if the battery is actually older and sat on a shelf for some time before the server was assembled; Li-Ion batteries lose capacity over time even when they are not in use).

You can try to get more information about the battery state: download the Command Line Tool appropriate for your OS from the Intel Download Center, then run the following command:

CmdTool2 -AdpBbuCmd -aALL

It should output a lot of information about the battery state (although the level of detail probably depends on the controller model). In addition to obvious things like the date of manufacture and “Full Charge Capacity” (measured during battery relearn cycles) compared to “Design Capacity” (which a brand-new battery should have), check the battery temperature: although the specified operating range goes up to 45°C, running at a temperature close to that maximum greatly shortens the battery's lifetime.
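If you want to track these values over time rather than eyeball them, here is a minimal Python sketch that pulls them out of the tool's text output. It assumes the output consists of "Label: value" lines and that the labels are spelled "Full Charge Capacity", "Design Capacity" and "Temperature"; the first two are the names quoted above, the third is a guess, and the sample text is made up for illustration, so adjust the labels to whatever your controller actually prints.

```python
import re


def parse_bbu_report(text):
    """Collect 'Label: value' lines into a dict (the last occurrence wins)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value):
    """Return the first number in a string such as '1300 mAh', or None."""
    match = re.search(r"-?\d+(?:\.\d+)?", value or "")
    return float(match.group()) if match else None


def battery_health(fields, max_temp_c=45.0):
    """Summarise capacity and temperature; missing fields come back as None."""
    # Field labels below are assumptions; check them against your real output.
    full = leading_number(fields.get("Full Charge Capacity"))
    design = leading_number(fields.get("Design Capacity"))
    temp = leading_number(fields.get("Temperature"))
    return {
        "capacity_ratio": full / design if full is not None and design else None,
        "temperature_c": temp,
        # 45 C is the upper end of the operating range mentioned above.
        "temp_headroom_c": None if temp is None else max_temp_c - temp,
    }


# Hypothetical sample, not real controller output.
SAMPLE = """\
Temperature: 38 C
Full Charge Capacity: 1300 mAh
Design Capacity: 1350 mAh
"""

if __name__ == "__main__":
    print(battery_health(parse_bbu_report(SAMPLE)))
```

To use it on a live system, capture the command's output (for example with subprocess.run and capture_output=True, text=True) and pass the resulting string to parse_bbu_report. Logging the capacity ratio and temperature after each relearn would show whether the battery is actually degrading.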

You may also be able to obtain at least parts of the detailed battery information from GUI management utilities you might have already installed on the server.

Sergey Vlasov
  • All signs seem to be green: CmdTool2 basically says everything is OK, fully charged, 97% capacity. The manufacture date says 1/10/2011, but that could mean Jan. 10th or Oct. 1st. Given how long this server has been around (before my time), I'd say Jan. 10th. Still, I'll keep an eye on it and schedule downtime at some point to deal with some other issues as well. – Erwin Blonk Aug 02 '12 at 14:42