3

SCOM supports putting discrete objects/classes/targets into maintenance mode. This gives very fine control over which objects/classes/targets have alerts forwarded.

Unfortunately, our operations team doesn't want that level of control.

They want to put an entire server, or groups of servers, into maintenance mode, where "maintenance mode" means no alerting of any kind. Period. Fin.

Today, we come close by putting WindowsComputer and HealthService (which also seems to cover Agent) into maintenance mode. That allows us to do application deployments (service stops, etc.) and anything requiring a reboot.

However, we still get occasional alerts from objects in management packs such as the Dell MP or BizTalk MP. These alerts don't tend to target WindowsComputer, or anything in its inheritance chain(?).

We tried putting the Entity object/class/target into maintenance mode, but this seemed to send the RMS server into a tizzy. For example, if we made 50 requests for 50 different servers, maybe 1 in 5 would actually be placed into maintenance mode. The remainder would be ignored.

We are using the SCOM API via PowerShell, or the SCOM SDK object model, to put things into maintenance mode.

Is there a recommended way to put a server, and all its contained objects, into maintenance mode, reliably?

Is there any reason our team should not put everything into maintenance mode?

HopelessN00b
Zach Bonham

2 Answers

3

According to the documentation, you can easily place a whole server in maintenance mode:

  1. In the Operations console, click the Monitoring button.
  2. In the Monitoring pane, expand Monitoring, and then click Computers.
  3. In the Computers pane, right-click the computer that you want to place into maintenance mode, click Maintenance Mode, and then click Start Maintenance Mode. You can use ctrl+click or shift+click to select multiple computers to place into maintenance mode.
  4. In the Maintenance Mode Settings dialog box, under Apply to, click Selected objects only if only the computer is to be placed into maintenance mode; otherwise, click Selected objects and all their contained objects.
Massimo
  • For the most part, this seems to work for rules/monitors targeting WindowsComputer or some derivative. It doesn't work as we expected if the machine is down for any significant length of time (reboot, hardware maintenance, etc.). We get Health Service Watcher alerts because the RMS can't talk to the agent (or something similar): HealthService is not part of the WindowsComputer inheritance chain, and at least the Dell MP and BizTalk MP also have several objects that don't inherit from WindowsComputer. Just to throw in a wrench, we often use the API (from admin/deploy scripts) :) – Zach Bonham Oct 03 '12 at 17:21
  • I'll try your suggestion, though. When we use the console, we work from views created specifically for admin teams. – Zach Bonham Oct 03 '12 at 17:24
  • *facepalm* this works as advertised with 2007 R2. Testing bore this out. We will be further investigating the reports of this not working to see if we can narrow down to a repeatable scenario. – Zach Bonham Oct 06 '12 at 17:40
2

This article might help clarify a few things:

http://blogs.technet.com/b/momteam/archive/2012/05/23/kb-understanding-operations-manager-maintenance-mode.aspx

Is there a recommended way to put a server, and all its contained objects, into maintenance mode, reliably?

Putting the computer object into maintenance mode should work.

Since SCOM 2007 R2 there is no need to put the agent and agent watcher into maintenance mode separately. Just be sure to select the 'Selected objects and all their contained objects' option if using the console, or pass TraversalDepth.Recursive if using the SDK (the PowerShell cmdlet does this by default).

However, we still get occasional alerts from objects in management packs such as the Dell MP or BizTalk MP. These alerts don't tend to target WindowsComputer, or anything in its inheritance chain(?).

You could try to identify the top-level distributed applications (DAs) or groups that contain the objects raising the alerts, and put those DAs and groups into maintenance mode.
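To make the difference between the two scopes concrete, here is a minimal toy model in Python. It does not call the SCOM SDK or any cmdlet. The class names, object names, and the containment layout (a chassis sensor that sits under a separate "hardware group" rather than under the server) are all hypothetical, chosen only to illustrate why an object outside the computer's containment tree can keep alerting until its own group is put into maintenance mode.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    # Mirrors the two console choices quoted in the first answer.
    SELECTED_ONLY = "Selected objects only"
    SELECTED_AND_CONTAINED = "Selected objects and all their contained objects"


@dataclass
class MonitoredObject:
    # Hypothetical stand-in for a monitored object; not an SDK type.
    name: str
    contained: list[MonitoredObject] = field(default_factory=list)
    in_maintenance: bool = False

    def add(self, child: MonitoredObject) -> MonitoredObject:
        self.contained.append(child)
        return child


def start_maintenance(obj: MonitoredObject, scope: Scope) -> None:
    """Flag obj, and for the recursive scope everything it contains."""
    obj.in_maintenance = True
    if scope is Scope.SELECTED_AND_CONTAINED:
        for child in obj.contained:
            start_maintenance(child, scope)


def still_alerting(objects) -> list[str]:
    """Names of objects that are not in maintenance mode."""
    return [o.name for o in objects if not o.in_maintenance]


def build_demo() -> dict[str, MonitoredObject]:
    # Assumed layout: the sensor belongs to a separate group,
    # not to the server's containment tree.
    server = MonitoredObject("server01")
    health = server.add(MonitoredObject("server01 health service"))
    disk = server.add(MonitoredObject("server01 disk C:"))
    group = MonitoredObject("hardware group")
    chassis = group.add(MonitoredObject("server01 chassis sensor"))
    return {"server": server, "health": health, "disk": disk,
            "group": group, "chassis": chassis}


if __name__ == "__main__":
    objs = build_demo()
    start_maintenance(objs["server"], Scope.SELECTED_AND_CONTAINED)
    print("after server:", still_alerting(objs.values()))
    start_maintenance(objs["group"], Scope.SELECTED_AND_CONTAINED)
    print("after group:", still_alerting(objs.values()))
```

In this sketch the recursive scope on the server covers only what the server contains; the sensor goes quiet only once its own top-level group is put into maintenance mode as well. Whether a given Dell or BizTalk object is really laid out this way depends on the management pack and has to be checked in the actual environment.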

Is there any reason our team should not put everything into maintenance mode?

Consider:

  1. Putting everything into maintenance mode can take a long time
  2. Putting the RMS into maintenance mode is normally a Bad Thing - 'Configuration distribution, the heartbeat feature, and other features for the system might become unreliable' (see article above)
Richard B
  • +1 for the excellent article. The article links to a list of known issues, one of which is getting heartbeat alerts. It *says* it's still applicable to 2007 R2, but my testing did not bear that out. I'll wait and see if I get any more reports from the wild so I can better narrow down the scenario. – Zach Bonham Oct 06 '12 at 17:43