
The person who built and set up this system left rather abruptly, and I've taken over.

My current issues are:

  1. I have several change requests that are stuck at New. They do not move to Pending or In Progress.

  2. The system is not sending emails when incidents are getting assigned to people. This used to work on this system.

I have done a lot of searching, and the usual fix of stopping and restarting the System Center services does not help. Can anyone give me any ideas of where else to look?


Update:

From all the searching I have done, it seemed I was at the point of reinstalling. My initial installation of SCSM 2012 was on a machine that had been upgraded from SCSM 2010 and also hosted SCCM 2007 and WSUS. We decided to give it a fresh start by installing a second instance of the SCSM server on a brand-new 2008 R2 server, then promoting the new server to workflow master using the procedure outlined in this article - Dealing with Multiple management Servers.

I've gotten to the point where both the old and the new server are up and the new server has been promoted. I had hoped to get spammed by emails all of a sudden as the workflows took off, but no such luck. Once all the clients are reconfigured to point to the new server, we still plan to decommission the old server, but at this point the problem seems to be in the database.

Short of any other input from the community, my next plan is to install a 180-day trial on a test server, complete with a separate database, so that I can do a side-by-side comparison between a completely fresh install and what I have now and see if I can find any differences. While that install is running, I also plan to investigate the event logs to see if anything in there can shed some light on what is happening on the new server.


Update 2:

So I've now got a test SCSM server up with a completely fresh install, including the database, and it seems to be able to transition Change Requests from New to In Progress. I'm attempting to find differences between the two. Stay tuned!


Update 3:

In looking through the event log on the new SCSM machine I discovered:

Log Name:      Operations Manager
Source:        OpsMgr Root Connector
Date:          10/9/2013 3:48:18 PM
Event ID:      28000
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      scsm02
Description:
The Root connector received an exception from the SDK Service while submitting task   status: 

Cannot set availability on a health service that doesn't exist.

This led me to Event ID 2800 logged after installing secondary server for System Center 2012 Service Manager SP1. I contacted MS to obtain the hotfix. BIG warning here: it turns out the hotfix is not so "hot". In order to apply it, you have to uninstall and then reinstall using the files they supply. :( This is where I am at now ...


Update 4:

Not much luck after the re-install. The errors in the event log have gone away on the new server but the workflows still aren't running and neither the event log nor the workflow status screen seem to indicate why. I've done a comparison of the Activity and the Change Request Event Workflows and I've removed everything from the production system that is not in my fresh test system (which is everything), shut down the services, cleared out the cache folders and restarted the services and still no joy.
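For anyone wanting to repeat the stop, clear cache, restart sequence described above, here is a minimal sketch, not the exact steps I ran. The short service names (`OMSDK`, `OMCFG`, `HealthService`), the cache folder path, the restart order, and the helper names `build_plan` and `execute` are all assumptions for illustration; check them against your own server with `sc query` before using any of it. It renames the cache folder rather than deleting it, and it only prints the plan unless `dry_run` is turned off.

```python
import subprocess
from pathlib import Path

# Assumed short names of the System Center services; verify on your server.
SERVICES = ["OMSDK", "OMCFG", "HealthService"]

# Assumed default cache folder location; adjust to your install path.
CACHE_DIR = Path(
    r"C:\Program Files\Microsoft System Center 2012\Service Manager\Health Service State"
)


def build_plan(services, cache_dir):
    """Return ordered steps: stop services, set the cache aside, start services."""
    plan = [("run", ["net", "stop", name, "/y"]) for name in services]
    # Rename instead of delete so the old cache can be restored if needed.
    plan.append(("rename", str(cache_dir), str(cache_dir) + ".old"))
    # Starting in reverse order of stopping is an assumption, not a documented rule.
    plan += [("run", ["net", "start", name]) for name in reversed(services)]
    return plan


def execute(plan, dry_run=True):
    """Print each step; only touch the system when dry_run is False."""
    for step in plan:
        print(step)
        if dry_run:
            continue
        if step[0] == "run":
            subprocess.run(step[1], check=True)
        else:
            Path(step[1]).rename(step[2])


if __name__ == "__main__":
    execute(build_plan(SERVICES, CACHE_DIR))
```

Run it once as-is to review the printed plan, then call `execute(..., dry_run=False)` from an elevated prompt. The rename will fail if a `.old` folder is already there from a previous run, so clear that out first.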

At the moment the only options I can think of are either a) nuke the entire system, including the database, and start over, losing all of our data in the process, or b) contact MS (which is probably going to cost us a butt load of money and time, only for them to advise us to do the same thing). Maybe more ideas will come after coffee ...


No answers came after coffee. Attempting to contact MS. I managed to get to their first line of defense and gave them our SA number, and someone is supposed to call me back. I am trying to log into my incident on their site to update my ticket with the link to this thread, but when I click on the link in the email they sent me, it goes to a "Sorry, the page you requested is not available" page ... Linux is looking better and better all the time.

Chuck Herrington
  • Have you looked in Administration > Workflows > Status to determine which workflows are failing and why? If I had to guess, it would be due to memory pressure. SCSM is a *beast* for memory, putting it on a server with SCCM and other roles is a nightmare. It needs a dedicated infrastructure. – MDMarra Oct 10 '13 at 17:05
  • I am just going through the workflows now. I have found a few that do have items that need attention (it would be nice if you could filter on workflows that need attention) and am attempting to get them to retry. Will let you know. – Chuck Herrington Oct 10 '13 at 17:43
  • You *can* filter workflows based on status if you dig into PowerShell. http://technet.microsoft.com/library/hh316207.aspx – MDMarra Oct 10 '13 at 17:45
  • I've reviewed all the workflows and with the exception of the "Resolve Child Incidents (Parent Incident resolved)" workflow none of them say that they need attention. This one says it has 13 that need attention yet they all say Succeeded??? – Chuck Herrington Oct 10 '13 at 18:34

2 Answers


Check the group membership of the workflow account. It should be a local administrator on all SCSM servers and it should be in the Administrators user role within SCSM. It should also have a mailbox on your Exchange server to send notifications. Also make sure that this account's password has not expired and that the account is not locked or disabled.

Without these permissions, you'll get all kinds of funky workflow behavior like you are seeing, since the workflow account cannot modify the tickets or trigger alerts.

You can also examine Administration > Workflows > Status to check for failed workflows and get some debugging information. I have a hunch that you'll see a large number of failed workflows.

MDMarra
  • I've checked the workflow account and it is in the local Administrators group as well as in the Administrators role in SCSM. It didn't have a mailbox but it does now. As mentioned previously, I don't have any workflows that have failed. I do have the one that says it needs attention, but all the items that need attention say they have Succeeded. The workflow account's password has not expired and the account is not locked or disabled. – Chuck Herrington Oct 10 '13 at 20:16

So, I know it's not the answer I was looking for, but after days of troubleshooting and sifting through every resource I could find, I gave up and reinstalled the server against a fresh database.

So far everything seems to be OK. However, just today we experienced our first case of emails not getting sent, and we had to restart the SCSM services to get them flowing again. I've created a PowerShell script to simplify this (which I'm thinking of putting into a nightly maintenance job), but as a big fan of Microsoft I have to say I'm concerned about how long it will be until this "solution" needs to be reloaded again.

Chuck Herrington