9

We have approximately 200 servers (Hyper-V, file cluster, and IIS) that are all experiencing the same issue: an event occurs on the server through normal use that maxes out, or nearly maxes out, the RAM on the server. Once this happens, the Workstation service (confirmed by isolating the Workstation service in its own SVCHOST) stops releasing handles/threads, and the memory used by that service is never released. In some extreme cases we have Workstation services using as much as 40GB of RAM on a 255GB server. We are also finding upwards of 40 million handles in some cases.

On reboot the problem, of course, goes away, and it doesn't appear again until all the memory has been used, say by the W3 process or the Hyper-V VMs. After that, the Workstation service starts grabbing all the RAM. The process is very slow and can take weeks or months, depending on the amount of RAM on a server.

Both our Hyper-V servers and IIS servers access shares for working files. These shares are on SSD storage, so they are plenty performant. We've installed all the current patches but have not moved to R2, as we have a lot of tooling in place that would make this a significant step, and we cannot find any clear indication that this would be fixed in R2.

We have run ProcMon and other tools, but on the most problematic servers those tools won't even run. On the others, the results just show that there does indeed appear to be a memory leak in that process.
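For servers where heavier tools won't run, the growth can still be quantified from a few periodic readings. The following is a minimal sketch, assuming handle counts for the isolated SVCHOST can be sampled by some lightweight means (for example the `\Process(...)\Handle Count` performance counter); the function names `leak_rate` and `seconds_until` are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (seconds since first reading, handle count)


def leak_rate(samples: Sequence[Sample]) -> float:
    """Least-squares slope of the samples, in handles per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must differ")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    return cov / var_t


def seconds_until(limit: float, samples: Sequence[Sample]) -> Optional[float]:
    """Estimated seconds until the count reaches `limit`, or None if not growing."""
    rate = leak_rate(samples)
    if rate <= 0:
        return None
    _, last_value = samples[-1]
    return max(0.0, (limit - last_value) / rate)


# Illustrative readings taken one minute apart (not real measurements).
readings = [(0.0, 1000.0), (60.0, 1600.0), (120.0, 2200.0)]
print(leak_rate(readings))            # handles gained per second
print(seconds_until(3200.0, readings))
```

A steady positive slope across reboots would at least give a rough estimate of how long a server has before it reaches a problematic state.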

Is there a way we can free up the memory from this process or avoid the bug altogether? We don't want to have to reboot, and we cannot restart the process once it's in an error state; the process becomes frozen.

We're trying to avoid doing regular reboots to 'fix' this issue, so any answers would be appreciated.

Craig
  • What is your question? – Andrew Schulman Nov 24 '14 at 17:51
  • Indeed we do, but it's ambiguous at best, just thousands/millions of threads opening. On the most problematic systems we can't even run those tools, they just crash the server. – Craig Nov 24 '14 at 18:22
  • We want to figure out a good solution to solve the problem other than rebooting the box. We are unable to stop the services once this problem starts. – Craig Nov 24 '14 at 18:23
  • Do you have the option of getting Microsoft support involved? Never my first resort, or something I even particularly enjoy, but it does sound like in this case, even if you find the problem, it's likely to be a bug in Windows that you won't be able to correct anyway. I'd hate to pull my hair out for months chasing down memory leaks only to find out that the only solution is a Microsoft hotfix. – HopelessN00b Nov 24 '14 at 23:20
  • Has KB 2811660 been installed? Are these systems running server manager? http://support.microsoft.com/kb/2793908 –  Nov 24 '14 at 21:15
  • Yes, this KB was installed some time ago. Also, this leak is specific to the Workstation service, that KB applies to the WMI service. – Craig Nov 24 '14 at 21:22
  • @HopelessN00b Yes, Microsoft has not been able to provide a solution, they are even looking at months of us working, trying to get us to identify their problem for them. We are looking at any way to address this issue at this point. – Craig Nov 25 '14 at 15:01
  • You say that workstation is opening handles. Can you use procmon/rammap(http://technet.microsoft.com/en-us/sysinternals/ff700229.aspx) to isolate what it is getting hung up on? We have a similar sized windows environment with similar infrastructure and haven't seen this issue occur. It's possible that something specific to your environment is causing this (Antivirus/Network management etc). I'd definitely try to limit any 3rd party factors if at all possible. – NPS Nov 25 '14 at 23:00
  • We've run those tools; they just show that the Workstation service is opening threads and not closing them. There are no third-party apps involved. RAMMAP just shows SVCHOST holding on to several gigabytes of RAM. – Craig Nov 25 '14 at 23:09
  • Try to capture an xperf trace of the memory usage growth (1-2 minutes): http://pastebin.com/peqLGxSa – magicandre1981 Nov 27 '14 at 16:43
  • This bug reminds me of a bug I've seen on 2008 R2 with the NTFS metafile, which uses all RAM over time. For 2008 R2 there was the Microsoft Windows Dynamic Cache Service (http://www.microsoft.com/en-ca/download/details.aspx?id=9258). It may be worth checking whether it's built into 2012. – yagmoth555 Dec 02 '14 at 14:52
  • I will add, if it's the metafile, check in RAMMAP under "Use Count", "Metafile" would be listed there if it's the problem. – yagmoth555 Dec 02 '14 at 15:19
  • Have you captured the trace? I would like to see what is using the memory. – magicandre1981 Dec 02 '14 at 17:18

3 Answers

1

I had an eerily similar issue where the svchost was destroying the server performance.

The solution: Turns out I had a full Event Log. I cleared it out and everything was back up and running like nothing ever happened.

(I also recommend changing the size of the event log from the default, see below)

To set maximum log size by using the Windows interface
- Start Event Viewer.
- In the console tree, navigate to and select the event log you want to manage.
- On the Action menu, click Properties.
- In Maximum log size (KB), use the spinner control to set the value you want and click OK.
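With around 200 servers, the same setting could also be scripted rather than clicked through. Below is a hedged sketch that builds a `wevtutil sl <log> /ms:<bytes>` command line (the `/ms` option of `wevtutil set-log` takes the maximum size in bytes, with a documented minimum of 1 MB). The helper names `build_set_log_size_cmd` and `set_log_size` and the 64 MB value are assumptions for illustration, not something from the steps above.

```python
import subprocess
import sys
from typing import List


def build_set_log_size_cmd(log_name: str, max_kb: int) -> List[str]:
    """Build the wevtutil command that sets a log's maximum size.

    Event Viewer shows the size in KB, while wevtutil's /ms option
    expects bytes, so the value is converted here.
    """
    if max_kb < 1024:
        raise ValueError("wevtutil requires a maximum size of at least 1024 KB")
    return ["wevtutil", "sl", log_name, f"/ms:{max_kb * 1024}"]


def set_log_size(log_name: str, max_kb: int) -> None:
    """Apply the setting on the local machine (Windows only, needs admin rights)."""
    cmd = build_set_log_size_cmd(log_name, max_kb)
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    subprocess.run(cmd, check=True)


# Show the command that would be run for a 64 MB Application log.
print(" ".join(build_set_log_size_cmd("Application", 65536)))
```

Keeping the command construction separate from execution makes it easy to review what would run before applying it to a fleet.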

It sounds exactly like what is happening here, but ended up being a really easy fix. A restart would temporarily solve the issue, but as soon as anything tried writing to the log, everything would spiral out of hand and just kept eating up resources.

Hope this helps!

Aelof
-1
>Is there a way we can free up the memory from this process ?

There is no way to externally (and properly) release allocated memory or handle resources without killing the offending app.

>or avoid the bug all together? 

You are experiencing a memory and resource leak. The only way you will solve the problem is by finding the leak and either avoiding its trigger (if possible) or fixing the leak at the source-code level. In the latter case you need Microsoft's help to produce the patch, but it seems they expect you to tell them exactly where the problem is.

You can try to find the culprit by pinpointing the memory/resource leak with a tool such as MS Application Verifier.

Pat
  • The trigger is file shares, which we cannot avoid. – Craig Dec 01 '14 at 17:23
  • If you cannot avoid the trigger, then find the leak with "Application Verifier" and contact MS with that info. – Pat Dec 01 '14 at 22:44
  • The applications, as there are multiple, are all Microsoft. We've already contacted them, we're looking for a quicker solution as they are stating it may take them weeks/months to sort this out. – Craig Dec 01 '14 at 22:49
  • Considering MS will not really rush to solve this kind of thing on a non-current OS, I do not think you will find a quicker solution. It would be a different matter if you told them where the leak is located. – Pat Dec 01 '14 at 22:58
  • We have an open case and have been working with them for a month. The leak is literally in the Workstation service. – Craig Dec 01 '14 at 23:14
  • Well, I do not understand what prevents you from trying to find the leak and speed up the process. Yes, I know using "Application Verifier" requires some knowledge, but that is how you deal with this kind of problem. – Pat Dec 02 '14 at 08:20
  • Regardless, the culprit is needed/required and a Microsoft product. – Craig Dec 03 '14 at 22:24
-1

Clearing RAM is easy, but it is not a solution.

I suggest Sysinternals RAMMap or VMMap for deeper investigation. With these tools you can better see what is happening. Very often it's a metafile problem.

Since Server 2008 we have had this issue with all terminal servers running out of memory, with unbelievable memory consumption over time, when starting applications from a share.

Our workaround is hosting those applications on a separate terminal server and frequently clearing memory consumption.

We do this with a self-designed C++ command-line application that calls SetProcessWorkingSetSize() with SeDebugPrivilege on all processes.

It's strongly recommended not to do something like this ;)
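The original C++ tool was not posted, so the following Python `ctypes` sketch is only an assumption about the kind of call involved. Passing `(SIZE_T)-1` for both the minimum and maximum tells `SetProcessWorkingSetSize` to remove as many pages as possible from the target's working set. The function name `trim_working_set` is hypothetical, and enabling SeDebugPrivilege (via `AdjustTokenPrivileges`) is deliberately left out, so this only works on processes the caller is already allowed to open.

```python
import ctypes
import sys

PROCESS_SET_QUOTA = 0x0100  # access right required by SetProcessWorkingSetSize


def trim_working_set(pid: int) -> bool:
    """Ask Windows to trim the working set of one process. Windows only."""
    if sys.platform != "win32":
        raise OSError("SetProcessWorkingSetSize is only available on Windows")

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.OpenProcess.argtypes = [wintypes.DWORD, wintypes.BOOL, wintypes.DWORD]
    kernel32.OpenProcess.restype = wintypes.HANDLE
    kernel32.SetProcessWorkingSetSize.argtypes = [
        wintypes.HANDLE, ctypes.c_size_t, ctypes.c_size_t,
    ]
    kernel32.SetProcessWorkingSetSize.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    handle = kernel32.OpenProcess(PROCESS_SET_QUOTA, False, pid)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        trim = ctypes.c_size_t(-1).value  # (SIZE_T)-1 means "trim as much as possible"
        if not kernel32.SetProcessWorkingSetSize(handle, trim, trim):
            raise ctypes.WinError(ctypes.get_last_error())
        return True
    finally:
        kernel32.CloseHandle(handle)
```

Trimming a working set only pushes pages out of physical memory; it does not close leaked handles or free the underlying allocations, which is why this is a workaround and not a fix.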

Magnus