We have approximately 200 servers (Hyper-V, file cluster, and IIS) that are all experiencing the same issue: at some point during normal use, an event maxes out, or nearly maxes out, the RAM on the server. Once this happens, the Workstation service stops releasing handles and threads, and the memory used by that service is never released. We confirmed it is the Workstation service specifically by isolating it in its own SVCHOST process. In some extreme cases the Workstation service is using as much as 40GB of RAM on a 255GB server, and we are finding upwards of 40 million handles.
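For anyone who wants to check which SVCHOST instance hosts the Workstation service (service name `LanmanWorkstation`), here is a minimal Python sketch. It assumes the text it receives is the output of `tasklist /svc /fo csv`; the function name `find_service_pids` is hypothetical and used only for illustration.

```python
import csv
import io

# Service name of the Windows Workstation service.
WORKSTATION_SERVICE = "LanmanWorkstation"


def find_service_pids(tasklist_csv, service=WORKSTATION_SERVICE):
    """Return PIDs of processes whose Services column lists `service`.

    `tasklist_csv` is assumed to be the text printed by
    `tasklist /svc /fo csv` (columns: Image Name, PID, Services).
    """
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 3 or not row[1].isdigit():
            continue  # skip the header row and malformed lines
        # The Services field holds a comma-separated list of service names.
        services = [s.strip().lower() for s in row[2].split(",")]
        if service.lower() in services:
            pids.append(int(row[1]))
    return pids
```

On a live server the input could be captured with `subprocess.run(["tasklist", "/svc", "/fo", "csv"], capture_output=True, text=True).stdout`. If the returned PID belongs to a process that lists only `LanmanWorkstation`, the service is isolated in its own host process.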
On reboot the problem, of course, goes away, and it doesn't appear again until all the memory has been used, say by the W3 process or the Hyper-V VMs. After that, the Workstation service starts grabbing all the RAM. The leak is very slow and can take weeks or months to become a problem, depending on the amount of RAM on the server.
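To put a number on how slow the growth is, handle counts sampled over time can be turned into a rate. This is a rough sketch that assumes the samples are collected separately (for example from Task Manager or PowerShell's `Get-Process`); the function name and the `(unix_seconds, handle_count)` sample format are hypothetical.

```python
def handles_per_hour(samples):
    """Estimate handle growth from (unix_seconds, handle_count) samples.

    Only the first and last samples are used, so the result is a rough
    average rate, not a fitted trend.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, h0), (t1, h1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (h1 - h0) * 3600.0 / (t1 - t0)
```

A rate that stays positive across days, even while the server is idle, would be consistent with handles never being released.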
Both our Hyper-V servers and IIS servers access shares for working files. These shares are on SSD storage, so they are plenty performant. We've installed all the current patches but have not moved to R2: we have a lot of tooling in place that would make that a significant step, and we cannot find any clear indication that this is fixed in R2.
We have run ProcMon and other tools, but on the most problematic servers those tools won't even run. On the others, the results just show that there does indeed appear to be a memory leak in that process.
Is there a way we can free up the memory from this process or avoid the bug altogether? We don't want to have to reboot, and we cannot restart the process once it's in an error state because it becomes frozen.
We're trying to avoid doing regular reboots to 'fix' this issue, so any answers would be appreciated.