17

Note: I've read How Often Do Windows Servers Need to be Restarted? but this question pertains to our Remote Desktop server specifically.

We have a Windows Server 2008 R2 server, a VMware ESX VM, licensed for Remote Desktop Services (25 users), that also does RRAS (SSTP). On an average weekday, during working hours, there are between 8 and 12 logged-in, active users with an additional 4-6 "disconnected" users. It has a 12 GHz CPU hard reservation and 16 GB RAM, also entirely reserved. The CPU reservation is expandable to 24 GHz max when needed.

Many of our users rely exclusively on the server to work. They also complain bitterly about its performance but many are unwilling to change working habits or software to improve its performance. Specifically:

  • Users refuse to log off instead of disconnect
  • Users insist on using Lync 2013 instead of Lync 2010 (Lync 2013 is a notorious resource hog)

I cannot overstate the significance of their refusal to log off. Disconnected users continue to hog RAM while disconnected, which means that at any given time, we have up to 16 instances of certain programs running.

I've also noticed through experience that leaks/zombies tend to add up the longer a Remote Desktop server has been running. After a reboot the server is fresh and much faster, even when comparing performance after many users have logged in. I've also read that regular reboots can be helpful.

So I have proposed regular reboots of the VM - I would like to do it weekly, say on Saturday evening - as I feel these reboots would solve a lot of the problem.

I would like to know, if you are a Windows admin,

  • Am I right about the fact that garbage/zombies/leaks accumulate with session time, even after a user disconnects/reconnects?

  • How often do you restart a similarly-utilized Windows Server with Remote Desktop Services?

tacos_tacos_tacos
  • 10
    Why not use a policy to force logoff for idle sessions? – Massimo May 12 '15 at 22:06
  • @Massimo because they would consider this too heavy-handed... they lose work every time I reboot without sufficient notice, ie to reboot at all they need to know by "noon" of that day, and even then it is only after some grumbling and discussion, etc. – tacos_tacos_tacos May 12 '15 at 22:22
  • 12
    You are going to need to adjust the expectations of your users. IMO it is unreasonable of them to expect they can safely leave an idle session with unsaved data for any length of time. An unexpected crash, equipment failure, power outage or some other act of Chaos could just as easily destroy their unsaved work. – Zoredache May 12 '15 at 22:38
  • 2
    I don't mind the question, but the answers as the question is phrased are going to be opinion based. Try to rephrase for more fact-based (or at least performance-based) answers. – Jim B May 13 '15 at 04:37
  • It is almost always necessary to restart a RD server each month, in order to install security updates. In my experience, that's often enough to keep things running cleanly, though my server isn't as heavily loaded as yours. (Have you considered increasing the number of virtual CPUs?) – Harry Johnston May 13 '15 at 07:21
  • I guess another option would be "use something else instead of Lync"? – user253751 May 13 '15 at 09:59
  • The only effect that rebooting has is to kill user sessions and programs. The OS does not clog up over time. There is no need to restart the OS but it can be a convenient way to kick out all users. That's really all that it accomplishes. You can get the same thing done by killing idle sessions. – usr May 13 '15 at 12:01
  • @usr do you have any evidence for this? – tacos_tacos_tacos May 13 '15 at 12:04
  • 1
    @tacos_tacos_tacos it's my experience. What, exactly, is supposed to clog up about a running OS? It is a vague notion that is unfounded. The OS does not do that much. The user processes do stuff. When they are gone the slate is clean again. The OS usually gets out of the way and does what user processes ask. It does not initiate resource usage by itself. – usr May 13 '15 at 12:36
  • @usr, that's definitely the goal, but not the reality in my experience. Bad software can leave the system objectively "worse-off" from a memory management point of view until restart. – tacos_tacos_tacos May 13 '15 at 13:07
  • @tacos_tacos_tacos I don't think that is the case. Misbehaving processes can eat up memory but this is all released when the process is killed. Rebooting regularly isn't a requirement other than for updates. Are the users actually leaving long running tasks processing or just a case of laziness? – JamesRyan May 13 '15 at 15:51

7 Answers

23

Generally, I'm opposed to the idea that a Windows server should be rebooted on a regular schedule EXCEPT in relation to TS/RDS servers. We reboot ours every day. It clears up old sessions and releases in-use resources (CPU, RAM, file handles, etc.), so my opinion and suggestion would be that you do configure a daily scheduled reboot of your RDS servers.

Note that this answer is only my opinion. There's no statement of fact here.

joeqwerty
  • Where I worked we also rebooted ours every night. Sometimes the server doesn't come back up, but it happens so rarely that it was worth it. – Frederik May 13 '15 at 07:03
  • How often did you reinstall it? – Konrad Gajewski May 13 '15 at 10:21
  • 4
    +1 Citrix, Microsoft and myself all recommend regular reboots for TS servers. These are essentially End User computing boxes and will normally be running applications that aren't optimised for servers - this means memory leaks, not releasing resources and so on. Weekly at an absolute minimum, but daily wherever you can - it WILL make your life easier. – Dan May 13 '15 at 13:26
  • @Dan any links to the Microsoft recommendation you mention (regular reboot)? – tacos_tacos_tacos Jun 11 '15 at 18:39
17

Users refuse to log off instead of disconnect

Set up the appropriate group policies to log them off automatically. You can control the idle timeout and the logoff separately. That should certainly minimize some of the issue during the day.
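To illustrate what those two separate limits amount to (the relevant policies sit under Remote Desktop Session Host > Session Time Limits), here is a minimal Python sketch of the per-session decision. The `Session` record, the state labels and both limit values are made up for the example and are not read from any real API; it only models the logic, it does not talk to the server.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Session:
    user: str
    state: str            # "Active" or "Disc" (disconnected); labels are illustrative
    elapsed: timedelta    # idle time if active, time since disconnect if disconnected


# Hypothetical limits; choose values your users can live with.
DISCONNECTED_LIMIT = timedelta(hours=2)   # log off sessions disconnected this long
IDLE_LIMIT = timedelta(hours=1)           # disconnect active sessions idle this long


def action_for(session: Session) -> str:
    """Return 'logoff', 'disconnect' or 'keep' for a single session."""
    if session.state == "Disc":
        return "logoff" if session.elapsed >= DISCONNECTED_LIMIT else "keep"
    if session.elapsed >= IDLE_LIMIT:
        return "disconnect"
    return "keep"
```

With these example limits, a user who walks away is disconnected after an hour and logged off two hours after that, so their RAM is freed without anyone rebooting anything.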

I restart my 3 server TS farm daily at 3:00am. Because yes, crap can build up over time when you have lots of people using a single system. We have 3 servers shared between 60-90 people depending on the day and time of year.

I probably don't need to reboot this frequently, but we started using terminal services with Windows 2000, and our printer drivers were horrible at the time. The print spooler would basically fail after a day or two of being up. So we started rebooting nightly since we didn't have any leverage to get the printer manufacturers to fix their crappy drivers.

Zoredache
  • regarding the printer drivers, etc: I did read either here or somewhere else reputable that MS made great strides in this department - and in reducing need for reboot in general - between Windows 2000 Server and Windows Server 20032R2 SP3. So I'm not sure that the drivers issue has relevance. Actually I've noticed newer versions of Windows (Server) seem to handle print drivers and spooling surprisingly well. – tacos_tacos_tacos May 13 '15 at 12:09
  • I actually don't reboot my TS server very often, but every night I stop the print spooler, delete any print jobs, and restart it. This also cures incidents when users are unable to log in using RDP. (Windows Server 2003) – Randy Orrison May 13 '15 at 20:41
6

Depending on your cash, time, and the savviness of your users, another idea could be to stand up a second server. You'll still need to reboot occasionally, but you seem to be reaching the limits of a single server.

You should be able to use the same client CALs (licensing's not my strongest area), and depending on your virtualization solution an additional VM may already be covered by existing licensing.

Even without additional VM resources and with the extra OS overhead, you may find the system handles better as two separate VMs with 6 GHz CPU and 8 GiB memory each, assuming you can split the load evenly. There are three potential methods:

  1. The cleanest way is to use a proper network-based load balancing solution such as those provided by F5 Networks, Cisco Systems and similar companies. If you've already purchased a solution like this, it would be worthwhile to use it here. You can then ignore the rest of the answer, as the load balancer will be able to handle all the queries for the FQDN used to access your current RD server and return an appropriate IP based on the least-utilised of your servers.
  2. Round-Robin DNS is a passable solution. It won't guarantee a perfectly even load, but it could be a useful stopgap while you educate your users (see 3) if you can't use a network load balancer. Replace the current DNS name clients are using with two host records that have the same name but different IPs (your two servers). Ideally, also configure separate host records (preferably based on server hostname), each linked to one individual server.

Set a long TTL on your round-robin entries if you don't want clients leaving disconnected sessions on one server once their DNS cache expires and they acquire the IP of the other server. Alternatively make the hostname of the computer they've connected to obvious (e.g. make it part of the background), and ask them to re-connect to that hostname if they want to resurrect their session.

  3. Have your clients distribute the load. With ~25 users, it may be possible to simply ask (via email or a login message on the server) certain users to hit one server, and the rest to hit the other. Alternatively, if you control their desktop platform or they access the server via Citrix or another application virtualization appliance, simply configure their hosts file† so that they always hit the same server (desktop) / ensure the same user is always sent to the same server (appliance).

† If they will always be using the same desktop, simply modify the hosts file on the local desktop. If they move between machines, write a script (distributed via group policy) to parse the host file such that the DNS entry they currently use for the server points to the IP of the server that particular user should be using. Replace the line containing that DNS name if it already exists, or add it to the end of the file if it does not.
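The replace-or-append step described in the footnote could be sketched roughly like this in Python. The function name, the example hostname and the IPs are placeholders, and the sketch works on the file's text only; reading and writing the real hosts file (and the permissions needed for that) is left out.

```python
def set_hosts_entry(hosts_text: str, hostname: str, ip: str) -> str:
    """Return hosts-file text in which `hostname` maps to `ip`.

    An existing line for the name is replaced; otherwise a new line is appended.
    Note: a matching line is replaced wholesale, so any other aliases that
    shared that line are dropped.
    """
    new_line = f"{ip}\t{hostname}"
    wanted = hostname.lower()
    out = []
    replaced = False
    for line in hosts_text.splitlines():
        # Ignore trailing comments, then split into "ip name [alias ...]".
        fields = line.split("#", 1)[0].split()
        names = [f.lower() for f in fields[1:]]
        if wanted in names:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any further duplicate entries for the same name are skipped.
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

For example, `set_hosts_entry(text, "rds.example.com", "10.0.0.2")` would point that (made-up) name at the second server whether or not the file already had an entry for it.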

Bruno
4

I am familiar with the "user type" that refuses to log off. However, they seemed to have no issue understanding that the server would be rebooting nightly, so any unsaved work would be lost. This is on Server 2008 R2 TS supporting about 20 users on a single machine.

user288719
1

> Users refuse to log off instead of disconnect

You have a management/HR issue here rather than a technical one. If people staying logged on are affecting other people's work (by reducing performance unnecessarily) then there are only really two solutions:

  1. Make it a technical issue and arrange for an increase in resources (more RAM, SSD in place of spinning metal, ...) if possible so that the issue goes away that way. Of course there are limits to what you can achieve by throwing new resources at a single machine but it might work.

  2. Pursue it as a people management problem and find some way of encouraging (or failing that enforcing) appropriate discipline. Of course this may be outside your direct responsibility so it could be quite tricky depending on your office's politics...

We had a similar problem with people never restarting their desktop machines, meaning that security updates were sometimes queued for months. Security policy stated that "patches for known security issues should be installed in a timely manner, immediately in cases where exploits already exist in the wild, unless sufficient mitigations can be proven" so in the end it was simply enforced by group policy: all non-server Windows machines will reboot overnight on a Tuesday if there are pending updates, no exceptions. If anyone argues against this there are two easy counters: if we don't follow that policy we'd lose our ISO-this-that-and-the-other accreditation at the next audit, which is important to the business, and our contracts with our clients make statements about security policy too (as we sometimes handle their data we have to assure them that their data is safe with us), so without that enforcement we are in breach of some very expensive contracts.

> Users insist on using Lync 2013 instead of Lync 2010 (Lync 2013 is a notorious resource hog)

Is there a specific reason why, other than they want newer shinier things? If there is a feature they genuinely need then there may be little you can do about this angle.

If a chat application is the main resource problem, I wonder if there is a way to kill just instances of that program in the idle sessions instead of killing the whole sessions?
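One way this might look, as a sketch only: build one `taskkill` command per disconnected session, filtered by session ID and image name. The image name `lync.exe` and the (session ID, state) pairs are assumptions for illustration; in practice the pairs would have to come from something like `quser` output, which this sketch does not parse. The function only builds the argument lists and does not run anything.

```python
from typing import Iterable, List, Tuple


def taskkill_commands(
    sessions: Iterable[Tuple[int, str]],
    image: str = "lync.exe",   # assumed process name; check it on your server
) -> List[List[str]]:
    """Build a taskkill argument list for each disconnected session.

    `sessions` holds (session_id, state) pairs, where state "Disc" marks a
    disconnected session (an illustrative label, not parsed from real output).
    """
    commands = []
    for session_id, state in sessions:
        if state != "Disc":
            continue  # leave active sessions alone
        commands.append([
            "taskkill", "/F",
            "/FI", f"SESSION eq {session_id}",
            "/FI", f"IMAGENAME eq {image}",
        ])
    return commands
```

Each list could then be passed to `subprocess.run` from an elevated scheduled task, once you are sure that killing the program mid-session does not lose anything the user cares about.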

> they lose work every time I reboot without sufficient notice, ie to reboot at all they need to know by "noon" of that day

You don't state the nature of the work so this is very dependent on what that is, but they may be failing at due diligence (i.e. not doing their job properly).

If they are not saving documents regularly then they are putting their work at risk, not you. What would happen if there was a power outage or other fault that took the server down? Would they blame you also?

Of course if they are actively working at the time of the reboot or are needing to leave long running processes going unattended then there might be a genuine scheduling issue that you need to work out between you.

David Spillett
0

At the risk of sounding like a salesperson: we use ShutdownPlus Rolling Restart. We've got it set up to try to restart our servers every night. It works pretty well: you can set it up to restart servers only after everybody has been logged off. It'll restart the loop up to X times if someone is still using the RD server. The tool can also log off users for you, if you'd like, or even power-cycle your VMs on ESXi.

I'm using it with a couple of GPOs which log off disconnected users after a couple of hours and, of course, disconnect active sessions after a certain idle time. It's a pretty graceful method, aside from the occasional rogue program which keeps sessions from closing. We've worked around those though. The way we've got it set up now, every server tries to reboot every hour from 22:00 to 07:00, until it succeeds of course. Effectively, the servers reboot at least 2-3 times a week, which is fine by me.
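The schedule described above boils down to a small check that a scheduled task could run every hour. This is a Python sketch of that logic only, not of how ShutdownPlus works internally; the function names and the way the session count is obtained are assumptions.

```python
from datetime import time

# Overnight maintenance window from the description above (wraps past midnight).
WINDOW_START = time(22, 0)
WINDOW_END = time(7, 0)


def in_window(now: time, start: time = WINDOW_START, end: time = WINDOW_END) -> bool:
    """True if `now` falls inside [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_reboot(now: time, active_sessions: int, already_rebooted: bool) -> bool:
    """Reboot only inside the window, with nobody logged on, once per night."""
    return in_window(now) and active_sessions == 0 and not already_rebooted
```

If someone is still logged on at 22:00 the check simply fails and is tried again at the next hourly run, which is why a server may skip a night entirely.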

Unfortunately this isn't a free program, but it does the job pretty well. I'm implementing a PowerShell script which will hopefully update the servers before rebooting as well.

Robert
0

A straight YES/NO answer on Microsoft server reboots? Oh, if life were that easy! It does depend on the applications running on the server. But here is a simple guide, NOT a hard and fast rule.

Physical server running Windows Server (x version), auto reboot on a schedule: 95% can be rebooted once every fortnight without any real concerns. (Check that the patch being applied is relevant and required.) Ensure you fully test the patch on your test server(s) before releasing it to the live/production systems.

VMware virtual servers running Windows Server (x version): reboot once a fortnight. (See the above comment if patches are applied.)

Physical VMware server: NEVER/rarely, only if required, and never scheduled. (Normally very stable if kept up to date.) VMware patches/updates will require a reboot.

VMware running Windows SQL: limit reboots. Apply Windows patches MANUALLY ONLY! Restart only IF a patch requires it, and then only after you have stopped ALL client connections. Check that connections have reconnected once the server is back up. SQL Servers can take quite a while to reboot, so plan this out of hours.

Reminder: before making ANY changes to a VMware (Windows Server) VM, SNAPSHOT it! If the system crashes after a service patch or updates are applied, or applications fail to start, you can quickly get the server back up and running with limited downtime. Remember to make notes of errors so you can find the fix; do not just leave the system alone because it failed, as it may fail again in the future.

Hope that helps and goes a small way to clear things up.