We have a Citrix PS4.0 farm made up of 2 physical and 2 virtual Citrix servers. Any one of them at some point or another will eventually degrade in performance due to hitting 100% CPU usage. I can see the CPU usage spike in the Virtual Infrastructure Client when this happens on either of the VMware servers.

This is not a load issue related to the number of users as it can happen at any time with any number of users.

Users are running shared desktops, not applications. Installed applications in the desktop are standard office applications (Word, Excel, Outlook) with limited Internet Explorer access through a Bluecoat Proxy and a couple of industry-specific applications.

What tools can be used to troubleshoot and diagnose the source of the problem? Once the server hits 100% CPU, it is impossible to log onto and see what process is consuming all the resources. The only recourse is to hard reset the machine. All servers restart at 4am each morning on a schedule.

NOTE: I already have ThreadMaster installed on all Citrix servers using the default configuration options and logging activities. The logs do not reveal the source of the problem.


  • Citrix Presentation Server 4.0, Enterprise Edition
  • Hotfix PSE400W2K3R03
  • Windows 2003 Server Standard Edition Service Pack 1
  • Runs Symantec Client Security configured per the recommendations from Citrix for file exclusions, etc.
Dave Cheney
Kevin Kuphal

12 Answers


Windows 2003 SP1 went out of support in April, so your OS does not get any security patches anymore. You need to upgrade to SP2 ASAP.

SP2 also has lots of random bug fixes in it - your issue could go away.

If your OS has that old a patch level, there is a good chance some drivers - specifically print drivers - could be out of date on the box too. As drivers are a big source of system instability in general, I would try checking they are all signed and up to date. Having a dodgy print driver would explain why it affects both virtual and physical boxes, and appears to occur randomly regardless of load.

Oh and FYI Citrix 4 goes EOM (End of Maintenance, no more bug fixes) at the end of this month June 09, and EOL (End of Life, no more security patches or any other patches) at the end of Dec 09. Enjoy your upgrade cycle!

  • Bear in mind that if you approach Citrix the very first thing they will say is "go install SP2 and come back when it's done and the problem still exists"... We had a random issue with an external DNS server a few months ago. The answer? Install SP2 and a random update fixed it, despite that problem not being listed in the issues SP2 fixed. – Neobyte Jun 11 '09 at 04:30
  • I am in the process of upgrading to SP2 tonight. Should also fix a STOP error we've been encountering that is listed in the KB as fixed in SP2. – Kevin Kuphal Jun 12 '09 at 23:29

You can try scheduling a script to run every minute or so that appends the process list to a file:

pslist >> whatever.txt

Something like this might at least give you a clue as to what's going on.

(pslist comes with the Sysinternals Suite)
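If disk usage is the worry, the scheduled job can rotate its own output. The following is a minimal sketch, not a tested production script: it assumes Python is available on the server and that `pslist` is on the PATH, and the names `append_snapshot`, `rotate_if_needed`, and `pslist_output` are hypothetical helpers made up for illustration.

```python
import os
import subprocess
from datetime import datetime


def rotate_if_needed(path, max_bytes, backups):
    """Shift path -> path.1 -> path.2 ... once path reaches max_bytes.

    Expects backups >= 1. Returns True when a rotation happened.
    """
    if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
        return False
    # Drop the oldest backup, then shift the remaining ones up by one.
    oldest = f"{path}.{backups}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for i in range(backups - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    os.replace(path, f"{path}.1")
    return True


def append_snapshot(path, text, max_bytes=5_000_000, backups=3, now=None):
    """Append one timestamped snapshot, rotating first if the log is full."""
    rotate_if_needed(path, max_bytes, backups)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"==== {stamp} ====\n{text}\n")


def pslist_output():
    """Run pslist (Sysinternals, assumed to be on PATH) and return its text."""
    result = subprocess.run(["pslist"], capture_output=True, text=True)
    return result.stdout
```

A scheduled task would then call something like `append_snapshot("whatever.txt", pslist_output())` once a minute; with the defaults above the snapshots would occupy roughly 20 MB at most (the live file plus three backups).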

Mike Conigliaro
  • This is fairly brute force. I was hoping for something more elegant. I'd be afraid also of that process filling the drive since sometimes this occurrence doesn't happen for some time. – Kevin Kuphal Jun 05 '09 at 15:09
  • It's still the best way. I regularly see one of our TS stop responding because of runaway processes - the program hits a bug and uses 100% CPU. If it's only one user I can usually get in and kill it. Sometimes though it happens to several people and a reboot is needed. It's easy enough to have a script rotate the log to stop the disk from filling up. – Steven Jun 07 '09 at 19:31
  • How do you get pslist to generate a list that shows like Task Manager which process is using 100% of the CPU at a given moment? It will show this in "task manager" mode but not from a straight pslist command. – Kevin Kuphal Jun 08 '09 at 19:15
  • Unfortunately, pslist only shows the CPU time. I haven't found a tool yet that will show the CPU % (like in task manager). – Mike Conigliaro Jun 12 '09 at 15:33
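Cumulative CPU time is still enough to work out a percentage: take two snapshots a known interval apart and divide each process's CPU-time delta by the CPU time that was available in that interval. Below is a rough sketch of that arithmetic. It assumes the snapshots have already been parsed into dictionaries of cumulative CPU seconds; the parsing itself, the `name:pid` key format, and the function names are illustrative assumptions, not part of pslist.

```python
def cpu_percent(before, after, interval_seconds, cores=1):
    """Return {process: percent of total CPU} between two CPU-time samples.

    before and after map a process key (for example "winword:1234") to its
    cumulative CPU time in seconds at the start and end of the interval.
    """
    capacity = interval_seconds * cores  # CPU-seconds available in the interval
    if capacity <= 0:
        raise ValueError("interval_seconds and cores must be positive")
    usage = {}
    for key, cpu_after in after.items():
        delta = cpu_after - before.get(key, 0.0)
        if delta < 0:
            # The key was reused by a new process; count its time from zero.
            delta = cpu_after
        usage[key] = 100.0 * delta / capacity
    return usage


def busiest(usage, top=3):
    """Return the top entries of a usage dict, highest percentage first."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:top]
```

For example, a process whose CPU time grows by 57 seconds over a 60-second interval on a single-CPU server was using about 95% of the CPU during that minute.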

The built-in Performance Logs and Alerts tool would be a great tool to get you some data about what's going on. You're going to have to use some disk space to generate these logs, but if you stay on top of deleting old log files until the problem occurs you shouldn't have a problem w/ running out of disk.

I'd start up a counter log on each server computer, logging the Process and Processor objects to disk (I'd probably also grab the Memory object, too).

  • Start / Run / PERFMON

  • Expand the Performance Logs and Alerts node and highlight the Counter Logs node.

  • Click Action and New Log Settings. Name the log however you'd like.

  • Click the Add Objects... button in the log properties window and add the objects to log.

  • Set an interval. I'd probably choose a 60 second or longer interval. High resolution probably isn't necessary since this is a gradual degradation.

  • On the Log Files tab, use the Configure button to choose a location for the log file and a base filename. I'd choose a Maximum log size of, say, 5MB - 10MB. This is going to generate a lot of small files, but you will be able to monitor the path where you're storing the files and delete older files that are piling up prior to the problem occurring.
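Deleting the older files can be automated as well. Here is a small sketch, assuming Python is available on the server and that the counter logs use the binary `.blg` extension; `prune_old_logs` and its parameters are hypothetical names chosen for illustration. It keeps the newest files up to a size budget and removes everything older.

```python
import os


def prune_old_logs(directory, keep_bytes, suffix=".blg"):
    """Delete the oldest log files so the remaining ones total <= keep_bytes.

    Only files whose names end in suffix are considered. Returns the names
    of the files that were deleted.
    """
    entries = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        if name.lower().endswith(suffix) and os.path.isfile(full):
            info = os.stat(full)
            entries.append((info.st_mtime, name, info.st_size))
    entries.sort(reverse=True)  # newest first

    total = 0
    deleted = []
    for _mtime, name, size in entries:
        total += size
        if total > keep_bytes:
            # Over budget: this file and every older one gets removed.
            os.remove(os.path.join(directory, name))
            deleted.append(name)
    return deleted
```

Run from a scheduled task with a budget comfortably larger than one log file, this leaves the newest file (the one the log service is still writing, which Windows would not let you delete anyway) untouched.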

You can start the log by right-clicking the new log instance in the results pane and choosing "Start". The log will run, by default, until you stop it or until you reboot the computer. (See this question for information about starting a log on boot: How to Setup Perfmon to Automaticaly Start an "Alert" At System Startup? The question talks about starting an alert, but you can use the same command to start a log.)

You can analyze these logs by hand after the issue occurs. You might want to try Microsoft's Performance Analysis of Logs (PAL) tool (http://www.codeplex.com/PAL). I've been happy with the reports that tool has generated, and it's fairly easy to use.

Evan Anderson
  • Will this show process names? The issue isn't a gradual degradation, but a sudden spike to 100% and then it's too late to see what's going on inside the box. – Kevin Kuphal Jun 08 '09 at 19:13
  • Gradual was the wrong word to use. How about "intermittent". Process names will absolutely be listed. As long as there's enough CPU for the Performance Logs and Alerts service to flush its logs to disk, you'll get info about the process that's going haywire. Fire up a copy of PERFMON on an XP or W2K3 machine, click the "+" in the toolbar, choose "Process" in the "Performance Object" list-box, and have a look at the counters that can be logged. Those counters will be logged for each process (and any new processes) during the log collection period. It's a very, very nice tool. – Evan Anderson Jun 08 '09 at 21:48
  • Since you're restarting the servers each day, you'll need to throw in a scheduled task to restart the performance log on startup. The question I linked above explains how to do that. – Evan Anderson Jun 08 '09 at 21:49
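If the counter log is written as comma-delimited text (or a binary log is converted to CSV afterwards), picking out the offending process can be scripted. The sketch below makes some assumptions: the header row holds counter paths of the form `\\SERVER\Process(name)\% Processor Time`, the first column is the sample timestamp, and the function name `top_process_per_sample` is made up for illustration.

```python
import csv
import re

# Matches the tail of a counter path such as
# \\CTX01\Process(winword)\% Processor Time
COUNTER = re.compile(
    r"\\Process\((?P<name>[^)]*)\)\\% Processor Time$", re.IGNORECASE
)


def top_process_per_sample(csv_file, ignore=("idle", "_total")):
    """Return (timestamp, process, percent) for the busiest process per row.

    csv_file is an open text file (or any iterable of lines) holding a
    counter log in CSV form. Instances listed in ignore are skipped.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = {}
    for index, title in enumerate(header):
        match = COUNTER.search(title)
        if match and match.group("name").lower() not in ignore:
            columns[index] = match.group("name")

    results = []
    for row in reader:
        best_name, best_value = None, -1.0
        for index, name in columns.items():
            try:
                value = float(row[index])
            except (ValueError, IndexError):
                continue  # blank or missing sample for this process
            if value > best_value:
                best_name, best_value = name, value
        if best_name is not None:
            results.append((row[0], best_name, best_value))
    return results
```

Scanning the rows just before the server stopped responding should then show which process name was at the top. Note that a per-process "% Processor Time" value can exceed 100 on a multi-CPU server, since it is summed across CPUs.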

Try adding an extra virtual CPU to the servers if they only have one vCPU. If it's a single-threaded application eating up all the CPU, you'll at least be able to get in and kill it instead of resetting the server.


What edition are you running and do you have an SA agreement?

Are you running antivirus on the server?

Also, what hotfix(s)/rollup are you running for PS4 and what SP are you on for Windows?

Ben Kohn
  • Editing the original question to include these details. We do have an SA agreement and are currently in the process of building a new farm with XenApp 5 but this continues to be a nagging issue for our current farm. – Kevin Kuphal Jun 05 '09 at 20:15
  • If you have the ability, I would strongly recommend testing SP2 of W2k3; a lot of improvements to TS and general OS stability were included in that release. There are also some post-SP2 hotfixes directly related to TS+AV available. What does the console look like when this happens? Can you see the SAS screen? Also, if you're current on your SA and have the Enterprise edition, you might consider setting up EdgeSight; it's not that difficult and you'd get all the data you'd need to troubleshoot this further. And then some. – Ben Kohn Jun 05 '09 at 23:10
  • The console is unresponsive...well, I can enter my password to log on but never get to the desktop. Thanks for the EdgeSight suggestion. I'll take a look at that. – Kevin Kuphal Jun 06 '09 at 03:32

How many CPUs/cores per machine? Hitting 100% across many cores would mean a multithreaded application is eating all the resources.

Do you have a pattern (a peak every X hours, or every day around 2 o'clock)?

Anything in the event log (such as large print jobs)?

Do you have SCOM?

Mathieu Chateau

We had a similar problem with our Internet monitoring software, and it turned out that the XTE (session reliability) process had corrupted the WinSock library and/or the TCP/IP stack. To repair the TCP/IP stack, run the command "netsh winsock reset" on the Citrix server and reboot.


You are also 3 Rollups behind on PS4. May want to upgrade your servers to Rollup 6


Have you considered upgrading to WS2003 Enterprise Edition and taking advantage of Windows System Resource Manager to contain application resources?


About the only problems we've had with our Citrix boxes hitting high CPU have been due to bad printer drivers causing the spooler service to go absolutely nuts. Specifically, it was down to HP LaserJet printer drivers, which were notoriously bad until around December last year when they redid the underlying DLLs, which fixed a whole bunch of crashes. The change log in their release notes made for interesting reading.

Anyhoo, you could perhaps try 'sc \\servername stop spooler' from your workstation and see whether it can connect and stop the print spooler on the errant server; that might help rule printer drivers in or out.


ProcessExplorer (free) is a useful tool for digging down deeper into running processes, esp. those running under svchost.exe which are normally hidden. We had a case where an HP printer driver (a perennial problem) was running at 100% on one core. ProcessExplorer allowed us to a) find the command line that was used to launch it (which revealed it was HP) and b) kill just that task. Recommended...

As an aside, AppSense Performance Manager works really well at handling peaks in CPU load on XenApp. I would recommend it except it's too expensive IMHO. Each time we reach the capacity of our servers we go "AppSense or another server?". We've always gone with the latter, as at £1,000 it's just over-priced for what it does. Even more the case now that we're running free XenServer and can clone an existing XenApp server in an hour.

One of our clients uses BigBrother which is a remote monitoring/health status for servers. Had a quick play myself with the trial but left it as it's also in the big-corp arena.

Rob Nicholson

Can you keep a session open and running Task Manager yourself? Sort Task Manager by CPU usage and when you see 100%, look for the piggish process.