
Our users experience slow session performance at times during normal work hours. Applications (IE, Office apps, etc.) are slow to respond, and so is switching between them. The problem happens sporadically; below is the troubleshooting done so far.

We started gathering performance counters throughout the day and asked users to report when slowdowns occur. The graphs below show disk performance. The arrows mark the times when users reported slowdowns, and they line up with heavy disk activity, which indicates the problem is disk related.

Disk use graphs
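To make the arrows-versus-reports check described above repeatable instead of eyeballing a graph, the counter log can be compared directly against the list of reported times. This is a minimal Python sketch, assuming the samples for one counter (for example the PhysicalDisk "Avg. Disk Queue Length" counter) have already been exported into (timestamp, value) pairs. The function name, the five-minute window and the example data are hypothetical and synthetic, not taken from this server.

```python
from datetime import datetime, timedelta
from statistics import median


def peaks_at_reports(samples, reports, window=timedelta(minutes=5)):
    """For each reported slowdown, find the highest counter value logged
    within +/- window of the report and compare it with the median of
    all samples (a rough "normal" baseline).

    samples: list of (datetime, float) pairs for a single counter
    reports: list of datetimes at which users reported a slowdown
    Returns a list of (report_time, peak, peak / median) tuples; peak and
    ratio are None when no sample falls inside the window.
    """
    baseline = median(value for _, value in samples)
    results = []
    for reported_at in reports:
        nearby = [value for ts, value in samples if abs(ts - reported_at) <= window]
        peak = max(nearby) if nearby else None
        ratio = peak / baseline if peak is not None and baseline else None
        results.append((reported_at, peak, ratio))
    return results


# Synthetic example: one sample per minute from 09:00, with a spike at
# 10:00-10:02 and a user report at 10:01.
start = datetime(2012, 11, 14, 9, 0)
samples = [(start + timedelta(minutes=i), 12.0 if 60 <= i <= 62 else 1.0)
           for i in range(120)]
print(peaks_at_reports(samples, [datetime(2012, 11, 14, 10, 1)]))
```

Reports whose peak sits far above the day's median support the disk theory. A report with no nearby spike would suggest looking elsewhere for that particular slowdown.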

Can anyone suggest further troubleshooting in order to track the culprit process/application?

Some server specs: [OS: Server 2003 32-bit Enterprise with the /PAE flag] [RAM: 32GB] [CPU: 2x quad core @ 2.27GHz] [HD: RAID5, 1.2TB, 3x 10,000RPM SAS disks. The controller has no battery and write cache is disabled]

Using Process Explorer, I can look at processes and track which ones do the most disk reads/writes.

Processes with highest DISK WRITES: System, ccSvcHst.exe (Symantec process), firefox.exe

Processes with highest DISK READS: winlogon.exe, firefox.exe, explorer.exe

Processes with highest DISK WRITE BYTES: System, firefox.exe, ccSvcHst.exe

Processes with highest DISK READ BYTES: System, winlogon.exe, firefox.exe
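If the figures above are cumulative totals since each process started, rather than per-interval deltas, long-running processes such as System and winlogon.exe will tend to top the lists regardless of what they are doing during a slowdown. A more telling view is the rate between two samples taken while a slowdown is happening. This is a minimal Python sketch, assuming you record each process's cumulative read and write byte counts twice, a known number of seconds apart. The dictionary format and the function name are hypothetical; labelling entries by PID matters on a terminal server, where every user runs their own firefox.exe.

```python
def rank_by_io_delta(before, after, interval_s):
    """Rank processes by combined read+write bytes per second between two samples.

    before, after: dicts mapping a process label (e.g. "firefox.exe (PID 1234)")
    to cumulative (read_bytes, write_bytes) totals, taken interval_s seconds apart.
    Processes that appear only in `after` are treated as starting from zero;
    processes that exited before `after` was taken are ignored.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = []
    for label, (read_now, write_now) in after.items():
        read_then, write_then = before.get(label, (0, 0))
        delta = (read_now - read_then) + (write_now - write_then)
        rates.append((label, delta / interval_s))
    # Busiest process first.
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates
```

Fed two samples taken during a reported slowdown, a process with modest lifetime totals but a large delta (one user's firefox.exe, say) would rank above System even though System leads the cumulative lists.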

MikeM
  • It's RAID 5 made out of what kind of disks? How many disks are in the array? They look like SATA disks based on your graphs. – Evan Anderson Nov 14 '12 at 18:56
  • scheduled events? Antivirus scan? – mulaz Nov 14 '12 at 19:05
  • RAID5 consisting of 3x 10,000RPM SAS disks. – MikeM Nov 14 '12 at 19:06
  • How many users? – HopelessN00b Nov 14 '12 at 19:13
  • The number of users fluctuates between 19 and 24 or so. When the number is around 16, it's smooth sailing for the most part. – MikeM Nov 14 '12 at 19:17
  • I've heard of Firefox causing HD thrashing on more than a few occasions. Usually the only fix, if that's the culprit, is uninstalling Firefox, removing the Firefox profile data, and reinstalling Firefox. – Chris S Nov 14 '12 at 19:26
  • @ChrisS Write cache disabled, RAID5, two dozen users and writes of temporary internet files? Sounds like a recipe for fail to me. – HopelessN00b Nov 14 '12 at 20:38
  • Recent versions of Firefox perform extremely poorly and definitely do cause drive thrashing on many systems, even to the point of making fairly high-spec workstations unresponsive. I'd also be very suspicious of any Symantec application. Consider replacing both. – John Gardeniers Nov 15 '12 at 03:55

2 Answers


Write caching disabled and RAID5? That is a particularly poorly performing combination. Windows leans heavily on user profiles, so AppData and registry activity alone would surface this issue on such a slow storage subsystem. There could be other aggravating factors, such as the default registry lazy flush interval being too short, so the registry is flushed to disk too often.

The registry lazy flush interval may be increased by adjusting the following DWORD registry value:

Key: HKLM\System\CurrentControlSet\Control\Session Manager\Configuration Manager  
Value: RegistryLazyFlushInterval 

Use 60 (decimal) to specify 60 seconds. I believe the default value is 5 seconds.
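For anyone who wants to script this change rather than make it by hand in regedit, here is a minimal Python 3 sketch using the standard-library winreg module. It assumes it runs with administrative rights on the server. The helper names are hypothetical, and whether a restart is needed before the new interval takes effect is an assumption to verify, not something stated above.

```python
# Registry location and value name from the answer above.
KEY_PATH = r"System\CurrentControlSet\Control\Session Manager\Configuration Manager"
VALUE_NAME = "RegistryLazyFlushInterval"


def lazy_flush_dword(seconds):
    """Validate the interval (in seconds) and return it as a DWORD-sized int."""
    seconds = int(seconds)
    if not 1 <= seconds <= 0xFFFFFFFF:
        raise ValueError("interval must be a positive value that fits in a DWORD")
    return seconds


def set_lazy_flush_interval(seconds=60):
    """Write RegistryLazyFlushInterval under HKLM as a REG_DWORD.

    Windows only; requires administrative rights. Whether a restart is
    needed before the new interval applies is an assumption to verify.
    """
    import winreg  # imported here so the helpers above also load on non-Windows systems

    value = lazy_flush_dword(seconds)
    # CreateKeyEx opens the key, creating it if it does not already exist.
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, value)


# Usage (on the server, elevated): set_lazy_flush_interval(60)
```

The value is written as a plain integer because SetValueEx with REG_DWORD expects an int; 60 here means 60 seconds, matching the suggestion above.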

The registry in particular is predisposed to locking issues. One issue we encountered on Windows Server 2003 manifested after an Internet Explorer security hotfix and was related to the Browser Helper Object for Java. You can read more about that here:

https://serverfault.com/a/110242/20701

Twenty users seems a bit low to experience performance issues; however, it's difficult to know, because that really depends on the applications in use and on user type/behavior. While you may be able to address some of the issues by increasing the lazy flush interval or ruling out the Java BHO, I would start by addressing the problematic disk subsystem.

Greg Askew
  • Interesting. Looking at the Lazy Write Flushes graph, it appears that the spikes occur at the same time as disk load gets heavy and performance degrades (http://imgur.com/SXONJ). – MikeM Nov 14 '12 at 19:55
  • Lazy Flush Interval is definitely one of those settings that should be increased on a terminal server. I added some information about that in the answer. – Greg Askew Nov 15 '12 at 00:44

I'm going to suggest that your culprit probably is not an application or process, but that you're simply trying to push more reads/writes than your controller or disks can handle (in that configuration). RAID5 is a parity RAID, which means every small random write also has to update parity: the controller typically reads the old data and the old parity, then writes the new data and the new parity, so one logical write costs about four disk I/Os. That is why random write performance on RAID5 arrays tends to be pretty bad.
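That write penalty can be turned into a rough capacity estimate. This is a minimal Python sketch using the common rule-of-thumb penalties (two back-end I/Os per host write for RAID 1/10, four for RAID5) and assuming no controller write cache, as on this server. The figure of about 140 random IOPS per 10,000RPM SAS disk is a hypothetical ballpark, not a measurement of this hardware.

```python
# Rule-of-thumb back-end disk I/Os needed for one host random write.
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4}


def effective_iops(disks, iops_per_disk, level, read_fraction):
    """Estimate host-visible random IOPS for an array with no write cache.

    The back end can deliver disks * iops_per_disk operations per second.
    Each host read costs one back-end I/O; each host write costs
    WRITE_PENALTY[level] back-end I/Os (RAID5: read data, read parity,
    write data, write parity).
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])


# Hypothetical ballpark: ~140 random IOPS per 10,000RPM SAS disk, 50/50 mix.
current = effective_iops(3, 140, "RAID5", 0.5)    # the 3-disk RAID5 in the question
proposed = effective_iops(4, 140, "RAID10", 0.5)  # a 4-disk RAID 10 for comparison
print(f"RAID5 x3: {current:.0f} IOPS, RAID10 x4: {proposed:.0f} IOPS")
```

Under those assumptions the three-disk RAID5 manages roughly 168 IOPS at a 50/50 read/write mix, against roughly 373 for four disks in RAID 10.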

See our canonical RAID levels Q&A, but in general you only want to use parity RAID when the majority of the disk load is reads, as on a read-only or rarely written file share. (And because of the problems most of us have run into recovering a broken parity RAID array, you'll find many SAs avoid parity RAID altogether whenever possible.)

The fact that your OS is on the same RAID5 volume as everything else, and that you have multiple clients accessing data simultaneously, is just a recipe for this kind of problem in my experience. My solution (assuming 6 drives in your server) would probably be to break the array into two: a two-drive RAID 1 mirror for the OS, and a four-drive RAID 10 for the rest. Honestly though, as long as you get out of the RAID5 situation and switch to a RAID level better suited to your needs (like RAID 10), you'll be in much better shape.

HopelessN00b
  • I second RAID5 being a large part of the problem. I learned this the hard way a long time ago with a very heavily used SQL database server that had horrible performance on a RAID5 array. – DanBig Nov 14 '12 at 19:24