6

One of our development boxes has developed a problem wherein performance will occasionally drop through the floor. When this happens, you can hear the hard drive thrashing, but I don't know what's causing it.

This happens during periods of high disk access (reading/writing multi-gigabyte files), but not every time nor for the entire period of disk access. Those files are also kept rigorously defragmented specifically to prevent the kind of "seek thrashing" that seems to be occurring.

I suspect that the problem lies either with the system's antivirus or with some disk-indexing service I don't know about (AFAIK, there aren't any running, but…). Unfortunately, my Performance Monitor-fu is very, very weak (okay, nearly non-existent), and I don't know how to confirm/disprove my suspicions or find out what the real culprit is.

Update:

Process Explorer located the culprits for me — the Java Quick Starter and Windows Search services. Turning off the former had a noticeable impact on performance and turning off the latter had an enormous one (despite having not been given any files to access). Both were performing 5-20 times as much disk access as any other process.

Thanks all for your help!

Peter Mortensen
Ben Blank

6 Answers

10

Download Process Explorer (Sysinternals, now part of Microsoft).

In the View menu, choose Select Columns, open the Process Performance tab (newer releases have a separate Process I/O tab), and tick columns such as IO Read Bytes and IO Write Bytes.

You can click on those columns to sort.
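If you would rather script the same ranking, the idea is simply "take two snapshots of each process's cumulative I/O byte counters and sort by the difference". The Python sketch below is only an illustration of that idea, not something Process Explorer itself provides. The function name `rank_by_io` and the byte values are made up for the example; on a real machine the snapshots would have to come from some other source (for instance the third-party psutil library, which is an assumption on my part).

```python
def rank_by_io(before, after):
    """Rank processes by bytes read + written between two snapshots.

    Each snapshot maps a process name to (read_bytes, write_bytes),
    both cumulative counters. Processes missing from `before` are
    treated as having started at zero.
    """
    deltas = []
    for name, (read_after, write_after) in after.items():
        read_before, write_before = before.get(name, (0, 0))
        total = (read_after - read_before) + (write_after - write_before)
        deltas.append((name, total))
    # Busiest process first.
    return sorted(deltas, key=lambda item: item[1], reverse=True)


# Hypothetical sample data (bytes), not measurements from this thread.
before = {
    "SearchIndexer.exe": (1_000, 500),
    "jqs.exe": (200, 100),
    "notepad.exe": (50, 0),
}
after = {
    "SearchIndexer.exe": (9_000, 4_500),
    "jqs.exe": (2_200, 1_100),
    "notepad.exe": (60, 0),
}

ranking = rank_by_io(before, after)
for name, total in ranking:
    print(f"{name:20s} {total:>8d} bytes")
```

A process that sits at the top of this list across several sampling intervals is the one worth investigating first.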

swordfishBob
  • 136
  • 2
  • Based on this, the lead suspects at this point are the Windows Search service (was doing lots of reads and writes despite having no files to index) and Java Quick Start. I've disabled both and will put the system through its paces to see if that helped. :-) – Ben Blank May 30 '09 at 19:53
  • 1
    http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx – Ryan Jul 09 '09 at 14:07
3

Sysinternals has a couple of tools that might help here. Their Process Monitor (ProcMon) tool will give you exhaustive details about what is accessing what. If it really is an AV tool doing a scan, it should show up there very obviously. If it is a background Windows task, things get a bit more complicated: those accesses show up as well, but it is less obvious which service is responsible.

sysadmin1138
1

To verify a disk I/O issue, start by monitoring the LogicalDisk\Current Disk Queue Length counter for each drive in Performance Monitor. This should generally stay at zero unless you have multiple processes accessing the disk. Excessive I/O or poor disk performance will increase the queue length.

Once you have verified an I/O issue, use something like FileMon or Process Monitor from Microsoft (Sysinternals) to see which process is causing the activity.

If you do not see a process corresponding to the I/O, then it may be a page file/memory issue. Go back to Performance Monitor and add the Memory\Pages/sec counter. This shows how frequently the system has to go to disk to page memory into RAM. If it holds above zero for sustained periods, then you need more memory or an application has a memory leak. To find the leaking process, use the Process\Page Faults/sec counter (the Process object has no Pages/sec counter) to see which process is forcing the paging.
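The decision above (queue length first, then paging) can be sketched over a logged counter file. This is a rough Python illustration under several assumptions: the CSV layout imitates what I understand a Performance Monitor/typeperf CSV log to look like, the host name `DEVBOX` and all values are invented, and the two limits (`queue_limit`, `paging_limit`) are arbitrary placeholders rather than recommended thresholds.

```python
import csv
import io

# Hypothetical log excerpt; the header layout is an assumption.
SAMPLE_CSV = r'''"(PDH-CSV 4.0)","\\DEVBOX\LogicalDisk(_Total)\Current Disk Queue Length","\\DEVBOX\Memory\Pages/sec"
"05/30/2009 19:50:01.000","0.000000","0.000000"
"05/30/2009 19:50:02.000","1.000000","2.000000"
"05/30/2009 19:50:03.000","6.000000","3.000000"
"05/30/2009 19:50:04.000","8.000000","1.000000"
"05/30/2009 19:50:05.000","0.000000","0.000000"
'''


def load_counters(csv_text):
    """Return {counter name: [values]} keyed by the last path component."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column 0 is the timestamp; strip host and object from the rest.
    names = [col.rsplit("\\", 1)[-1] for col in header[1:]]
    series = {name: [] for name in names}
    for row in reader:
        if len(row) != len(header):
            continue  # skip blank or truncated lines
        try:
            values = [float(v) for v in row[1:]]
        except ValueError:
            continue  # skip samples with a missing reading
        for name, value in zip(names, values):
            series[name].append(value)
    return series


def diagnose(series, queue_limit=2.0, paging_limit=50.0):
    """Classify the log as 'no disk backlog', 'paging' or 'disk I/O'."""
    queue = series["Current Disk Queue Length"]
    pages = series["Pages/sec"]
    busy = [i for i, q in enumerate(queue) if q > queue_limit]
    if not busy:
        return "no disk backlog"
    paging = [i for i in busy if pages[i] > paging_limit]
    # Blame paging only if most backlogged samples also show heavy paging.
    return "paging" if len(paging) > len(busy) / 2 else "disk I/O"


series = load_counters(SAMPLE_CSV)
print(diagnose(series))
```

With the sample data the queue backs up while Pages/sec stays low, which points at ordinary file I/O rather than paging, so the next step would be FileMon or Process Monitor.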

Peter Mortensen
Doug Luxem
1

I have used Sysinternals FileMon successfully to find out which program is doing a lot of I/O and with which files. For example, when I switched to an SSD that has poor random write performance (4 IOPS, OCZ Core v1), FileMon told me which programs were doing the writing, and I could move those files from the SSD to another HDD. It also helped me find out that updating last access timestamps was killing performance (when Locate32 indexes all my disks once a day), so I was able to disable last access timestamps.

Esko Luontola
0

You can configure Perfmon to run a command when a counter crosses a threshold. As DLux suggested, monitor the Current Disk Queue Length, and if it hits 3 or 4, have the alert trigger a batch file. The batch file could start a utility like FileMon, which logs all the files accessed by the system and should give a better idea of what the system is accessing.
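The threshold logic itself is simple enough to sketch. The Python below is not how Perfmon implements alerts; it just illustrates the rule "fire once the queue length has been at or above the threshold for a few samples in a row". The threshold of 3 comes from the suggestion above, while `min_consecutive` and the sample values are my own assumptions, added so a single spike does not start the logger.

```python
def first_sustained_breach(samples, threshold=3.0, min_consecutive=3):
    """Return the index of the sample that completes the first run of
    `min_consecutive` readings at or above `threshold`, or None."""
    run = 0
    for index, value in enumerate(samples):
        # Extend the current run, or reset it on a reading below threshold.
        run = run + 1 if value >= threshold else 0
        if run == min_consecutive:
            return index
    return None


# Hypothetical queue-length readings, one per second.
queue_lengths = [0, 1, 4, 0, 3, 5, 4, 1]

fired_at = first_sustained_breach(queue_lengths)
if fired_at is not None:
    # A real setup would start the file-access logger here.
    print(f"alert would fire at sample {fired_at}")
```

The lone spike at sample 2 is ignored; only the sustained backlog later in the series would start the logger.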

David Yu
0

One possible culprit is excessive page file usage. You can use PerfMon to track the Memory\Page Faults/sec counter. If it goes up whenever your performance goes down, there's your answer.
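"Goes up when performance goes down" can be checked numerically once both series are logged. The Python sketch below computes a plain Pearson correlation between page faults and some throughput measure; a value close to -1 would support the paging theory. The numbers are invented, and using throughput in MB/s as the performance measure is an assumption for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples taken at the same instants.
page_faults_per_sec = [100, 120, 900, 950, 110]
throughput_mb_per_sec = [80, 78, 12, 10, 79]

r = pearson(page_faults_per_sec, throughput_mb_per_sec)
print(f"correlation: {r:.3f}")
```

A strongly negative result only shows the two move together; it does not by itself prove that paging is the cause.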

Graham Powell