
What WQL queries would you use for monitoring typical Windows bottlenecks? Which would you use to obtain data similar to 'top' or 'netstat'? What interval would you poll at?

Here are a few that I find helpful.

SELECT PercentDiskTime, AvgDiskQueueLength, DiskReadBytesPerSec, DiskWriteBytesPerSec FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk

SELECT Caption, CommittedBytes, AvailableBytes, PercentCommittedBytesInUse, PagesPerSec, PageFaultsPerSec FROM Win32_PerfFormattedData_PerfOS_Memory

SELECT PercentProcessorTime FROM Win32_PerfFormattedData_PerfOS_Processor

SELECT Caption, WorkingSet, PageFaultsPerSec, IOReadBytesPerSec, IOWriteBytesPerSec, ThreadCount, HandleCount FROM Win32_PerfFormattedData_PerfProc_Process

SELECT Caption, BytesReceivedPerSec, BytesSentPerSec FROM Win32_PerfFormattedData_Tcpip_NetworkInterface
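On the polling question, here is a minimal Python sketch that runs the queries above on a fixed interval and offers a crude 'top'-style view of the process rows. It assumes a Windows host with the third-party `wmi` package (which sits on pywin32); `poll`, `top_n` and `make_wmi_fetch` are hypothetical names, and `Name` has been added to the process query (also an assumption) so individual processes can be told apart. The fetch function is injected so the loop can be exercised without WMI.

```python
import time
from typing import Callable, Dict, List, Sequence

# The question's WQL queries, keyed by a label. Name was added to the process
# query (an assumption) so that individual processes can be told apart.
QUERIES: Dict[str, str] = {
    "disk": "SELECT PercentDiskTime, AvgDiskQueueLength, DiskReadBytesPerSec, "
            "DiskWriteBytesPerSec FROM Win32_PerfFormattedData_PerfDisk_PhysicalDisk",
    "memory": "SELECT Caption, CommittedBytes, AvailableBytes, PercentCommittedBytesInUse, "
              "PagesPerSec, PageFaultsPerSec FROM Win32_PerfFormattedData_PerfOS_Memory",
    "processor": "SELECT PercentProcessorTime FROM Win32_PerfFormattedData_PerfOS_Processor",
    "process": "SELECT Name, Caption, WorkingSet, PageFaultsPerSec, IOReadBytesPerSec, "
               "IOWriteBytesPerSec, ThreadCount, HandleCount "
               "FROM Win32_PerfFormattedData_PerfProc_Process",
    "network": "SELECT Caption, BytesReceivedPerSec, BytesSentPerSec "
               "FROM Win32_PerfFormattedData_Tcpip_NetworkInterface",
}

Fetch = Callable[[str], List[object]]


def make_wmi_fetch() -> Fetch:
    """Fetch rows with the third-party `wmi` package (Windows + pywin32 only)."""
    import wmi  # assumption: installed with `pip install wmi`
    conn = wmi.WMI()
    return lambda wql: list(conn.query(wql))


def poll(fetch: Fetch, queries: Dict[str, str], interval_s: float, samples: int,
         sleep: Callable[[float], None] = time.sleep) -> List[Dict[str, List[object]]]:
    """Run every query `samples` times, sleeping `interval_s` between rounds."""
    history = []
    for i in range(samples):
        history.append({label: fetch(wql) for label, wql in queries.items()})
        if i < samples - 1:  # no need to wait after the last round
            sleep(interval_s)
    return history


def top_n(rows: Sequence[object], field: str, n: int = 5) -> List[object]:
    """A 'top'-style view: the n processes with the largest `field` value."""
    # Skip the _Total aggregate and the Idle pseudo-process.
    real = [r for r in rows if getattr(r, "Name", None) not in ("_Total", "Idle")]
    # uint64 counters may come back from COM as strings, hence int().
    return sorted(real, key=lambda r: int(getattr(r, field) or 0), reverse=True)[:n]
```

Usage on Windows might look like `history = poll(make_wmi_fetch(), QUERIES, interval_s=5, samples=12)` followed by `top_n(history[-1]["process"], "WorkingSet")`. The 5-second interval there is an arbitrary placeholder, not a recommendation from this thread.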
Yancy
  • Excellent stuff, this is useful to application programmers as well. Much of this stuff is not available directly through any Win32 API call; WMI's useful but not discussed as much as it ought to be! – Andon M. Coleman Sep 25 '15 at 01:17


This is a truly great question, and it's a shame it has not gotten more love!

My basic theory of bottleneck analysis is to treat the system as a box with four kinds of finite resources: processor, memory, disk, and network. I want basic numbers for each of these to judge the health of the box, and I want numbers that are easy to interpret: high is bad, low is good. Zero is best, though never perfectly achievable (after all, we bought the computer to do work, eh?). Once I see which of the four resources is the main bottleneck, I can move on to finding which program or process is eating that resource, and make an educated decision about whether to add more of it or tune the program/process to use less of it.

I will format the main performance counters I use, from this article, as WMIC queries, because no scripting is required (although it is certainly possible!). You can enter each of these queries directly into the cmd console:

wmic path Win32_PerfFormattedData_PerfOS_System get ProcessorQueueLength

Above is Processor Queue Length, which shows how many threads are waiting in the queue for CPU time. High numbers are bad, low numbers good. Generally I consider a value below 10 to indicate a healthy system.
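To apply the below-10 rule to a run of readings rather than a single snapshot, here is a minimal Python sketch. It assumes the ProcessorQueueLength values have already been collected at a fixed interval; judging on the median is an assumption made here (so one momentary burst doesn't flag the whole window), not part of the rule above, and `processor_queue_summary` is a hypothetical name.

```python
import statistics
from typing import Sequence

QUEUE_LIMIT = 10  # the rule of thumb above: below 10 is healthy


def processor_queue_summary(samples: Sequence[int], limit: int = QUEUE_LIMIT) -> dict:
    """Summarise ProcessorQueueLength readings collected at a fixed interval.

    Using the median as the verdict is an assumption, chosen so that a single
    momentary spike does not mark the whole window as unhealthy.
    """
    if not samples:
        raise ValueError("need at least one sample")
    over = sum(1 for s in samples if s >= limit)
    med = statistics.median(samples)
    return {
        "median": med,
        "peak": max(samples),
        "fraction_over": over / len(samples),  # share of readings at or above the limit
        "healthy": med < limit,
    }
```

Reporting the peak and the fraction over the limit alongside the verdict keeps the raw picture visible, in the spirit of preferring a graph over a point-in-time number.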

wmic path Win32_PerfFormattedData_PerfOS_Memory get PagesInputPerSec

Above is Memory, Pages Input per Second: the rate at which pages are read from disk to resolve hard page faults. A hard page fault occurs when a process refers to a page in virtual memory that is not in physical memory and must be retrieved from disk. This counter works best in Perfmon's graph view, though. On a healthy (not bottlenecked) computer, you'll see occasional spikes as data is read from disk into RAM. The more spikes you see, and the higher they go, the more memory-constrained the system is. If the counter often stays at a nonzero value for periods longer than, say, five seconds, you probably have a memory-bottlenecked system.
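The "stays nonzero for longer than about five seconds" test amounts to finding the longest unbroken run of nonzero Pages Input/sec readings. A minimal Python sketch, assuming readings taken at a fixed interval (1 second by default, an assumption) and using the five-second figure above as the cutoff; the function names are hypothetical, and the sketch flags any one long run rather than judging how *often* runs occur.

```python
from typing import Sequence


def longest_nonzero_run(samples: Sequence[float], interval_s: float) -> float:
    """Duration in seconds of the longest unbroken stretch of nonzero readings."""
    best = run = 0
    for value in samples:
        run = run + 1 if value > 0 else 0  # a zero reading breaks the stretch
        best = max(best, run)
    return best * interval_s


def looks_memory_bound(samples: Sequence[float], interval_s: float = 1.0,
                       sustained_s: float = 5.0) -> bool:
    """True if page input stayed nonzero for longer than `sustained_s` seconds."""
    return longest_nonzero_run(samples, interval_s) > sustained_s
```

Isolated spikes such as `[0, 40, 0, 0, 12, 0]` pass, while six consecutive nonzero one-second readings would be flagged.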

wmic path Win32_PerfFormattedData_PerfDisk_PhysicalDisk get AvgDiskQueueLength, name

Above is PhysicalDisk, Average Disk Queue Length. I consider this the key indicator of system health, since memory bottlenecks also bog down the disk through excessive pagefile swapping, and will often push up CPU utilization as well. The query returns one row for each physical disk plus a _Total row for all disks combined. A well-performing single disk will keep this value at 2 or lower. For arrays, divide the queue length by the number of spindles (e.g., a queue length of 8 on a 4-spindle array gives 8 / 4 = 2, which means the array is performing well).
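The per-spindle rule is simply the queue length divided by the spindle count (so 8 queued requests across 4 spindles works out to 2 per spindle). A minimal Python sketch of applying it to the rows returned above; the spindle counts are not something WMI reports, so they are assumed to be supplied by hand, and `unhealthy_disks` is a hypothetical name.

```python
from typing import List, Mapping

QUEUE_PER_SPINDLE_LIMIT = 2.0  # threshold above for a well-performing disk


def per_spindle_queue(avg_queue_length: float, spindles: int = 1) -> float:
    """Average Disk Queue Length divided by the number of spindles behind it."""
    if spindles < 1:
        raise ValueError("spindles must be at least 1")
    return avg_queue_length / spindles


def unhealthy_disks(queues: Mapping[str, float], spindles: Mapping[str, int],
                    limit: float = QUEUE_PER_SPINDLE_LIMIT) -> List[str]:
    """Names of disks whose per-spindle queue exceeds `limit`.

    `spindles` maps a disk's instance name to its spindle count (entered by
    hand); disks missing from it are treated as single drives. The _Total
    row is skipped because it sums every disk.
    """
    return [name for name, q in queues.items()
            if name != "_Total" and per_spindle_queue(q, spindles.get(name, 1)) > limit]
```

For example, a 4-spindle array at a queue length of 9 (2.25 per spindle) would be reported, while the same array at 8 would not.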

wmic path Win32_PerfFormattedData_Tcpip_NetworkInterface get OutputQueueLength, PacketsReceivedErrors, Name, currentbandwidth

And finally, above we have NIC performance, specifically Network Interface: Output Queue Length and Packets Received Errors. These two counters tell us how many packets are waiting to be sent and how many inbound packets contained errors, which probably resulted in retransmissions. We want both numbers to stay at zero. The query also returns the NIC's current bandwidth (CurrentBandwidth, in bits per second), which is useful context.
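Checking "both numbers stay at zero" across interfaces can be sketched in Python as below. Because PacketsReceivedErrors appears to be a running total rather than a per-second rate, this sketch (an assumption, not the answer's method) compares two polls and reports only errors that arrived in between; the input shape and `nic_problems` are hypothetical.

```python
from typing import Dict, List, Mapping

# Hypothetical input shape: interface name -> {counter name: value} from one poll.
Sample = Mapping[str, Mapping[str, int]]


def nic_problems(before: Sample, after: Sample) -> Dict[str, List[str]]:
    """Interfaces with a nonzero output queue or new receive errors since `before`."""
    problems: Dict[str, List[str]] = {}
    for name, now in after.items():
        issues = []
        queue = now.get("OutputQueueLength", 0)
        if queue > 0:
            issues.append(f"output queue length {queue}")
        # An interface absent from `before` counts all of its errors as new.
        prev_errors = before.get(name, {}).get("PacketsReceivedErrors", 0)
        new_errors = now.get("PacketsReceivedErrors", 0) - prev_errors
        if new_errors > 0:
            issues.append(f"{new_errors} new receive errors")
        if issues:
            problems[name] = issues
    return problems
```

An empty result means every interface stayed at zero between the two polls.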

Once I've determined which resource is overused, I usually depend on either Process Explorer or Perfmon's process object to discover which process is the resource hog.
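The wmic commands above can also be scripted. Here is a minimal Python sketch that runs one with wmic's `/value` switch and parses the resulting Property=Value lines into one dict per instance. It assumes a Windows host where wmic is still present (Microsoft has deprecated WMIC, and newer Windows builds may not include it by default); `run_wmic` and `parse_wmic_value` are hypothetical names, and output encoding can vary between systems, so the decoding may need adjusting.

```python
import subprocess
from typing import Dict, List


def parse_wmic_value(output: str) -> List[Dict[str, str]]:
    """Turn `wmic ... get ... /value` output into one dict per instance.

    wmic pads its output with extra blank lines, so instances are split on a
    repeated property name rather than on blank lines.
    """
    instances: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # blank line or noise
        if key in current:  # property seen again: a new instance has started
            instances.append(current)
            current = {}
        current[key] = value
    if current:
        instances.append(current)
    return instances


def run_wmic(wmi_class: str, *props: str) -> List[Dict[str, str]]:
    """Run a wmic query (Windows only) and parse its /value output."""
    cmd = ["wmic", "path", wmi_class, "get", ",".join(props), "/value"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_wmic_value(result.stdout)
```

For example, `run_wmic("Win32_PerfFormattedData_PerfDisk_PhysicalDisk", "AvgDiskQueueLength", "Name")` should yield one dict per disk plus the _Total row, with all values as strings.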

quux
  • Thanks for the detailed write-up. I've converted this to a community wiki. I think another facet of this question is polling intervals: some bottlenecks appear only briefly, while others can be sampled less frequently. – Yancy Aug 18 '09 at 18:32
  • Well, most often one is looking for bottlenecks *reactively* (because some issue has been observed) rather than *proactively* (just being on lookout in case of a bottleneck). In either case, though, perfmon graphs over even a few minutes are far more useful than point-in-time snapshots. – quux Aug 19 '09 at 21:27