My company is developing a web-based data viewer application which requires a fairly decent amount of bandwidth to function well. However, we have recently been changing a lot of things. For example, we changed our internal network infrastructure so that data can be hosted on separate machines connected by Gigabit Ethernet. On top of that, new versions of the application itself keep coming out, since we are still in alpha and beta testing.

Recently we made some changes that are causing poorer performance, and we want to try to identify where the problem is before we start tearing things apart. It is a very small network, and I have limited experience as an IT admin. I have a few ideas for where to start, but I would like to harvest a little wisdom from the pros first: How do you tackle/avoid similar problems? What are the most useful (Windows) tools you have used?


7 Answers


I always follow this approach: Try to test one thing at a time.

The trusty "Scientific method" works really well for troubleshooting:

  1. Come up with a theory for why the app is slow.
  2. Devise a test that can confirm or rule out that theory.
  3. Repeat.

For a webapp this might mean:

  • Could it be the database? Run some standalone SQL queries.
  • Could it be the web server? Test the web server by fetching static pages.
  • Could it be the app? Test the web server by hitting dynamic pages that don't hit a database.
  • Could it be the app's interface to the DB? Test the web server by hitting dynamic pages that do hit a database.
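
The checks above can be sketched as a small timing harness. This is only a rough illustration, not a replacement for a proper load-testing tool: the function names (`time_probe`, `compare_layers`, `fetch`) and the URLs in the usage comment are made up for the example, and it uses nothing beyond the Python standard library.

```python
import statistics
import time
from typing import Callable, Dict, List
from urllib.request import urlopen


def time_probe(probe: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run one probe several times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        probe()
        durations.append(time.perf_counter() - start)
    return durations


def compare_layers(probes: Dict[str, Callable[[], object]],
                   repeats: int = 5) -> Dict[str, float]:
    """Return the median duration per named probe so layers can be compared."""
    return {name: statistics.median(time_probe(probe, repeats))
            for name, probe in probes.items()}


def fetch(url: str) -> Callable[[], bytes]:
    """Build a probe that downloads one URL completely."""
    def probe() -> bytes:
        with urlopen(url, timeout=30) as response:
            return response.read()
    return probe


# Hypothetical URLs; substitute pages from your own application.
# results = compare_layers({
#     "static page": fetch("http://viewer.example/static.html"),
#     "dynamic, no database": fetch("http://viewer.example/ping"),
#     "dynamic, with database": fetch("http://viewer.example/report"),
# })
# for name, seconds in results.items():
#     print(f"{name}: {seconds:.3f} s")
```

If the static page is already slow, look at the web server or the network before the app; if only the database-backed page is slow, look at the queries or the connection to the database host.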

Also, running basic benchmarks for CPU, memory, and disk speed can help rule those out before you go any further.

I see things like this all the time:

Backups take longer on the new server than they did on the old one.

But no one did a basic disk benchmark to find out that the old server had twice as many spindles as the new one... or a network benchmark to find out that the new server's Gigabit Ethernet was only running at 100 Mbit/s.
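
As a rough illustration of how little it takes to get a first number, here is a crude sequential-write test. It is a sketch under assumptions, not a real benchmark: the function name and default sizes are made up for the example, caching and other activity on the machine will skew the result, and a dedicated disk benchmark tool will give better data. It is mainly useful for running the same test on the old and new server and comparing.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory: str, total_mb: int = 256,
                              chunk_mb: int = 4) -> float:
    """Write about total_mb of random data to a temp file; return MB per second."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            written_mb = 0
            while written_mb < total_mb:
                handle.write(chunk)
                written_mb += chunk_mb
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary file
    return written_mb / elapsed
```

Run it with the same parameters against a directory on each disk you want to compare, for example `sequential_write_mb_per_s("D:\\temp")`, where the path is whatever applies on your machines.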

All that said, if this is a custom web application, the framework you are using almost certainly has a way to dump performance information to a log file, but that is more of a question for Stack Overflow.


I have subscribed to the "Sherlock Holmes" method of troubleshooting, aka Binary Search Troubleshooting Method:

  1. Divide the problem space in half.
  2. Rule out one half of the problem space.
  3. Repeat with remaining problem space.
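
Applied to a list of recent changes, the steps above amount to a binary search. The sketch below assumes things the method itself does not guarantee: the changes can be put in order, the system was good before the first one, it is bad with all of them applied, and once it goes bad it stays bad. The function name and the change names in the usage comment are made up for the example.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_change(changes: Sequence[T],
                     is_bad_after: Callable[[int], bool]) -> T:
    """Return the earliest change after which the problem appears.

    changes is ordered oldest to newest. is_bad_after(i) must report whether
    the problem shows up with changes[0] through changes[i] applied.
    """
    if not changes:
        raise ValueError("need at least one change to search")
    low, high = 0, len(changes) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad_after(mid):
            high = mid        # culprit is at mid or earlier
        else:
            low = mid + 1     # culprit is after mid
    return changes[low]


# Hypothetical change log, oldest first:
# changes = ["new switch", "app build 41", "moved data host", "app build 42"]
# culprit = first_bad_change(changes, is_bad_after=run_my_test_with_changes_up_to)
```

With n changes this needs roughly log2(n) test runs instead of n, which matters when each test means rolling an environment back and forth.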

In my experience, you sometimes get lucky by trying some obvious things first, but once you exhaust the truly quick fixes, you need to get methodical quickly.

This method is compatible with Scientific Method and Test One Thing At A Time.


The sum of the answers above is 90% of what I would say; here's the other 10%:

  • There's a lesson to be learned about controlling the environment, and more specifically about controlling changes to the environment. Even if you are already measuring performance, changing more than one thing at a time turns any problem into a two-step problem: finding both the effect and the cause. If you change one thing at a time and have a valid rollback plan, any performance issue can usually be associated with that change and, hopefully, fixed by rolling it back. (Usually. Sometimes oddball stuff happens, or someone changes something you don't know about.)
  • The most beneficial thing to do is to measure early and often. Facts and accurate data make solving performance problems easier.
  • The least beneficial thing you can do is guess what's wrong and change it without measuring. You'd be surprised how often a reasonable-sounding guess doesn't solve the problem or makes it worse.
  • You can't measure something you haven't defined yet. Any time you have a performance problem, define what the end user expectation is and then find a way to measure the success or failure to meet that expectation in a way you can repeat. Do this in as specific a way as you can and you'll narrow down the scope of what you have to investigate and the tests you'll need to run to do so.
  • For Windows, I'm a big fan of performance counter logs and using PAL to process and help interpret them. The system overview report and suggested counters for that report cover most of the probable sources of a bottleneck. http://pal.codeplex.com
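
To make "define the expectation, then measure it repeatably" concrete, here is a minimal sketch. The target of 2 seconds, the 95th percentile, and the function names are all assumptions chosen for the example; the point is only that the expectation becomes a number you can check after every change.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_expectation(load_times_s: Sequence[float],
                      target_seconds: float = 2.0,
                      pct: float = 95.0) -> bool:
    """True if pct% of the measured load times are within the target."""
    return percentile(load_times_s, pct) <= target_seconds
```

Feed it the same kind of measurement every time (for example, the time to load one specific view from one specific client) so that results before and after a change are comparable.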

Some of the best tools to be found for Windows troubleshooting are from Microsoft's Sysinternals. And some of the best info on how to use them (and Windows technical info in general) can be found on Mark Russinovich's blog and webcasts. His book on Windows Internals is also full of good information.

With the above, I would suggest starting with the programs Process Explorer and Process Monitor to take a look at whatever web service you have running, and seeing what's going on. Both programs allow you to display a large amount of info about running processes, which can be configured by right-clicking the column headings.

Joe Internet

What was changed that introduced the performance problem? If only the code was changed, then I'd start my troubleshooting there.

  • That's the problem, both code and infrastructure changed a lot in a short time span. – Phil Jan 07 '10 at 03:45
  • What was the most recent change? Start there and work your way back, one change at a time. – joeqwerty Jan 07 '10 at 03:54
  • Also, I realize you're looking for information on various troubleshooting tools, but troubleshooting is as much about using a particular process as it is about using a particular tool. Trace your changes from the most current change and work back from there. If you can identify what introduced the problem you can then diagnose it and determine what course of action to take to correct it. – joeqwerty Jan 07 '10 at 04:13

Compare the Problem State to a Known Good State and look for the discrepancies.

A Known Good State can be an actual documented state. It can also be based on a standard of expected behavior, such as known expected behavior of networking protocols or such as rules of thumb about appropriate average CPU usage.


For example, using Wireshark or another network sniffer, you repeatedly see duplicate packets. Now you can dig in to try to figure out why you are seeing the same IP packet on the wire. Perhaps you have a "local router" scenario, or perhaps something is fragmenting IP packets.

Average CPU usage is at 90%. If the average is 90%, then the server is likely maxing out CPU frequently, causing everything to back up.
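
If the Known Good State is written down as numbers, the comparison can be automated. The sketch below is only an illustration: the metric names, the baseline values, and the 10% tolerance are invented for the example, and collecting the current values (from performance counters or otherwise) is left out.

```python
from typing import Dict, List, Tuple


def find_discrepancies(known_good: Dict[str, float],
                       current: Dict[str, float],
                       tolerance: float = 0.10) -> List[Tuple[str, float, float]]:
    """Return (metric, good value, current value) for metrics that deviate
    from the known good value by more than the given fraction."""
    discrepancies = []
    for name, good_value in known_good.items():
        if name not in current:
            continue  # nothing measured for this metric, so nothing to compare
        now = current[name]
        if abs(now - good_value) > abs(good_value) * tolerance:
            discrepancies.append((name, good_value, now))
    return discrepancies


# Hypothetical baseline and current readings:
# known_good = {"avg_cpu_percent": 35.0, "link_speed_mbps": 1000.0}
# current = {"avg_cpu_percent": 90.0, "link_speed_mbps": 100.0}
# for name, good, now in find_discrepancies(known_good, current):
#     print(f"{name}: expected about {good}, saw {now}")
```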


At the recommendation of John T, I have been enjoying using dstat with gnuplot.
