
I have an nginx reverse proxy. The server is handling roughly 600-700 requests per second. I have a Munin HTTP load time plugin which is outputting this:

http://monitor.wingify.com/munin/visualwebsiteoptimizer.com/lb1.visualwebsiteoptimizer.com-http_loadtime.html

Now, the problem is that I am seeing some spikes in the graph. Response times are expected to always be under 200ms. I am keeping an eye on syslog and messages, but I am unable to figure out the actual cause. Is there a good HTTP response time profiling system which I can install on or embed in this nginx server to get detailed reports / logs that break down the time taken by each stage and show exactly what is causing the spikes?

The profiling system would also help me understand bottlenecks and how I can further reduce latency.

The most important thing right now is to investigate the cause of the spikes in the HTTP load time graphs (a similar pattern is reported by an external monitor, Pingdom) and fix it to get consistent response times.

Thanks

Sparsh Gupta

1 Answer


Wow! How are you measuring load times? As far as I knew, nginx would only report request processing times ($request_time), which is something completely different from page load time.

I've not had a good look for a few months, but last time I checked there was very little available for analysing response times. PastMon looks promising. And there are commercial tools like Client Vantage (rather expensive).

I ended up writing my own. It's not that hard to create a simple awk script that reports all hits over a threshold, but remember that you'll need to go back and check how the same URL behaves the rest of the time. e.g.

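# Looking for URLs matching 'example.com/interesting',
# with the URL in $6 and $request_time in $8.
# Only hits slower than 0.3 seconds are counted.

$6 ~ /example\.com\/interesting/ {
  if ($8 > 0.3) {
     n[$6] += 1;       # no. of slow hits by URL
     t[$6] += $8;      # sum of times by URL
     s[$6] += $8 * $8; # sum of squares of times by URL
     if (m[$6] < $8) m[$6] = $8; # max time for URL
  }
}
END {
   print "url, n, avg, stddev, max";
   for (x in n) {
     # sample variance; needs at least two hits, otherwise report 0
     v = (n[x] > 1) ? (s[x] - t[x] * t[x] / n[x]) / (n[x] - 1) : 0;
     if (v < 0) v = 0;  # guard against floating-point rounding
     print x ", " n[x] ", " t[x]/n[x] ", " sqrt(v) ", " m[x];
   }
}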

If you are measuring the response times on the proxy, then you're also measuring the time taken to deliver the request across the network - i.e. your application may be behaving consistently but the spikes are introduced by changes on the internet / client. If you want to see what your application is really doing then you need to look at your webserver logs.
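One way to separate the two on the proxy itself is to log nginx's $request_time next to $upstream_response_time, so each request shows both the total time seen by the proxy and the time spent waiting on the backend. The fragment below is only a sketch to merge into an existing configuration: the format name `timed` and the log path are hypothetical, and the other fields are just a typical selection.

```nginx
# Sketch only: "timed" and the log file path are made-up names.
http {
    # $request_time:           total time nginx spent on the request, in seconds
    # $upstream_response_time: time spent receiving the response from the backend
    log_format timed '$remote_addr [$time_local] "$request" $status '
                     '$body_bytes_sent $request_time $upstream_response_time';

    server {
        access_log /var/log/nginx/access_timed.log timed;
        # ... existing proxy configuration stays as it is ...
    }
}
```

With a log like this, a slow hit where $request_time is large but $upstream_response_time is small points towards the network or the client, while a hit where both are large points towards the backend. The field positions in this format will probably not match the $6 / $8 assumed in the awk script, so the field numbers there would need adjusting.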

symcbean