(edit: partially understood and worked around, see comment)
I have nginx acting as a reverse proxy in front of a CherryPy app server. I'm using ab to compare performance going through nginx versus hitting CherryPy directly, and the proxied case has much worse worst-case performance:
$ ab -n 200 -c 10 'http://localhost/noop'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Server Software: nginx
Server Hostname: localhost
Server Port: 80
Document Path: /noop
Document Length: 0 bytes
Concurrency Level: 10
Time taken for tests: 3.145 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Total transferred: 29600 bytes
HTML transferred: 0 bytes
Requests per second: 63.60 [#/sec] (mean)
Time per request: 157.243 [ms] (mean)
Time per request: 15.724 [ms] (mean, across all concurrent requests)
Transfer rate: 9.19 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     5   48 211.7     31    3007
Waiting:        5   48 211.7     31    3007
Total:          5   48 211.7     31    3007
Percentage of the requests served within a certain time (ms)
  50%     31
  66%     36
  75%     39
  80%     41
  90%     46
  95%     51
  98%     77
  99%    252
 100%   3007 (longest request)
$ ab -n 200 -c 10 'http://localhost:8080/noop'
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests
Server Software: CherryPy/3.2.0
Server Hostname: localhost
Server Port: 8080
Document Path: /noop
Document Length: 0 bytes
Concurrency Level: 10
Time taken for tests: 0.564 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Total transferred: 27600 bytes
HTML transferred: 0 bytes
Requests per second: 354.58 [#/sec] (mean)
Time per request: 28.202 [ms] (mean)
Time per request: 2.820 [ms] (mean, across all concurrent requests)
Transfer rate: 47.79 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.7      0      11
Processing:     6   26  23.5     24     248
Waiting:        3   25  23.6     23     248
Total:          6   26  23.4     24     248
Percentage of the requests served within a certain time (ms)
  50%     24
  66%     27
  75%     29
  80%     31
  90%     34
  95%     40
  98%     51
  99%    234
 100%    248 (longest request)
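For context, the /noop handler does nothing and returns an empty body (hence the 0-byte Document Length above). In CherryPy 3.2 terms it's essentially this minimal sketch, not my exact app:

import cherrypy

class Root(object):
    @cherrypy.expose
    def noop(self):
        # empty response body, matching the 0-byte Document Length in the ab output
        return ''

cherrypy.config.update({
    'server.socket_host': '0.0.0.0',
    'server.socket_port': 8080,   # the port ab hits directly
    'server.thread_pool': 10,     # the 10-thread pool mentioned below
})
cherrypy.quickstart(Root())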
What could be causing this? The only thing I can think of is that nginx is sending requests to the backend in a different order than they arrived, but that seems implausible.
The machine is an EC2 c1.medium instance with 2 cores, CherryPy is using a thread pool with 10 threads, and nginx has worker_connections = 1024.
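The nginx side is a plain proxy_pass setup; a minimal sketch of the relevant config (layout is illustrative, not my exact file):

worker_processes  2;   # one per core on the c1.medium

events {
    worker_connections  1024;
}

http {
    server {
        listen 80;

        location / {
            # hand everything to the CherryPy backend
            proxy_pass http://127.0.0.1:8080;
        }
    }
}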
UPDATE: Two more confusing findings:
- At a given concurrency level, sending more requests improves performance. With concurrency 40 and 40 requests, I get a median time of 3s and a max of 10.5s; with concurrency 40 and 200 requests, I get a median of 38ms (!) and a max of 7.5s. In fact, the total wall-clock time is lower for 200 requests (6.5s vs. 7.5s for 40). This is all repeatable (the exact ab invocations are sketched after this list).
- Attaching strace to both of the nginx worker processes greatly improves their performance, taking e.g. the median time from 3s down to 77ms, without noticeably changing their behavior (see the command sketch below). I tested with a nontrivial API call and confirmed that strace doesn't change the response, and that all of these performance observations still hold. This is also repeatable.
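For reproducibility, these are the shapes of the commands behind the two bullets above. The worker PIDs and the strace flags here are illustrative; any syscall tracing that attaches to the workers shows the same speedup:

# first bullet: same concurrency, different request counts
$ ab -n 40  -c 40 'http://localhost/noop'
$ ab -n 200 -c 40 'http://localhost/noop'

# second bullet: attach strace to both nginx workers (PIDs vary per box)
$ strace -tt -p <worker_pid_1> -p <worker_pid_2>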