
I am trying to scale an nginx installation to the best of its ability.

I am running one nginx instance with 6 worker_processes (6 cores) and 5 backend servers, each consisting of a uWSGI setup with 10 workers (50 workers in total).

However, every benchmark I attempt (using ab) with different values for total and concurrent connections seems to top out at around 1,000 requests per second.

I have disabled all logging for nginx and uWSGI (to avoid slowdowns from disk I/O). I am testing against a Flask Python application that merely sends {'status':'ok'} back. No database access, no calculations, nothing.
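For reference, a typical run looks something like this; the host, path, and numbers are only examples:

    # illustrative ab invocations against the nginx front end
    ab -n 50000 -c 100 http://your-server/status
    # with HTTP keep-alive enabled (can change the numbers significantly)
    ab -k -n 50000 -c 100 http://your-server/status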

The relevant part of the nginx config looks like this:

    user www-data;
    worker_processes 6;
    worker_rlimit_nofile 100000;
    pid /var/run/nginx.pid;

    events {
            use epoll;
            worker_connections 2048;
            multi_accept on;
    }

    http {

            ##
            # Basic Settings
            ##

            sendfile on;
            tcp_nopush on;
            tcp_nodelay on;
            keepalive_timeout 65;
            types_hash_max_size 2048;
            # server_tokens off;

            # server_names_hash_bucket_size 64;
            # server_name_in_redirect off;

            include /etc/nginx/mime.types;
            default_type application/octet-stream;

            ##
            # Logging Settings
            ##

            access_log off; # /var/log/nginx/access.log;
            error_log /var/log/nginx/error.log;

            <...>
    }

I am looking for any tips, anything I have overlooked, to increase throughput. Looking at the stats for each uWSGI pool (using uwsgitop), they never seem hard pressed to keep up, which leads me to believe nginx is the bottleneck. The performance was also the same with a single pool of workers instead of 10. Additionally, htop shows that I am nowhere near the maximum in terms of memory or CPU.
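(For reference, uwsgitop reads from the uWSGI stats socket; a minimal way to expose one, with an example address, is:)

    # in the uWSGI invocation or ini file, enable the stats socket
    uwsgi --ini app.ini --stats 127.0.0.1:9191

    # then watch it live
    uwsgitop 127.0.0.1:9191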

Christian P.
  • Have you checked your network config for tuning possibilities? We asked ourselves the same question weeks ago and it turned out to be the network. – VF_ Oct 02 '14 at 14:54
  • It could be your test tool that's reaching a limit. – wurtel Oct 02 '14 at 14:56

3 Answers


I recommend you install the sysstat package and then check the recorded info with sar.

sar -n SOCK -s <start_time> -e <end_time> to get the number of sockets during the benchmark

sar -n DEV -s <start_time> -e <end_time> to get network interface packets and bandwidth

sar -d -s <start_time> -e <end_time> to get IO stats per device

sar -v -s <start_time> -e <end_time> to get the number of file handles and inodes

etc
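If the sysstat data collector is not already running from cron, one way to capture data only for the benchmark window is to record it to a file and query that file afterwards; the file name and times below are just examples:

    # record all activity once per second for 10 minutes into a binary file
    sar -o /tmp/bench.sa 1 600

    # afterwards, query that file for the benchmark window
    sar -n SOCK -f /tmp/bench.sa -s 14:00:00 -e 14:10:00
    sar -n DEV -f /tmp/bench.sa -s 14:00:00 -e 14:10:00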

Check security limits for your users (max number of open files, max number of processes, etc).
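For example, to see the limits the shell user and the running processes are actually subject to (the PIDs are placeholders):

    # limits for the current shell user
    ulimit -n    # max open files
    ulimit -u    # max user processes

    # limits that actually apply to the running processes
    cat /proc/<nginx_pid>/limits
    cat /proc/<uwsgi_pid>/limits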

Then check your kernel settings: local port range, somaxconn, device txqueue length, netdev backlog. Activate socket recycling for TIME_WAIT states if necessary (based on the tcp-tw figures from sar -n SOCK), either with SO_LINGER in nginx, with tcp_tw_recycle (only if you don't have NAT), or with tcp_tw_reuse (for outgoing connections). Change the number of tw_buckets if necessary, make sure SACK/DSACK and timestamps are enabled, reduce the FIN_WAIT_2 timeout, increase the max number of file handles if needed, and so on.
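A sketch of the corresponding sysctl knobs on a reasonably recent Linux kernel; the values are illustrative only, so verify each against your kernel and workload, and persist the ones you keep in /etc/sysctl.conf:

    # inspect the current values first
    sysctl net.ipv4.ip_local_port_range net.core.somaxconn net.core.netdev_max_backlog
    sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_sack net.ipv4.tcp_timestamps fs.file-max

    # example adjustments -- illustrative values, not recommendations
    sysctl -w net.ipv4.ip_local_port_range="10240 65535"
    sysctl -w net.core.somaxconn=4096
    sysctl -w net.core.netdev_max_backlog=4096
    sysctl -w net.ipv4.tcp_tw_reuse=1      # reuse TIME_WAIT sockets for outgoing connections
    sysctl -w net.ipv4.tcp_fin_timeout=15
    sysctl -w fs.file-max=200000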

There could be a lot of factors.

Before checking all that, make sure you don't run ab on the same rig and that the Python app has good response times.

And a simple test to be sure the Python app is not the culprit: run the same benchmark against a static file served directly by nginx.
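Something along these lines, assuming a hypothetical document root of /var/www/html and running ab from a separate machine:

    # drop a small static file into the document root
    echo '{"status": "ok"}' > /var/www/html/static-test.json

    # benchmark it through nginx, bypassing uWSGI and Flask entirely
    ab -n 50000 -c 100 http://your-server/static-test.json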

Xavier Lucas

I would look at file descriptors, possible network/interface saturation, and IO issues.

To see whether the network interface is saturated, use iptraf, a command-line tool for viewing real-time stats. Simply:

iptraf

For IO issues, use iostat:

iostat 1

which will show IO usage and load every second.

For file descriptor issues use lsof or /proc:

lsof -P -n -p <PID> | wc -l
ls /proc/<PID>/fd | wc -l

Use the ulimit -a | grep files command (as the user that runs the process) to verify how many files you're permitted to have open. The default is 1024.
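If the count from lsof or /proc is getting close to that limit, it can be raised per user; the paths and values below are only an example (worker_rlimit_nofile is already raised in the nginx config above, but the uWSGI processes have their own limits):

    # check what limit the running process actually has
    grep 'open files' /proc/<PID>/limits

    # raise it persistently for the service user in /etc/security/limits.conf, e.g.:
    #   www-data  soft  nofile  65536
    #   www-data  hard  nofile  65536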

See this page for more info: http://www.cyberciti.biz/tips/linux-procfs-file-descriptors.html

See this question for an nginx-specific file descriptor problem which may very well be related to your problem: understanding max file descriptors for linux and nginx, and best value for worker_rlimit_nofile

ekeyser
  • We're using NFS (part of a larger server pool), so I am using `nfsiostat`. When running the test I get these results http://pastebin.com/9BdCFurT (pastebin for formatting). I am using UNIX sockets; am I better off using TCP? – Christian P. Oct 03 '14 at 08:05
  • There is evidence and general understanding that tcp offers better scalability past a certain point. Might be worth trying. Regarding the nfsiostat I'm not sure I can make anything out or say one way or another if that's the bottleneck. I would say that if NFS is the filesystem on which the nginx assets are being served then you are introducing a potential bottleneck as opposed to a local filesystem. At the very least there is some overhead involved in wrapping NFS around I/O. – ekeyser Oct 04 '14 at 16:15

In addition to the other two answers here, conntrack (connection tracking) might also be an issue. If you are using Linux and netfilter (i.e. iptables), your conntrack table might be full.

First check if conntrack is enabled. For example:

$ /sbin/lsmod | grep conntrack
ip_conntrack           51617  1 xt_state

$ lsmod | grep -i con
nf_conntrack_ipv4      19159  5 
nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
nf_conntrack           92358  5 xt_state,iptable_nat,nf_conntrack_ipv4,nf_nat_ipv4,nf_nat

The output will vary based on the kernel version.

If either of the nf_conntrack or ip_conntrack modules is loaded, you can see how many conntrack entries there are and check your maximum with the following:

Red Hat (RHEL, CentOS, Fedora, etc):

$ sudo wc -l /proc/net/ip_conntrack
$ /sbin/sysctl -a | grep conntrack_max

or

$ sudo wc -l /proc/net/nf_conntrack
$ /sbin/sysctl -a | grep conntrack_max

Debian:

$ cat /proc/sys/net/netfilter/nf_conntrack_count
$ /sbin/sysctl -a | grep conntrack_max

If you have filled the conntrack table then you will need to increase the limit via sysctl or /etc/sysctl.conf.
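For example (the value is illustrative, and the exact key may be net.ipv4.netfilter.ip_conntrack_max or net.netfilter.nf_conntrack_max depending on the kernel version):

    # raise the limit for the running kernel
    sysctl -w net.netfilter.nf_conntrack_max=262144

    # make it persistent across reboots
    echo 'net.netfilter.nf_conntrack_max = 262144' >> /etc/sysctl.conf
    sysctl -p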

Note: conntrack does not just apply to the server. You will need to check each point between yourself and the server: Client computer, load balancer (nginx), upstream (backend) server, and possibly even any routers.

Gene