I'm trying to optimize a DigitalOcean droplet (512 MB), load testing it with loader.io.
I'm testing against my homepage, which is HTTPS / PHP. I set up a FastCGI page cache, which got me from 100 req/sec to 2,000 req/sec.
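The page cache in my default site file is the standard fastcgi_cache pattern; the snippet below is a simplified sketch rather than the exact file (cache path, zone name, and PHP-FPM socket are illustrative):

fastcgi_cache_path /var/run/nginx-cache levels=1:2 keys_zone=PAGECACHE:100m inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";

server {
    # listen/ssl/root directives omitted
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_pass unix:/var/run/php5-fpm.sock;   # illustrative socket path
        fastcgi_cache PAGECACHE;                    # serve cached pages instead of hitting PHP
        fastcgi_cache_valid 200 60m;
    }
}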
But anything beyond 2,000 req/sec results in a lot of timeouts and slow responses (the average goes from 20 ms to 1,500 ms). I'm trying to identify the bottleneck. It isn't CPU or memory yet: the load average barely reaches 0.30 and only about half of memory is used. I also tried resizing to a much bigger droplet, and the timeouts still happen.
It isn't FastCGI either, because load-test performance is nearly identical against a basic .html file.
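For context, the load and memory figures above are just from the usual tools watched during a test run, along the lines of:

uptime       # load average barely reaches 0.30 during the test
free -m      # roughly half of the 512 MB in use
vmstat 1     # per-second CPU, swap and run-queue activity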
During the timeouts, error.log is empty and nothing seems to be throwing errors (that I can find). kern.log shows these messages:
TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters
TCP: Possible SYN flooding on port 443. Sending cookies. Check SNMP counters.
I tried disabling syncookies, which stopped those messages, but the timeouts persisted.
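In case it's relevant, the SNMP counters that kernel message refers to can (as far as I understand) be checked with something like:

ss -s                                      # totals per TCP state
netstat -s | egrep -i "listen|overflow"    # dropped SYNs / listen-queue overflows show up here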
During the timeouts, I start seeing a buildup of TIME_WAIT:
netstat -ntla | awk '{print $6}' | sort | uniq -c | sort -rn
6268 ESTABLISHED
831 TIME_WAIT
6 LISTEN
2 FIN_WAIT1
1 Foreign
1 established)
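One thing I can watch directly is the listen queues: for LISTEN sockets, ss reports the current accept-queue length in Recv-Q and its limit in Send-Q, so if Recv-Q keeps hitting Send-Q during the test, the backlog would be the suspect. Roughly:

watch -n1 ss -lnt                 # accept queue (Recv-Q) vs. backlog limit (Send-Q) per listening socket
ss -tan state time-wait | wc -l   # quick TIME_WAIT count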
My question is: where else can I look to determine the bottleneck here? Are there other error logs or commands I can use to monitor?
Here is my nginx.conf (the FastCGI cache and regular browser-cache settings are in my default site file). I've tried multi_accept, which seems to make the timeouts worse. I know worker_connections is ridiculous, but it doesn't seem to matter how much I raise or lower it:
user www-data;
worker_processes auto;
worker_rlimit_nofile 200000;
pid /run/nginx.pid;
events {
    worker_connections 200000;
    # multi_accept on;
    use epoll;
}

http {
    ##
    # Basic Settings
    ##
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    server_tokens off;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 30;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ##
    # Logging Settings
    ##
    access_log off;
    # access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    ##
    # Gzip Settings
    ##
    gzip on;
    gzip_disable "msie6";
    gzip_vary on;
    # gzip_proxied any;
    # gzip_comp_level 6;
    # gzip_buffers 16 8k;
    # gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
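To see what nginx itself thinks is happening, I assume a stub_status endpoint would help (sketch below, assuming ngx_http_stub_status_module is compiled in, which it is in the stock Debian/Ubuntu package):

server {
    listen 127.0.0.1:8080;
    location /nginx_status {
        stub_status on;     # active connections, accepts/handled/requests, reading/writing/waiting
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

Hitting curl http://127.0.0.1:8080/nginx_status during a run should show whether accepts and handled stay equal; if handled falls behind, nginx is refusing connections on its side (e.g. worker_connections exhausted).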
Here is my sysctl.conf:
### IMPROVE SYSTEM MEMORY MANAGEMENT ###
# Increase the maximum number of open file handles
fs.file-max = 2097152
# Do less swapping
vm.swappiness = 10
vm.dirty_ratio = 60
vm.dirty_background_ratio = 2
### GENERAL NETWORK SECURITY OPTIONS ###
# Number of SYN+ACK retransmissions for passive TCP connections
net.ipv4.tcp_synack_retries = 2
# Allowed local port range
net.ipv4.ip_local_port_range = 2000 65535
# Protect against TCP time-wait assassination (RFC 1337)
net.ipv4.tcp_rfc1337 = 1
# Lower the default tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 15
# Lower the default TCP keepalive time, probes and interval
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_syncookies = 1
### TUNING NETWORK PERFORMANCE ###
# Default Socket Receive Buffer
net.core.rmem_default = 31457280
# Maximum Socket Receive Buffer
net.core.rmem_max = 12582912
# Default Socket Send Buffer
net.core.wmem_default = 31457280
# Maximum Socket Send Buffer
net.core.wmem_max = 12582912
# Increase the maximum listen backlog
net.core.somaxconn = 4096
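To make sure these actually take effect (and to sanity-check the related backlog setting the SYN-flood message involves), I believe something like this is enough:

sysctl -p                                   # reload /etc/sysctl.conf
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog net.ipv4.tcp_syncookies
# tcp_max_syn_backlog isn't set above, but it governs the SYN (half-open) queue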
And I put these in limits.conf:
* hard nofile 500000
* soft nofile 500000
root hard nofile 500000
root soft nofile 500000
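Since limits.conf doesn't always apply to daemons started at boot, one check is to confirm the running nginx processes really got the higher limit (worker_rlimit_nofile should cover the workers regardless):

grep "Max open files" /proc/$(cat /run/nginx.pid)/limits   # master process; pid file path is from nginx.conf above
for pid in $(pgrep -f "nginx: worker"); do grep "Max open files" /proc/$pid/limits; done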