
I'm running nginx server with php-fpm.

I'm using "c5d.xlarge" ec2 instance type to host my website.

c5d.xlarge = 4 vcpu & 8GB RAM.

If active connections go beyond 10k on my ELB, CPU Utilization goes beyond 60-70% on my all 15 servers.

php-fpm configuration:

pm = dynamic
pm.max_children = 65
pm.start_servers = 10
pm.min_spare_servers = 10
pm.max_spare_servers = 20
pm.max_requests = 600

nginx configuration:

user  www-data;
worker_processes  4;
pid        /var/run/nginx.pid;


events {
    worker_connections  3072;
}


http {
        ##
        # Basic Settings
        ##

        charset utf-8;
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        server_tokens off;
        log_not_found off;
        types_hash_max_size 2048;
        client_max_body_size 16M;
        keepalive_timeout  70;
        client_header_timeout 3000;
        client_body_timeout 3000;
        fastcgi_read_timeout 3000;
        fastcgi_buffers 8 128k;
        fastcgi_buffer_size 128k;

        # MIME
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';

        access_log  /home/ubuntu/apps/log/default/access.log  main buffer=32k;
        error_log   /home/ubuntu/apps/log/default/error.log;

        gzip on;
        gzip_disable "MSIE [1-6]\.";

        # Only allow proxy request with these headers to be gzipped.
        gzip_proxied expired no-cache no-store private auth;

        # Default is 6 (1<n<9), but 2 -- even 1 -- is enough. The higher it is, the
        # more CPU cycles will be wasted.
        gzip_comp_level 7;
        gzip_min_length 20; # Default 20

        gzip_types text/plain text/css application/json application/javascript application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        include /etc/nginx/conf.d/*.conf;
}

top command output: (screenshot linked in the original post)

Please let me know whether these configurations look okay. I'm not sure if this much CPU utilization for this many active connections is normal. I'd be really grateful if someone could guide me in setting up nginx and php-fpm for optimal performance.

If any more information is required, please let me know.

Axel
  • That must be a heck of a busy website. I assume you're doing things like caching pages that are static or rarely changing for users that aren't logged in? That isn't practical in all situations. – Tim Nov 04 '19 at 19:05

1 Answer


Let's get started with php-fpm

pm = dynamic

The dynamic process model keeps a bunch of workers waiting around for a request to come through. Meanwhile, those workers are consuming CPU cycles and memory. ondemand is the better choice of FPM process model because it scales out only as needed. Before you say "well, the daemon is going to have to fork() the new child", which is also intensive, remember that this is not a very expensive operation on modern OSes and hardware.
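A minimal pool sketch using ondemand (the path and numbers are illustrative assumptions, not values from your setup):

; e.g. /etc/php/7.2/fpm/pool.d/www.conf -- illustrative only
pm = ondemand
pm.max_children = 40             ; cap this from RAM / memory_limit, see below
pm.process_idle_timeout = 10s    ; reap workers that sit idle for 10 seconds
; pm.start_servers, pm.min_spare_servers and pm.max_spare_servers are ignored with ondemand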

pm.max_children = 65

This is most likely unsustainable unless your PHP memory_limit is set to around 100 MB, which is rarely enough to run any real PHP workload. Realize that if you're going to let FPM scale out to 65 workers, each of them can potentially use up to memory_limit. Without the RAM to back it up, this value is a recipe for locking up a server.
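To make that concrete (assuming a fairly common memory_limit of 128M, since your actual value isn't shown):

65 children x 128 MB ≈ 8.3 GB

That already exceeds the 8 GB of RAM on a c5d.xlarge before nginx, the OS and the filesystem cache take their share, so size pm.max_children from available RAM divided by memory_limit and leave headroom.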

pm.start_servers = 10
pm.min_spare_servers = 10
pm.max_spare_servers = 20

I won't comment on these as they are sane enough.

pm.max_requests = 600

What's the point of killing a worker after 600 requests? If your application is leaking so much memory that you need to reap workers, you should evaluate the application itself.
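If the application isn't leaking, recycling can simply be switched off (an illustrative line, not a blanket recommendation):

pm.max_requests = 0    ; 0 disables request-count-based recycling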

On to nginx

worker_processes 4;

worker_processes should actually be set to auto to let nginx match the number of workers to the available CPU cores. For most use cases, though, even a single worker is enough.
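For example (nginx then starts one worker per detected core):

worker_processes auto;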

worker_connections  3072;

This is another potentially dangerous setting. Have you consulted ulimit -n to see how many open files are allowed per process? How many cores are available on your system (grep -c processor /proc/cpuinfo)? Generally speaking, worker_connections should not exceed the per-process open-file limit, and the theoretical connection ceiling is roughly worker_processes * worker_connections; on your EC2 instance that is 4 workers, each bounded by a default ulimit of 1024.
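If you really need more connections per worker than the default file-descriptor limit allows, nginx can raise its own limit; a sketch with illustrative numbers:

worker_rlimit_nofile 8192;    # per-worker open-file limit, must cover worker_connections

events {
    worker_connections 3072;  # keep at or below worker_rlimit_nofile
}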

client_header_timeout 3000;
client_body_timeout 3000;
fastcgi_read_timeout 3000;

This is another setting where you're potentially going to be burned! Do you really want to leave connections hanging for nearly an hour (3000 seconds) while they wait for the timeout to be reached? Overzealous timeout values can be a large resource drain on servers. Timeouts are meant to sit just above the normal time a particular event takes, and not much more; otherwise, what's the point of having a timeout at all? :)
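Something much closer to the defaults is usually enough; an illustrative sketch (these numbers are assumptions, tune them to your slowest legitimate request):

client_header_timeout 15s;
client_body_timeout   15s;
fastcgi_read_timeout  60s;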

t3ddftw
  • `grep -c processor /proc/cpuinfo` returns 4; `ulimit -n` returns 1024 – Axel Nov 05 '19 at 07:25
  • I'm thinking of setting pm = static, but what if there is too much load and the pool hits its maximum? – Axel Nov 05 '19 at 07:28
  • Odd that your ulimit is that high. Oh well -- 3072 should be fine, then. I do not recommend PM static if you only have 8GB of RAM. Again, think of the `memory_limit` and amount of RAM. Unless you only want to have 14 workers, you're liable to exhaust RAM. – t3ddftw Nov 05 '19 at 14:52
  • I've increased pm.max_requests to 1000 from 600 and also reduced all the above timeouts from 3000 to 300. The problem I'm facing right now is that some users are complaining of 502 Bad Gateway errors while performing actions that take a little time, e.g. fetching data via an API. It wasn't the case when we were on Apache, only with nginx. We are using ElastiCache Redis by AWS. Earlier we were using a cache.r5.large Redis instance type and now we are using a t2.medium. Does it make a difference as well? – Axel Nov 06 '19 at 06:12
  • @Axel -- You need to set the proxy timeout values in Nginx: ` proxy_connect_timeout 300; proxy_send_timeout 300; proxy_read_timeout 300;` – t3ddftw Nov 06 '19 at 18:36
  • We are using nginx as a standalone server, not as a proxy. We have a bunch of nginx servers behind the AWS Application Load Balancer. Do we still need the above-mentioned proxy timeouts? – Axel Nov 06 '19 at 19:03
  • @Axel - You're using Nginx with PHP-FPM, right? If so, Nginx is running as a proxy ;) – t3ddftw Nov 06 '19 at 19:08
  • I hope this solves the 504 Gateway Timeout errors we see when fetching data from the DB takes a little longer. – Axel Nov 06 '19 at 19:23