Nginx With PHP FPM - Resource Temporarily Unavailable - 502 Error
I am using some code to send just over 160 GET requests asynchronously via curl to my API, which runs Nginx with PHP-FPM on Ubuntu Server 16.04. Each request fetches a different selection of data from the database before returning it as a JSON response. This number of requests is small enough that I believe it should not hit any of the various default limits (number of socket connections, file descriptors, etc.). However, the fact that they are all being sent/received at the same time appears to be causing issues.
The vast majority of the requests succeed, but a few (consistently the same number across sequential tests, though the number varies with the configuration) get a "502 Bad Gateway" response.
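For context, the test client is essentially a curl_multi loop along these lines (a minimal sketch: the base URL and endpoint list here are placeholders rather than my real routes):

<?php
// Fire ~160 GET requests concurrently via curl_multi and collect status codes.
$baseUrl   = 'https://my.domain.org/1.0/';
$endpoints = array_map(function ($i) { return 'xxx?set=' . $i; }, range(1, 160));

$mh      = curl_multi_init();
$handles = [];
foreach ($endpoints as $path) {
    $ch = curl_init($baseUrl . $path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers at once, waiting for socket activity between passes.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $ch) {
    echo curl_getinfo($ch, CURLINFO_HTTP_CODE), "\n"; // the 502s show up here
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);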
If I look at the nginx error log (/var/log/nginx/error.log), I see these error messages:
2017/11/21 09:46:43 [error] 29#29: *144 connect() to unix:/run/php/php7.0-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 192.168.3.7, server: , request: "GET /1.0/xxx HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "my.domain.org"
There are always exactly as many of these error messages in the log as there are "502 Bad Gateway" responses I receive back from the API.
Meanwhile, when watching the FPM log file during an execution of the test (with tail -100f /var/log/php7.0-fpm.log), nothing happens. It just has the following:
[21-Nov-2017 11:54:29] NOTICE: fpm is running, pid 329
[21-Nov-2017 11:54:29] NOTICE: ready to handle connections
[21-Nov-2017 11:54:29] NOTICE: systemd monitor interval set to 10000ms
Although my FPM configuration (at /etc/php/7.0/fpm/php-fpm.conf) specifies an error log with error_log = /var/log/php7.0-fpm.log, there doesn't appear to be such a file, suggesting no errors.
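To double-check that FPM isn't failing silently, I believe the verbosity of this log can be raised via the global log_level directive (this is from the php-fpm.conf documentation; I have not exhausted this avenue yet):

; /etc/php/7.0/fpm/php-fpm.conf - global section
log_level = debug    ; default is notice; debug is very verbose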
A Working Configuration
I have found that if I tweak the FPM configuration, I can get the webserver to work (no 502 errors) by configuring the /etc/php/7.0/fpm/pool.d/www.conf file to use a static pool of 15 child processes, rather than dynamically spawning processes or using a smaller static number:
pm = static
pm.max_children = 15
I believe this works because there are already ample processes ready to absorb the sudden burst, so no delay is incurred spawning or shutting down workers.
However, this means that my webserver will use much more memory than I would like. Ideally, I would like pm.max_children to be roughly 2x the number of vCPUs on the server (so 8 or fewer). In this case I am using a quad-core server, but I would like the option to scale down to a dual-core instance.
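For reference, the pool sizing I would prefer to run looks something like this (the spare-server values are my assumptions about a sensible starting point; this is the sizing that currently produces the 502s):

; /etc/php/7.0/fpm/pool.d/www.conf - preferred sizing for the quad-core box
pm = dynamic
pm.max_children = 8        ; 2x vCPUs
pm.start_servers = 4       ; assumed starting values, not tuned
pm.min_spare_servers = 2
pm.max_spare_servers = 6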
Ideally, I would like the server to answer all of the requests, even if the total time taken is much longer, e.g. by queuing requests and adjusting timeouts.
Configuration Settings
The default php-fpm listen.backlog value is 511, but I set it to 2000 just to eliminate it as a factor:
listen.backlog = 2000
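As I understand it, on Linux the kernel caps a socket's effective listen backlog at net.core.somaxconn, so I also checked that value to make sure it wasn't silently clamping the 2000 (just a verification step):

# Effective backlog is min(listen.backlog, net.core.somaxconn)
sysctl net.core.somaxconn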
For Nginx, I set worker_connections to 1024 and worker_processes auto;, which resolves to 4 on this machine.
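That corresponds to the following lines in nginx.conf (shown only to make the worker settings concrete):

worker_processes auto;  # resolves to 4 worker processes on this quad-core server
events {
    worker_connections 1024;
}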
I also have the following buffer and timeout settings to try to rule them out as a factor:
##
# Buffer settings
##
client_body_buffer_size 10M;
client_header_buffer_size 1k;
client_max_body_size 512m;
large_client_header_buffers 2 1k;
##
# Timeout settings
##
client_body_timeout 120;
client_header_timeout 120;
keepalive_timeout 120;
send_timeout 120;
fastcgi_connect_timeout 60s;
fastcgi_next_upstream_timeout 40s;
fastcgi_next_upstream_tries 10;
fastcgi_read_timeout 60s;
fastcgi_send_timeout 60s;
fastcgi_cache_lock_timeout 60s;
It is worth noting that all of the responses (including the 502s) come back within about 20 seconds, so we are not hitting these timeouts. Also, even though fastcgi_next_upstream_tries is set to 10, I only get one "resource temporarily unavailable" message for each 502 error, rather than the ten I would expect if nginx were actually retrying.
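My reading (an assumption on my part, not something I have confirmed) is that fastcgi_next_upstream only kicks in when there is a next server in an upstream group to pass the request to, so a bare fastcgi_pass pointing at a single socket never retries. The kind of configuration I assume would be needed for retries to apply is something like:

# Assumption: listing the same socket twice gives nginx a "next server"
# to retry against after a failed connect; untested on my side.
upstream php_backend {
    server unix:/run/php/php7.0-fpm.sock;
    server unix:/run/php/php7.0-fpm.sock;
}

# and in the location block, instead of the direct socket path:
fastcgi_pass php_backend;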
Similar / Related Questions
I see that there are many similar questions on ServerFault and Stack Overflow. I am detailing them here so this question doesn't just get marked as a duplicate.
ServerFault - Requests are never queued after pm.max_children with Nginx and PHP-FPM. This appears to be a very similar question, but there have been no answers even though it was posted 3 years ago, and it has far less detail than here. Also, some of my requests must be getting queued successfully, unlike in that question, which suggests that as soon as the max is reached all requests are dropped.
ServerFault - nginx ERROR 502 & Resource temporarily unavailable) while connecting to upstream, client. This post seems similar (he is describing the same problem), but as one of the answers pointed out, his socket paths weren't matching and mine are. My /etc/php/7.0/fpm/pool.d/www.conf config file has: listen = /run/php/php7.0-fpm.sock, which lines up with the socket file in the error messages nginx provides.
ServerFault - Need to increase nginx throughput to an upstream unix socket — linux kernel tuning? The answer here suggested setting net.core.somaxconn and net.core.netdev_max_backlog, which I set to 4096 and 1000 respectively (see the sysctl sketch after this list). The issue still persists.
ServerFault - php-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream. The suggestion here is to use pm = ondemand with max_children set to 4000. This is not suitable for me, as it could result in my quad-core server running 4000 processes and just eating up memory.
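For completeness, those kernel settings were applied along these lines (putting them in /etc/sysctl.conf is just where I chose to persist them):

# /etc/sysctl.conf - reloaded with: sudo sysctl -p
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 1000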
Question
I believe Nginx is handing connections to PHP-FPM faster than FPM can accept them. At some point FPM simply doesn't respond to the connection attempt, so Nginx gives up and sends back a 502 error. Is there a way (probably a configuration variable or two) to fix this so that FPM queues up the requests, or so that nginx retries later (fastcgi_next_upstream_tries doesn't seem to have any effect)? I don't mind how long the webserver takes to serve up all the requests (I can increase timeouts); I only want to be able to set my FPM process count to an appropriate number relative to my CPU and still have all 160 requests served.
Update - Works Fine Using TCP Sockets
I just tried swapping FPM from listening on a unix file socket to a TCP socket, as detailed here. E.g. changing the FPM pool to:
listen = 127.0.0.1:9000
and updating nginx to use:
fastcgi_pass 127.0.0.1:9000;
This seems to have done the trick as a workaround: I don't get any 502 errors, even if I use a dynamic pool or a static pool with just 2 FPM processes.
However, I would love to know why this works where the local unix file socket does not, and whether there is a configuration change I can make so that the socket-based setup works, as that is the default and many people are likely to be using it.