60

I have a server that was working fine until 3rd Oct 2013 at 10:50am, when it began to intermittently return "502 Bad Gateway" errors to the client.

Approximately 4 out of 5 browser requests succeed but about 1 in 5 fail with a 502.

The nginx error log contains many hundreds of these errors:

2013/10/05 06:28:17 [error] 3111#0: *54528 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 66.249.66.75, server: www.bec-components.co.uk, request: "GET /?_n=Fridgefreezer/Hotpoint/8591P;_i=x8078 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "www.bec-components.co.uk"

However, the PHP error log does not contain any matching errors.
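To see whether the failures cluster at particular times of day (for example around traffic peaks), a small script can count the upstream resets per hour. This is a minimal Python sketch that assumes the error-log lines look like the one quoted above; `resets_per_hour` is a hypothetical helper, not part of nginx or any existing tool.

```python
import re
from collections import Counter

# Timestamp prefix of an nginx error-log line, e.g. "2013/10/05 06:28:17 [error]"
TIMESTAMP_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}):\d{2}:\d{2} \[error\]")


def resets_per_hour(lines):
    """Count upstream 'Connection reset by peer' errors, keyed by 'YYYY/MM/DD HH'."""
    counts = Counter()
    for line in lines:
        if "Connection reset by peer" not in line or "from upstream" not in line:
            continue  # ignore unrelated errors
        match = TIMESTAMP_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `with open("/var/log/nginx/error.log", errors="replace") as f: print(sorted(resets_per_hour(f).items()))`. Putting the hourly counts next to the request counts from access.log could show whether the failure rate tracks load.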

Is there a way to get PHP to give me more info about why it is resetting the connection?

This is nginx.conf:

user              www-data;
worker_processes  4;
error_log         /var/log/nginx/error.log;
pid               /var/run/nginx.pid;

events {
   worker_connections  1024;
}

http {
  include          /etc/nginx/mime.types;
  access_log       /var/log/nginx/access.log;

  sendfile               on;
  keepalive_timeout      30;
  tcp_nodelay            on;
  client_max_body_size   100m;

  gzip         on;
  gzip_types   text/plain application/xml text/javascript application/x-javascript text/css;
  gzip_disable "MSIE [1-6]\.(?!.*SV1)";

  include /gvol/sites/*/nginx.conf;

}

And this is the .conf for this site:

server {

  server_name   www.bec-components.co.uk bec3.uk.to bec4.uk.to bec.home;
  root          /gvol/sites/bec/www/;
  index         index.php index.html;

  location ~ \.(js|css|png|jpg|jpeg|gif|ico)$ {
    expires        2592000;   # 30 days
    log_not_found  off;
  }

  ## Trigger client to download instead of display '.xml' files.
  location ~ \.xml$ {
    add_header Content-disposition "attachment; filename=$1";
  }

   location ~ \.php$ {
      fastcgi_read_timeout  3600;
      include               /etc/nginx/fastcgi_params;
      keepalive_timeout     0;
      fastcgi_param         SCRIPT_FILENAME  $document_root$fastcgi_script_name;
      fastcgi_pass          127.0.0.1:9000;
      fastcgi_index         index.php;
   }
}

## bec-components.co.uk ##
server {
   server_name   bec-components.co.uk;
   rewrite       ^/(.*) http://www.bec-components.co.uk/$1 permanent;
}
ivanleoncz
Nigel Alderton
  • What was changed on that day? Updated your application or PHP? What's your application? Did you enable debugging in php-fpm? – Pothi Kalimuthu Oct 05 '13 at 15:34
  • Nothing was changed on that day. Server config was not changed, nor were any PHP scripts. It's not out of disk space. My application is just a set of `PHP` scripts. I'm not using `php-fpm`; I'm just running `php-fastcgi` by doing `php-cgi -b 127.0.0.1:9000`. It's been working without fault for 3 years. I can't work out why it has developed this issue. – Nigel Alderton Oct 05 '13 at 16:20
  • I had a similar issue recently where nginx was complaining about "Connection reset by peer" while reading response header from upstream. In my case uWSGI was the real problem, and restarting uWSGI fixed the issue for me; why it was happening is a separate question. – APZ Jan 26 '14 at 10:46
  • Your upstream service ( `php-cgi -b 127.0.0.1:9000` ) is failing intermittently, perhaps due to increased traffic and lack of resources. – LinuxDevOps Mar 29 '14 at 14:53

12 Answers

30

I'd always trust my webserver when it tells me: 502 Bad Gateway

  • What is the uptime of your FastCGI / nginx process?
  • Do you monitor network connections?
  • Can you confirm or rule out a change in visitor count around that day?

What it means:

  • Your FastCGI process is not reachable by nginx: it is either too slow or not responding at all. Bad gateway means nginx cannot fastcgi_pass to the defined resource 127.0.0.1:9000 at that very moment.

  • Your initial error log tells it all:

recv() failed
    -> nginx's read from the upstream failed

(104: Connection reset by peer) while reading response header from upstream,
    -> no complete answer, or no answer at all

upstream: "fastcgi://127.0.0.1:9000",
    -> who failed: the FastCGI process at 127.0.0.1:9000

From my limited point of view, I'd suggest:

  • Restart your FastCGI process / server
  • Check your access log
  • Enable the debug log
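To test the "not reachable at that very moment" idea, one could repeatedly open TCP connections to the FastCGI port and count failures. A minimal Python sketch, assuming the upstream address from the question; `probe_upstream` and its defaults are hypothetical. It only checks that the port accepts connections and does not speak FastCGI, so a process that accepts a connection and then dies mid-request would still count as a success.

```python
import socket
import time


def probe_upstream(host="127.0.0.1", port=9000, attempts=20, timeout=2.0, delay=0.5):
    """Open `attempts` TCP connections to host:port; return (successes, failures)."""
    ok = failed = 0
    for _ in range(attempts):
        try:
            # The connection is closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:  # refused, timed out, unreachable, ...
            failed += 1
        time.sleep(delay)
    return ok, failed
```

Running `probe_upstream()` in a loop while the 502s occur would help separate "the port is sometimes closed" from "the process accepts the request and then resets it".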
17

I know this topic is old, but it still pops up occasionally, so after looking for answers on the web, I came up with the following three possibilities:

  1. A programming error is sometimes segfaulting php-fpm, which in turn means that the connection with nginx will be severed. This will usually leave at least some logs around and/or core dumps, which can be analysed further.
  2. For some reason, PHP is unable to write a session file (usually in session.save_path = "/var/lib/php/sessions"). The cause can be bad permissions, bad ownership, a wrong user/group, or more esoteric/obscure issues like running out of inodes on that directory (or even a full disk!). This will usually not leave many core dumps around, and possibly nothing in the PHP error logs either.
  3. Even more tricky to debug: an extension is misbehaving (occasionally hitting some kind of inner limit, or a bug which is not triggered all the time), segfaulting, and bringing the php-fpm process down with it — thus closing the connection with nginx. The usual culprits are APC, memcache/d, etc. (in my case it was the New Relic extension), so the idea here is to turn each extension off until the error disappears.
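Possibility 2 can be checked directly. The Python sketch below assumes the default session path quoted above and should be run as the same user PHP runs as (for example with `sudo -u www-data`), because `os.access` checks the real user ID of the calling process. The function name and the thresholds are hypothetical.

```python
import os


def check_session_dir(path="/var/lib/php/sessions",
                      min_free_inodes=1000,
                      min_free_bytes=10 * 1024 * 1024):
    """Return a list of reasons PHP might fail to write session files in `path`."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]

    problems = []
    # Creating a file needs write + search (execute) permission on the directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by uid {os.getuid()}")

    st = os.statvfs(path)
    free_bytes = st.f_bavail * st.f_frsize  # space available to non-root users
    # Some filesystems report f_files == 0 (no fixed inode table); skip the check there.
    if st.f_files and st.f_favail < min_free_inodes:
        problems.append(f"only {st.f_favail} free inodes on {path}")
    if free_bytes < min_free_bytes:
        problems.append(f"only {free_bytes} bytes free on {path}")
    return problems
```

An empty list means none of these particular causes was found; it does not rule out the other two possibilities.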
  • +1 In my case it was #1 - programming error. – Nimbuz Sep 06 '16 at 01:49
  • We ran into this error and disabling the New Relic APM PHP extension revealed a more specific error that allowed us to track down the problem: [29-Jan-2018 16:47:48 UTC] PHP Fatal error: Allowed memory size of 805306368 bytes exhausted (tried to allocate 262144 bytes) in vendor/magento/module-configurable-product/Pricing/Price/ConfigurableRegularPrice.php on line 142 [29-Jan-2018 16:47:48 UTC] PHP Fatal error: Allowed memory size of 805306368 bytes exhausted (tried to allocate 323584 bytes) in Unknown on line 0 My guess is that New Relic choked on the "Unknown" path. – Erik Hansen Jan 29 '18 at 17:43
  • Thanks. In my case the code was segfaulting somehow; removing it line by line confirmed this. – Herz3h May 31 '21 at 09:18
10

I kept getting this as well, and solved it by increasing the OPcache memory limit (if you use OPcache, the replacement for APC). It seems PHP-FPM dropped connections whenever the cache got too full. This is also why shgnInc's answer (restarting php-fpm) fixes it only for a short time.

So find the file /etc/php5/fpm/php.ini (or the equivalent in your distribution) and increase opcache.memory_consumption (in megabytes) to whatever level your site needs. Disabling OPcache may also work.

[opcache]
opcache.memory_consumption = 196 
Manu
6

In my case of the same problem, simply restarting the php-fpm service solved it:

sudo service php5-fpm restart

Sometimes this problem also happens because of a huge number of requests. The pm.max_requests value in php5-fpm (the number of requests each child process serves before it is respawned) may be set to 100 or below.

To solve it, increase the value according to your site's traffic, for example to 500.

Afterwards, restart the service.

shgnInc
2

You may want to consider this gist on GitHub: https://gist.github.com/amichaelgrant/90d99d7d5d48bf8fd209

I encountered a similar situation: when I checked the error logs for my upstream servers, they were reporting a ulimit error, so I increased that limit to 1000000 (on both the upstream and nginx boxes) and everything worked fine.
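The answer does not say which limit was involved. Assuming it was the open-files limit (`ulimit -n`), which also caps how many sockets a process can hold, one way to see the limit a running php-fpm or nginx worker actually has is to read `/proc/<pid>/limits` on Linux. A minimal Python sketch; the helper names are hypothetical.

```python
def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    None stands for 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid):
    """Read the open-files limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

Pass it the PID of a worker (for example one listed by `pgrep php-fpm`); you may need to run it as root or as the process owner. Checking the running process matters because a limit raised in a shell does not apply to daemons that were already started.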

Michael Hampton
2

In my case, disabling the xdebug extension helped.

Vasily
  • Ditto. In my case I had set a condition on a breakpoint, and the moment I disabled the breakpoint the error was gone. – roman204 Apr 16 '19 at 18:32
2

This issue may also arise if a PHP-FPM process exceeds its allocated memory limit. When this happens, the connection between NGINX and PHP-FPM is severed and NGINX returns a 502 Bad Gateway. The PHP-FPM process memory limit is controlled by the memory_limit variable. This can be set with php_admin_value[memory_limit] in the PHP-FPM configuration file.

It is important to note that the memory limit applies on a per-script basis. With n PHP-FPM processes, the total memory usage can be up to memory_limit * n. Be sure to check that your machine has sufficient memory headroom!
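The memory_limit * n bound is easy to work out before changing pool settings. A minimal Python sketch, assuming n is the pool's pm.max_children and using php.ini's K/M/G size shorthand; it ignores each process's baseline memory outside the script limit, so real usage can be somewhat higher. The helper names are hypothetical.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_php_size(value):
    """Convert a php.ini size like '128M' into bytes; '-1' (no limit) returns None."""
    value = value.strip()
    if value == "-1":
        return None
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)  # plain number of bytes


def worst_case_php_memory(memory_limit, max_children):
    """Upper bound on script memory for a pool: memory_limit * max_children bytes."""
    per_script = parse_php_size(memory_limit)
    if per_script is None:
        raise ValueError("memory_limit = -1 gives no upper bound")
    return per_script * max_children
```

For example, `worst_case_php_memory("128M", 20)` is 2684354560 bytes (2.5 GiB), which should fit in RAM alongside nginx, the database and the OS.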

Francis
1

I just had a similar problem:

You connect to php-fpm on port 9000 (fastcgi://127.0.0.1:9000).

The standard configuration on Ubuntu on my server is:

/etc/php/7.0/fpm/pool.d/www.conf:

listen = /run/php/php7.0-fpm.sock

you have to change this to:

listen = 0.0.0.0:9000

In my case, I had updated my server one and a half months earlier, which overwrote my custom configuration with the default. The error only appeared later, once php-fpm was restarted.
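A quick way to catch this kind of mismatch after an upgrade is to compare the pool's `listen` value with nginx's `fastcgi_pass`. A minimal Python sketch, assuming a pool file in the format shown above; the helper names are hypothetical, and it only handles the simple forms (Unix socket path, address:port, bare port).

```python
import re


def pool_listen(conf_text):
    """Return the value of the active 'listen' directive in a php-fpm pool file."""
    for line in conf_text.splitlines():
        # Commented lines start with ';' and 'listen.owner' etc. do not match.
        m = re.match(r"\s*listen\s*=\s*(\S+)", line)
        if m:
            return m.group(1)
    return None


def matches_fastcgi_pass(listen, fastcgi_pass):
    """Check whether php-fpm's listen value serves nginx's fastcgi_pass target."""
    if fastcgi_pass.startswith("unix:"):
        # nginx writes 'unix:/path', php-fpm writes just '/path'.
        return listen == fastcgi_pass[len("unix:"):]
    host, _, port = fastcgi_pass.rpartition(":")
    if listen in (fastcgi_pass, port):  # exact match, or a bare port (all addresses)
        return True
    return listen == f"0.0.0.0:{port}" and host in ("127.0.0.1", "localhost")
```

With the Ubuntu default shown above, `matches_fastcgi_pass(pool_listen(text), "127.0.0.1:9000")` returns False, which is exactly the situation described in this answer.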

J. Scott Elblein
Martin Krung
1

For me it was the server running out of memory and php-fpm getting killed by the OOM killer. The solution was to increase the amount of server memory.
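To confirm that the OOM killer is the culprit, one can search the kernel log (the output of `dmesg`, or a file such as `/var/log/kern.log` depending on the distribution) for kills of PHP processes. A minimal Python sketch; the regular expression assumes the usual kernel wording ("Killed process" on newer kernels, "Kill process" on older ones), and the helper name is hypothetical.

```python
import re

# Typical kernel messages:
#   "Out of memory: Killed process 1234 (php-fpm7.0) total-vm:..."            (newer)
#   "Out of memory: Kill process 1234 (php-fpm) score 900 or sacrifice child" (older)
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def oom_kills(lines, name_prefix="php"):
    """Return (pid, command) pairs for OOM-killer victims whose name starts with name_prefix."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m and m.group(2).startswith(name_prefix):
            kills.append((int(m.group(1)), m.group(2)))
    return kills
```

For example, `oom_kills(subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines())` (after `import subprocess`; reading dmesg may require root).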

1

For me it was because php-fpm was hitting the pm.max_children limit. The php-fpm log for the pool in question pointed me in the right direction.
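When the limit is hit, php-fpm logs a warning of the form `server reached pm.max_children setting (N), consider raising it`. A minimal Python sketch that counts those warnings per pool, assuming that wording; where the log lives depends on the `error_log` setting in php-fpm.conf, and the helper name is hypothetical.

```python
import re
from collections import Counter

MAX_CHILDREN_RE = re.compile(
    r"\[pool (\S+)\] server reached pm\.max_children setting \((\d+)\)"
)


def max_children_hits(lines):
    """Count 'server reached pm.max_children' warnings per (pool, configured limit)."""
    hits = Counter()
    for line in lines:
        m = MAX_CHILDREN_RE.search(line)
        if m:
            hits[(m.group(1), int(m.group(2)))] += 1
    return hits
```

A non-empty result suggests requests were queueing for a free worker at those times, which can be lined up against the 502 timestamps in the nginx error log.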

bruchowski
0

I had a similar issue: random "Connection reset by peer" errors when the server was under load. I eventually found it was due to a difference in keepalive timeout values between nginx and the upstream (gunicorn in my case). Nginx was at 75s and the upstream at just a few seconds, so sometimes the upstream dropped a connection that nginx still expected to reuse, and nginx couldn't tell why.

Increasing the upstream value to match nginx's solved the issue.
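The failure window can be modelled with a few comparisons: a pooled connection that has been idle longer than the upstream's timeout, but shorter than the proxy's, is one the upstream has already closed while the proxy still tries to reuse it. A minimal Python sketch of that model; the 75-second and 2-second figures in the checks are illustrative, and the function name is hypothetical.

```python
def reuse_outcome(idle_seconds, proxy_idle_timeout, upstream_idle_timeout):
    """Classify what happens when a proxy reuses a pooled upstream connection.

    The proxy only reuses a connection idle for less than its own timeout;
    the upstream has already closed it if it sat idle at least as long as
    the upstream timeout. Reuse in between hits a dead connection.
    """
    if idle_seconds >= proxy_idle_timeout:
        return "new connection"  # proxy discarded it first, no risk
    if idle_seconds >= upstream_idle_timeout:
        return "reset"           # proxy thinks it is open, upstream closed it
    return "reused"
```

With `proxy_idle_timeout=75` and `upstream_idle_timeout=2`, any reuse after 2 to 75 seconds of idleness lands in the "reset" window; raising the upstream timeout closes it. The model ignores the race right at the boundary, which is why it is often recommended to make the upstream timeout somewhat longer than the proxy's rather than exactly equal.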

Eino Gourdin
0

If you are using multiple reverse proxies, be aware that nginx will send a connection reset in some situations. If, for instance, you are getting "n worker_connections are not enough" in your logs, that's the source of the connection reset. Each request on a reverse proxy requires 2 worker_connections (one for the client and one for the upstream). If you don't know that, your margin of safety may not be a margin at all.
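The 2-connections-per-request rule makes the capacity arithmetic straightforward. A minimal Python sketch, using the `worker_processes 4` / `worker_connections 1024` values from the question's nginx.conf as an example; it ignores idle keep-alive client connections, which also occupy worker_connections, so the real ceiling is lower. The function name is hypothetical.

```python
def max_proxied_requests(worker_processes, worker_connections, connections_per_request=2):
    """Rough ceiling on simultaneous proxied requests for one nginx instance.

    Each proxied request holds one connection to the client and one to the
    upstream, so by default it consumes two worker_connections.
    """
    return worker_processes * worker_connections // connections_per_request
```

With the question's settings, `max_proxied_requests(4, 1024)` gives 2048 concurrent proxied requests, half of the 4096 connections a naive reading of the config would suggest. Each proxy layer in a chain has its own budget computed the same way.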

Jason