We have a site that starts serving 502 Bad Gateway and does not recover after the upstream servers rebound. The nginx server is set up to proxy and load balance requests across two upstream servers. It looks like the database server gets a high load average, which causes the web servers (the upstream servers) to serve content slowly and time out (according to the nginx server). Nginx then serves a 502, which makes sense given the situation.

What is strange, though, is that nginx doesn't seem to notice when the web servers rebound, and we must restart nginx to get it serving the site again. Is there a good way to fix this? I'm currently looking through the proxy settings to see if there is something to set, but I'm not having much luck.

Looking at the nginx logs, we see entries like these (just three examples of errors from this timeframe):

2013/06/12 13:53:40 [error] 29840#0: *258391 upstream timed out (110: Connection timed out) while reading response header from upstream, client: n.n.n.n, server: www.example.org, request: "GET / HTTP/1.1", upstream: "http://n.n.n.n:80/", host: "www.example.org"
2013/06/12 13:54:11 [error] 29840#0: *261105 no live upstreams while connecting to upstream, client: n.n.n.n, server: www.example.org, request: "GET /HTTP/1.1", upstream: "http://example_rack/", host: "www.example.org"
2013/06/12 13:54:46 [alert] 29840#0: *261470 stalled cache updating, error:0 while closing request, client: n.n.n.n, server: n.n.n.n:80
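
The configuration itself is not shown here. As a rough illustration only, a setup matching the description might look like the sketch below. The upstream name example_rack and the server name come from the log lines above; the backend addresses are placeholders, and the parameter values are nginx's documented defaults rather than the values actually in use.

```nginx
# Sketch only: backend addresses are placeholders, values are nginx defaults.
upstream example_rack {
    # A server is marked unavailable after max_fails failed attempts
    # within fail_timeout, and is then skipped for fail_timeout
    # before nginx tries it again.
    server 10.0.0.11:80 max_fails=1 fail_timeout=10s;
    server 10.0.0.12:80 max_fails=1 fail_timeout=10s;
}

server {
    listen 80;
    server_name www.example.org;

    location / {
        proxy_pass http://example_rack;

        # Maximum time between two successive reads from the upstream.
        # Exceeding it produces "upstream timed out ... while reading
        # response header from upstream".
        proxy_read_timeout 60s;

        # Conditions under which a request is passed to the next server.
        proxy_next_upstream error timeout;
    }
}
```

With settings like these, "no live upstreams" is logged when both servers are marked unavailable at the same time.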
Rob

1 Answer

This answer describes a solution to a problem that matches the title and the description. It is not an attempt to answer Rob's question 9 years later; I am posting it in the hope that it helps others. We found the solution in a Docker context, but it may apply to any nginx config.

In our case the IP address of the upstream server changed during a reboot/restart, possibly because the old service is not pulled down until the new service starts.
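
To illustrate why a changed IP address leads to a lasting 502, here is a minimal sketch of the kind of config that is affected. The service name app and port 8080 are hypothetical, not our actual setup. With a literal hostname in proxy_pass, nginx resolves the name once when the configuration is loaded and keeps using that address until it is reloaded or restarted.

```nginx
server {
    listen 80;

    location / {
        # Hypothetical service name and port.
        # The hostname is resolved once at startup/reload. If the
        # service comes back on a different IP, nginx keeps sending
        # requests to the old address.
        proxy_pass http://app:8080;
    }
}
```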

We stumbled on this ServerFault post from which we took the following hints:

First answer: "When you use a variable to specify the domain name in the proxy_pass directive, NGINX re‑resolves the domain name when its TTL expires. You must include the resolver directive to explicitly specify the name server"

The second answer in the same topic pointed us to a detailed example:

resolver 172.16.0.23;
set $upstream_endpoint http://service-1234567890.us-east-1.elb.amazonaws.com;
location / {
    proxy_pass $upstream_endpoint;
}

See the article for the caveats.
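
For a Docker setup, the same idea could be sketched as follows. This is an illustration under assumptions, not a drop-in config: the service name app and port 8080 are hypothetical, and 127.0.0.11 is the address of Docker's embedded DNS server on user-defined networks. The valid=10s parameter is an arbitrary choice that overrides the TTL of the DNS record.

```nginx
server {
    listen 80;

    # Docker's embedded DNS server (user-defined networks).
    # valid=10s caches each lookup for 10 seconds, regardless of the TTL.
    resolver 127.0.0.11 valid=10s;

    location / {
        # Hypothetical service name and port.
        set $upstream_endpoint http://app:8080;

        # Because proxy_pass uses a variable, the name is resolved at
        # request time through the resolver above, so a new IP is
        # picked up without restarting nginx.
        proxy_pass $upstream_endpoint;
    }
}
```

One thing to watch for: when proxy_pass uses a variable, any URI included in that value is passed to the upstream as is, replacing the original request URI. In the sketch above the variable contains no URI part, so the original request URI is passed through unchanged.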

marvin_x