
We are running very high volume traffic on servers configured with Django, Gunicorn, Supervisor and nginx, but we frequently see 502 errors. I checked the nginx logs to see what the error was, and this is what is recorded:

[error] 2388#0: *208027 connect() to unix:/tmp/gunicorn-ourapp.socket failed (11: Resource temporarily unavailable) while connecting to upstream

Can anyone help debug what might be causing this to happen?

This is our nginx configuration:

sendfile on;
tcp_nopush on;
tcp_nodelay off;

listen 80 default_server;
server_name imp.ourapp.com;
access_log /mnt/ebs/nginx-log/ourapp-access.log;
error_log /mnt/ebs/nginx-log/ourapp-error.log;

charset utf-8;
keepalive_timeout 60;
client_max_body_size 8m;

gzip_types text/plain text/xml text/css application/javascript application/x-javascript application/json;

location / {
    proxy_pass http://unix:/tmp/gunicorn-ourapp.socket;
    proxy_pass_request_headers on;
    proxy_read_timeout 600s;
    proxy_connect_timeout 600s;
    proxy_redirect http://localhost/ http://imp.ourapp.com/;
    #proxy_set_header Host              $host;
    #proxy_set_header X-Real-IP         $remote_addr;
    #proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
    #proxy_set_header X-Forwarded-Proto $my_scheme;
    #proxy_set_header X-Forwarded-Ssl   $my_ssl;
}

We have configured Django to run in Gunicorn as a generic WSGI application. Supervisord is used to launch the gunicorn workers:

/home/user/virtenv/bin/python2.7 /home/user/virtenv/bin/gunicorn --config /home/user/shared/etc/gunicorn.conf.py daggr.wsgi:application

This is what the gunicorn.conf.py looks like:

import multiprocessing

bind = 'unix:/tmp/gunicorn-ourapp.socket'
workers = multiprocessing.cpu_count() * 3 + 1
timeout = 600
graceful_timeout = 40

Does anyone know where I can start digging to see what might be causing the problem?

This is what my ulimit -a output looks like on the server:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59481
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 50000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
user1068118

4 Answers


I was able to get around this issue by raising /proc/sys/net/core/somaxconn from 128 to 20000. This allows larger bursts of traffic. I may not have needed to set it that high, but this application can burst very heavily. I am also using gunicorn and nginx.
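For reference, a rough sketch of how that change can be applied and persisted on a typical Linux host (the value and the supervisorctl restart are just examples for a setup like the one in the question):

# check the current cap on the listen backlog
cat /proc/sys/net/core/somaxconn

# raise it for the running kernel
sudo sysctl -w net.core.somaxconn=20000

# persist it across reboots
echo 'net.core.somaxconn = 20000' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# restart gunicorn so its listening socket is re-created with the larger backlog
sudo supervisorctl restart all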

Matt Williamson

In my case, this error was caused by my gunicorn configuration:

worker_class = "sync"

I fixed it by changing it to:

worker_class = "gevent" # "sync"

estevo
  • In my case, instead of subclassing `gunicorn.workers.sync.SyncWorker`, I subclassed `gunicorn.workers.ggevent.GeventWorker`. Worked a treat! – markrian Oct 24 '16 at 13:37

I was able to reproduce this issue with this example: https://github.com/pawl/somaxconn_test

Increasing net.core.somaxconn ended up fixing it.

If it's not running in a Docker container, you can do that with sysctl -w net.core.somaxconn=<your value>. If it is running in a Docker container, you can pass this flag to docker run: --sysctl net.core.somaxconn=1024
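A small sketch of the Docker variant (the image name and value are placeholders):

# start the container with a larger accept backlog
docker run --sysctl net.core.somaxconn=1024 your-image

# verify the value the container actually sees
docker exec <container-id> cat /proc/sys/net/core/somaxconn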

pawl

That sounds like it is caused by all the gunicorn workers being in use. I would temporarily turn on logging in gunicorn (see the logging settings in the gunicorn documentation). This should let you see the state of the gunicorn workers and why a new connection can't be made at the time the 502 happens.
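A minimal sketch of what that could look like with the command from the question (the log paths are only examples):

# raise gunicorn's log level and write its error and access logs to files
# (these flags override the corresponding settings in gunicorn.conf.py)
/home/user/virtenv/bin/gunicorn --log-level debug \
    --error-logfile /mnt/ebs/gunicorn-error.log \
    --access-logfile /mnt/ebs/gunicorn-access.log \
    --config /home/user/shared/etc/gunicorn.conf.py daggr.wsgi:application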

JaseAnderson