Suddenly started experiencing enormous wait times before requests hit server

Question

Our web app has been getting 20k-30k views a day, and steadily growing. About 4 days ago we suddenly started seeing wait times of 30-40 seconds before the HTML was even being delivered, on pages that had been rendering in 1 second even the day before.

In New Relic Synthetics these times show up simply as 'waiting'. By monitoring the nginx logs I can see that those times correspond to how long it takes incoming requests to hit the server.

This is a Rails app running Unicorn & Nginx on a 2GB droplet. I have previously optimized my Unicorn configuration according to this article: https://www.digitalocean.com/community/tutorials/how-to-optimize-unicorn-workers-in-a-ruby-on-rails-app, and our memory usage tends to hover around 50%. Nonetheless, I also noticed a bunch of these in that same log output from above:

2016/03/15 06:52:36 [error] 9460#0: *1110377 connect() to unix:/tmp/unicorn.streamfeed.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 66.226.75.13, server: streamfeed.com, request: "GET /watch HTTP/1.1", upstream: "http://unix:/tmp/unicorn.streamfeed.sock:/watch", host: "streamfeed.com", referrer: "http://google.com/"

Without any better ideas, I figured this might mean we're getting more traffic than our Unicorn workers can handle (our traffic has been increasing steadily, and we just had our busiest day ever the other day). So I upgraded our droplet to 4GB and doubled the number of Unicorn workers, but that didn't fix it. Sometimes I see those errno 11 errors (Resource temporarily unavailable), and sometimes errno 110 (Connection timed out).
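For anyone decoding the numbers in those log lines: on Linux, 11 is EAGAIN and 110 is ETIMEDOUT. Errno values differ between platforms, so here is a minimal sketch (the helper name `errno_names` is mine, not part of any library) that looks the codes up on whatever machine it runs on:

```ruby
# Returns the names of the Errno classes whose numeric code equals `number`
# on the machine where this runs (errno values are platform specific).
# `errno_names` is a hypothetical helper written for this post.
def errno_names(number)
  Errno.constants.select do |name|
    klass = Errno.const_get(name)
    klass.is_a?(Class) &&
      klass.const_defined?(:Errno, false) &&
      klass.const_get(:Errno, false) == number
  end.map(&:to_s).sort
end

# The two codes that show up in my nginx error log.
[11, 110].each do |code|
  puts "#{code}: #{errno_names(code).join(', ')}"
end
```

Run this on the app server itself, since the mapping printed on a laptop running a different OS may not match.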

This is not a matter of optimizing my code. The relevant actions still take 1-2 seconds to process (per New Relic) once the request finally gets through, including any database queries. The delay occurs before the request even hits the server. CPU usage on both of our droplets (app server and database) is under 50%, as is memory usage. There are no errors in the Unicorn logs.

I have tried every method I've found online for tweaking and optimizing nginx and Unicorn, and nothing has made a difference. If anything, the load times continue to increase. We're now seeing 40-50 second waits before requests are handled, which effectively means our site is crippled.

I hadn't altered any relevant settings or code for a very long time before this started happening. Since none of the changes I made helped, I've rolled the relevant files back to what they were when the problem started. I'm desperate to get our site working again, so hopefully someone out there can help.

nginx.conf:

upstream unicorn {
  server unix:/tmp/unicorn.streamfeed.sock fail_timeout=0;
}

server {
  server_name www.streamfeed.com;
  rewrite ^(.*) http://streamfeed.com$1 permanent;
}

server {
  listen 80 default_server deferred;
  # server_name example.com;
  server_name streamfeed.com;
  root /home/deployer/apps/streamfeed/current/public;

  location ^~ /assets/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    location ~* \.(js|css)$ {
      add_header Access-Control-Allow-Origin *;
    }
  }

  location ^~ /fonts/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    location ~* \.(ttf|ttc|otf|eot|woff|svg|font.css)$ {
      add_header Access-Control-Allow-Origin *;
    }    
  }

  try_files $uri/index.html $uri @unicorn;
  location @unicorn {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_pass http://unicorn;
  }

  error_page 500 502 503 504 /500.html;
  client_max_body_size 4G;
  keepalive_timeout 30;
}

unicorn.rb:

root = "/home/deployer/apps/streamfeed/current"
working_directory root
pid "#{root}/tmp/pids/unicorn.pid"
stderr_path "#{root}/log/unicorn.log"
stdout_path "#{root}/log/unicorn.log"

listen "/tmp/unicorn.streamfeed.sock"
worker_processes 11
timeout 60
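As a rough sanity check on these numbers, here is a minimal back-of-the-envelope sketch. It assumes each Unicorn worker handles one request at a time, takes `worker_processes 11` from the file above and the 1-2 second action times from New Relic, and uses 1024 as the queue length because that is Unicorn's documented default `backlog` for `listen` (the kernel may cap it lower). The helper names are mine.

```ruby
# Rough capacity estimate for a pool of single-request-at-a-time workers.
# Assumptions: 11 workers (unicorn.rb above), 1-2 s per request (New Relic),
# and a queue of 1024 waiting connections (Unicorn's default listen backlog).

# Maximum sustained requests per second the pool can serve.
def max_throughput(workers, service_time_s)
  workers / service_time_s.to_f
end

# Seconds needed to work through `queued` waiting requests,
# assuming nothing new arrives in the meantime.
def drain_time(queued, workers, service_time_s)
  queued / max_throughput(workers, service_time_s)
end

workers = 11
[1.0, 2.0].each do |service_time|
  rps  = max_throughput(workers, service_time)
  wait = drain_time(1024, workers, service_time)
  puts format('service=%.1fs capacity=%.1f req/s full-backlog-drain=%.0fs',
              service_time, rps, wait)
end
```

If this model holds, the pool tops out somewhere between 5.5 and 11 requests per second, and a 40-50 second wait would correspond to roughly 220-550 requests sitting in the socket queue ahead of each new one.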

config.ru:

# This file is used by Rack-based servers to start the application.

if ENV['RAILS_ENV'] == 'production' 
  require 'unicorn/worker_killer'

  max_request_min =  500
  max_request_max =  600

  # Max requests per worker
  use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max

  oom_min = (240) * (1024**2)
  oom_max = (260) * (1024**2)

  # Max memory size (RSS) per worker
  use Unicorn::WorkerKiller::Oom, oom_min, oom_max
end

require ::File.expand_path('../config/environment',  __FILE__)
run Rails.application

Are all requests slow or just some? What do nginx error logs say? Can you hit the application server directly, bypassing nginx, and if so what happens? Why do you think it's a problem before it hits the server? Have you run a test with webpagetest.org that you can share? It really looks like a back end server issue, not an nginx issue, based on the information you've shared so far. — Tim, Mar 16 '16 at 21:51
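One way to try the "bypass nginx" suggestion is to send a request straight to the Unicorn socket and time it. This is a minimal sketch, not a tested diagnostic: `probe_unix_socket` is a hypothetical helper, and the socket path, `/watch` path, and host are simply the values from the config and log line above.

```ruby
require 'socket'

# Sends one HTTP GET directly to a Unix domain socket (skipping nginx) and
# reports how long the connect took and how long the first response line took.
# `probe_unix_socket` is a hypothetical helper written for this post.
def probe_unix_socket(socket_path, request_path: '/', host: 'localhost')
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  UNIXSocket.open(socket_path) do |sock|
    connected = clock.call
    sock.write("GET #{request_path} HTTP/1.1\r\n" \
               "Host: #{host}\r\n" \
               "Connection: close\r\n\r\n")
    status_line = sock.gets # blocks until the app starts responding
    first_line = clock.call
    sock.read               # drain the rest of the response
    { connect_s: connected - started,
      wait_s: first_line - connected,
      status: status_line.to_s.strip }
  end
end

# Only runs where the socket actually exists (i.e. on the app server).
socket_path = '/tmp/unicorn.streamfeed.sock'
if File.socket?(socket_path)
  p probe_unix_socket(socket_path, request_path: '/watch', host: 'streamfeed.com')
end
```

If `wait_s` is large here too, the time is being spent waiting for a free Unicorn worker rather than inside nginx.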
