Our web app has been getting 20k-30k views a day, and traffic is growing steadily. About 4 days ago we suddenly started seeing waits of 30-40 seconds before the HTML was even delivered, on pages that had been rendering in 1 second just the day before.
In New Relic Synthetics these times show up simply as 'waiting'. By monitoring the nginx logs I can see that those times correspond to how long it takes incoming requests to hit the server.
This is a Rails app running Unicorn & Nginx on a 2GB droplet. I have previously optimized my Unicorn configuration according to this article: https://www.digitalocean.com/community/tutorials/how-to-optimize-unicorn-workers-in-a-ruby-on-rails-app and our memory usage tends to hover around 50%. Nonetheless, I also noticed a bunch of these in the same nginx log output:
2016/03/15 06:52:36 [error] 9460#0: *1110377 connect() to unix:/tmp/unicorn.streamfeed.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 66.226.75.13, server: streamfeed.com, request: "GET /watch HTTP/1.1", upstream: "http://unix:/tmp/unicorn.streamfeed.sock:/watch", host: "streamfeed.com", referrer: "http://google.com/"
Without any better ideas, I figured this might mean we're getting more traffic than our Unicorn workers can handle (our traffic has been increasing steadily and we just had our busiest day ever the other day). So I upgraded our droplet to 4GB and doubled the number of Unicorn workers, but that didn't fix it - sometimes I see those error 11 lines, and sometimes the code is 110 instead.
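To see how the two error codes are distributed over time, something like the following could tally them per minute. This is only a sketch: the regex is modelled on the single log line quoted above, `tally_connect_failures` is a made-up helper name, and other nginx error formats are simply skipped. For reference, on Linux errno 11 is EAGAIN and errno 110 is ETIMEDOUT.

```ruby
# Tally nginx "connect() ... failed" errors by minute and errno.
# The pattern is modelled on the one error line quoted in this post;
# lines in any other format are ignored.
CONNECT_FAILURE = /
  \A(?<minute>\d{4}\/\d{2}\/\d{2}\s\d{2}:\d{2}):\d{2}\s
  \[error\]\s.*?connect\(\)\sto\s\S+\sfailed\s
  \((?<errno>\d+):\s(?<reason>[^)]*)\)
/x

# Hypothetical helper: takes any enumerable of log lines and returns
# a hash of { [minute, errno] => count }.
def tally_connect_failures(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = CONNECT_FAILURE.match(line)
    next unless m
    counts[[m[:minute], m[:errno].to_i]] += 1
  end
  counts
end

# The line quoted above, shortened after "while connecting to upstream".
SAMPLE = '2016/03/15 06:52:36 [error] 9460#0: *1110377 connect() to ' \
         'unix:/tmp/unicorn.streamfeed.sock failed ' \
         '(11: Resource temporarily unavailable) while connecting to upstream'

tally_connect_failures([SAMPLE]).sort.each do |(minute, errno), count|
  puts "#{minute} errno=#{errno} count=#{count}"
end
# prints: 2016/03/15 06:52 errno=11 count=1

# To run against a real log, pass its path as the first argument.
unless ARGV.empty?
  tally_connect_failures(File.foreach(ARGV[0])).sort.each do |(minute, errno), count|
    puts "#{minute} errno=#{errno} count=#{count}"
  end
end
```

Comparing the per-minute counts against the traffic graph would show whether the failures come in bursts or are spread evenly across the day.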
This is not a matter of optimizing my code - the relevant actions still take 1-2 seconds to actually process (per New Relic) once the request finally gets through, including any database queries. The delay is occurring before the request even hits the server. CPU usage of both of our droplets (app server & database) is under 50%, as is memory usage. There are no errors in the Unicorn logs.
I have tried every method I've found online of tweaking/optimizing nginx & unicorn and nothing has made a difference - if anything, the load times continue to increase. We're now seeing 40-50 second waits for requests to be handled, which effectively means our site is crippled.
I hadn't altered any relevant settings or code for a very long time before this started happening. Since none of my tweaks made a difference, I've rolled the relevant files back to what they were when this started. I'm desperate to get our site working again...hopefully someone out there can help.
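As a back-of-the-envelope check on the "more traffic than workers" theory, here is a rough estimate using Little's law (average busy workers = arrival rate x service time). It is a sketch under loud assumptions: it treats one page view as one Rails request, and `PEAK_FACTOR` is a guessed peak-to-average ratio, not something I measured. The other inputs are the figures from this post and the `worker_processes` value in unicorn.rb below.

```ruby
# Rough capacity estimate via Little's law:
#   average busy workers = arrival rate (req/s) * service time (s)
# Assumptions (not measurements): one page view == one Rails request,
# and peak traffic is PEAK_FACTOR times the daily average.

SECONDS_PER_DAY = 24 * 60 * 60
PEAK_FACTOR = 5.0 # hypothetical; replace with a measured ratio

# Average number of workers busy at once for the given load.
def busy_workers(requests_per_day, service_time_s, peak_factor = 1.0)
  (requests_per_day.to_f / SECONDS_PER_DAY) * peak_factor * service_time_s
end

# Upper bound on requests/second the worker pool can serve.
def max_requests_per_second(workers, service_time_s)
  workers / service_time_s.to_f
end

views_per_day = 30_000 # upper end of the 20k-30k range
service_time  = 2.0    # upper end of the 1-2 s per action
workers       = 11     # worker_processes in unicorn.rb

puts format("busy workers (average): %.2f",
            busy_workers(views_per_day, service_time))
puts format("busy workers (assumed peak): %.2f",
            busy_workers(views_per_day, service_time, PEAK_FACTOR))
puts format("max throughput: %.1f req/s",
            max_requests_per_second(workers, service_time))
```

If those assumptions are anywhere near right, page-view volume alone should keep only a few of the 11 workers busy, which would fit with doubling the workers not helping. It says nothing about requests that are not page views, or about individual requests that hold a worker far longer than 2 seconds.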
nginx.conf:
upstream unicorn {
  server unix:/tmp/unicorn.streamfeed.sock fail_timeout=0;
}
server {
  server_name www.streamfeed.com;
  rewrite ^(.*) http://streamfeed.com$1 permanent;
}
server {
  listen 80 default_server deferred;
  # server_name example.com;
  server_name streamfeed.com;
  root /home/deployer/apps/streamfeed/current/public;
  location ^~ /assets/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    location ~* \.(js|css)$ {
      add_header Access-Control-Allow-Origin *;
    }
  }
  location ^~ /fonts/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    location ~* \.(ttf|ttc|otf|eot|woff|svg|font.css)$ {
      add_header Access-Control-Allow-Origin *;
    }
  }
  try_files $uri/index.html $uri @unicorn;
  location @unicorn {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_pass http://unicorn;
  }
  error_page 500 502 503 504 /500.html;
  client_max_body_size 4G;
  keepalive_timeout 30;
}
unicorn.rb:
root = "/home/deployer/apps/streamfeed/current"
working_directory root
pid "#{root}/tmp/pids/unicorn.pid"
stderr_path "#{root}/log/unicorn.log"
stdout_path "#{root}/log/unicorn.log"
listen "/tmp/unicorn.streamfeed.sock"
worker_processes 11
timeout 60
config.ru:
# This file is used by Rack-based servers to start the application.
if ENV['RAILS_ENV'] == 'production'
  require 'unicorn/worker_killer'
  max_request_min = 500
  max_request_max = 600
  # Max requests per worker
  use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max
  oom_min = (240) * (1024**2)
  oom_max = (260) * (1024**2)
  # Max memory size (RSS) per worker
  use Unicorn::WorkerKiller::Oom, oom_min, oom_max
end
require ::File.expand_path('../config/environment', __FILE__)
run Rails.application