We use nginx as the main frontend server for a set of services that are backed by dynamic, DNS-based service discovery with Mesos-DNS.
Our nginx configuration looks roughly like this:
    http {
        resolver 10.10.1.1 valid=1s;  # some internal DNS server

        server {
            set $core_api http://core-api-service.marathon.mesos;  # some internal DNS name
            location /api {
                # resolve the core_api name dynamically to one of the IPs of the
                # slave the process is running and listening on
                proxy_pass $core_api:8080;
            }
        }
    }
This setup mostly works, but roughly one in every four or five requests gets a 404 back from nginx. That makes no sense to us, because none of the services running inside the cluster have moved to a different slave.
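To put a number on that failure rate, a small script along these lines can tally the status codes returned over repeated requests. This is only a minimal sketch and not part of our setup: the function names, the default of 50 attempts, and the 5 second timeout are assumptions made for illustration.

```python
from collections import Counter
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to url.

    Connection-level failures (urllib.error.URLError) are not caught
    here and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def tally_statuses(url, attempts=50, fetch=fetch_status):
    """Request url `attempts` times and count each status code seen."""
    counts = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts
```

Calling `tally_statuses("https://api.myappname.com/api")` would return a `Counter` keyed by status code, which makes it easy to see whether the share of 404s changes when the resolver settings change. Note that `urlopen` follows redirects, so the counts reflect the final response.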
The resolver valid=1s setting is quite aggressive, so we extended it to longer periods, thinking that nginx might be querying the DNS server too often. Every value we tried causes the same problem, and removing valid=xx altogether doesn't help either.
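One way to check whether the DNS answers themselves vary between lookups is to resolve the name repeatedly and count the distinct answers. The following is a rough sketch under stated assumptions: the function names and the default of 20 attempts are made up for illustration, and it uses the operating system's resolver rather than nginx's own, so it only tells us something if the machine is pointed at the same DNS server and no local cache sits in between.

```python
from collections import Counter
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver reports."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return tuple(sorted(addresses))


def tally_answers(hostname, attempts=20, resolve=resolve_ipv4):
    """Resolve hostname repeatedly and count each distinct answer set."""
    counts = Counter()
    for _ in range(attempts):
        try:
            counts[resolve(hostname)] += 1
        except OSError as err:
            # socket.gaierror is a subclass of OSError; record the failure
            # as its own bucket instead of aborting the run.
            counts[("error", str(err))] += 1
    return counts
```

Running `tally_answers("core-api-service.marathon.mesos")` on the nginx host would show whether every lookup returns the same addresses or whether some lookups fail or return something unexpected.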
What's going on here? How do we mitigate this?
Thanks.
EDIT (full config)
    server {
        listen 80;
        server_name .myapp.com;
        return 301 https://www.myappname.com$request_uri;
    }

    server {
        listen 80;
        gzip on;
        gzip_min_length 1100;
        gzip_buffers 4 32k;
        gzip_types text/plain application/x-javascript text/xml text/css;
        gzip_vary on;
        root /usr/share/nginx/www;
        index index.html index.htm;
        include /etc/nginx/mime.types;
        server_name api.myappname.com;
        error_page 404 /static/404.html;
        error_page 403 /static/404.html;
        error_page 503 /static/503.html;
        error_page 502 /static/502.html;
        set $core_api http://core_api.marathon.mesos;
        location /api {
            if ($http_x_forwarded_proto != 'https') {
                rewrite ^ https://$host$request_uri? permanent;
            }
            limit_req zone=one burst=35;
            limit_req_status 503;
            proxy_pass $core_api:8080;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }