
We have IIS servers in Azure behind a load balancer. The load balancer has an unconfigurable idle timeout of 4 minutes, after which inactive connections are killed.

We're trying to set up nginx as a reverse proxy in front of the IIS cluster described above. Everything works except that nginx doesn't send keep-alive messages on the connections it opens to the backend. So if the server takes more than 4 minutes to reply, the load balancer kills the connection.

If a client (browser) connects to the load balancer directly, it sends TCP keep-alive messages and all is well. If the client connects to nginx, it sends keep-alives to nginx, and the client-to-nginx connection stays up. But there are no keep-alive messages between nginx and the load balancer, so that connection eventually dies.

The so_keepalive option appears to control TCP keep-alive only for connections opened to nginx by the client (browser), not for the upstream connections nginx opens itself.

We tried other reverse proxies (IIS ARR, HAProxy) and ran into the same issue every time.

We can't configure the load balancer (it's outside our control). How can we configure Linux with nginx, nginx itself, or the IIS servers behind the load balancer (to which nginx connects) so that they send keep-alive messages and keep the connection open?

Sumrak
  • Sounds like instead of a reverse proxy / web server (nginx) you need to use a message queue. Web servers likely carry a fundamental assumption that if a request takes longer than about 30 seconds, something has gone wrong and the connection should be abandoned. – Tim May 24 '16 at 02:56
  • @Tim There are websockets, which kind of break that assumption by being long-lived, and run over HTTP/HTTPS. – Michael Hampton May 24 '16 at 03:43
  • @MichaelHampton If you're using websockets behind a device with a timeout, the end that implements the device with the timeout can also ensure that websocket-level pings are used. (For example, by setting the [ping interval](https://www.iis.net/configreference/system.webserver/websocket#004).) – David Schwartz May 24 '16 at 19:34

2 Answers


On both Linux and Windows, the program that opens the connection must set the keep-alive option on the socket for keep-alive packets to be sent.

We updated the nginx source to enable keep-alive for all opened sockets. File to update: src/event/ngx_event_connect.c

Code to enable keep alive (tested only on Linux):

    /* Enable TCP keep-alive on the upstream socket */
    int tcp_keepalive = 1;
    if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (const void *) &tcp_keepalive, sizeof(int)) < 0) {
        return NGX_ERROR;
    }

Put it in ngx_event_connect_peer(), right after the socket is created and the connection is retrieved (ngx_get_connection).

You will then also need to decrease the keep-alive time (from the default of 2 hours) and possibly the keep-alive interval. See http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ for details.

Sumrak
  • Are you certain you can't use application-level keepalives rather than keepalives at the TCP level? You almost always can, and it's a superior solution. – David Schwartz May 24 '16 at 19:34
  • I'm not sure what application-level keepalives are. HTTP keep-alives won't work; they don't send anything over the connection to indicate to the load balancer that the connection is in use. The client (browser) already sends keep-alives. It's nginx, in the middle, that doesn't send keep-alives on the connections it opens to the web server. The web server is ASP.NET, and there's no easy access to the underlying request socket to enable keep-alive there. – Sumrak May 24 '16 at 21:16
  • An application-level keep alive is any data you can send over the TCP connection that will cause the proxy to keep the connection alive. For example, if you're going to close the connection anyway, you can just send one zero byte every minute. If you're keeping the connection open, you can dribble out a dummy HTTP request byte-by-byte while you wait for a reply, then quickly finish the dummy HTTP request. If you're using websocket, it has explicit support for protocol-level pings. TCP keepalives should be an absolute last resort if you really have no other choice. – David Schwartz May 24 '16 at 21:37
  • We don't use websocket. But, why is application-level keep alive better? It adds a complexity level to the application to deal with infrastructure setup. – Sumrak May 24 '16 at 21:51
  • There are a variety of reasons. The primary reason is that it's better to add completely portable complexity to the application than to demand an unusual system-level configuration that affects everything. A key secondary reason is that using TCP keepalives this way is not guaranteed to work (see, for example, [RFC 1122 section 4.2.3.6](http://www.freesoft.org/CIE/RFC/1122/114.htm)). TCP keepalives exist as a kludge to work around applications and protocols that weren't designed to work with TCP. – David Schwartz May 24 '16 at 22:16

For anyone searching here: try http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_socket_keepalive

    Syntax:   proxy_socket_keepalive on | off;
    Default:  proxy_socket_keepalive off;
    Context:  http, server, location

This directive appeared in version 1.15.6.

Configures the “TCP keepalive” behavior for outgoing connections to a proxied server. By default, the operating system’s settings are in effect for the socket. If the directive is set to the value “on”, the SO_KEEPALIVE socket option is turned on for the socket. 
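A minimal configuration sketch for the setup described in the question (the upstream address and the timeout value are placeholders; proxy_read_timeout is raised so nginx itself doesn't give up before the slow backend replies):

```nginx
http {
    server {
        listen 80;

        location / {
            # Turn on SO_KEEPALIVE for the upstream socket (nginx >= 1.15.6)
            proxy_socket_keepalive on;

            # Let the backend take longer than the balancer's 4-minute window
            proxy_read_timeout 600s;

            proxy_pass http://backend.example.com;
        }
    }
}
```

Note that this only sets SO_KEEPALIVE; the probe timing still comes from the operating system's keep-alive settings, which may need lowering as described in the other answer.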
zhen fang