We have a cascade of two NginX reverse proxies in front of a Java web server.
The first proxy runs on FreeBSD (11.1-RELEASE-p10) load balancers and proxies all internet traffic into the internal network. There are two such load balancers, with identical configs:
location / {
    proxy_pass http://app_servers;
    proxy_set_header X-Request-ID $request_id;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
The second proxy runs on CentOS application servers and proxies requests to different applications on the same host. There are two app servers in question, also with identical config:
location / {
    proxy_pass http://java_app;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $http_x_real_ip;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
So, the pipeline looks like this:
              LB - AS - Java
Internet ---|    X
              LB - AS - Java
The Java web server has a request path that returns HTTP/1.1 chunked responses. Each response takes about 2 ms:
HTTP/1.1 200 OK
Connection: keep-alive
Transfer-Encoding: chunked
Content-Type: application/json;charset=UTF-8
Date: Tue, 06 Nov 2018 10:51:08 GMT

42
{..........JSON..........}
0
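For anyone reading the dump above: the `42` line is the chunk size in hexadecimal (66 bytes), and the `0` line is the terminating zero-length chunk. A minimal Python sketch of that framing, purely as an illustration (`decode_chunked` is a hypothetical helper, not something from our setup, and the payload is a stand-in for the real JSON):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(raw[pos:line_end].split(b";", 1)[0], 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-length chunk terminates the body
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


payload = b"x" * 0x42  # stand-in for the 66-byte JSON document
wire = b"42\r\n" + payload + b"\r\n0\r\n\r\n"
print(len(decode_chunked(wire)))  # 66
```

The point is only that the server signals the end of the body with the final `0` chunk rather than with a Content-Length header.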
The second proxy (on the app server) also returns them in about 2 ms, so there is no problem at that hop:
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 07 Nov 2018 10:34:29 GMT
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding

42
{..........JSON..........}
0
But the first proxy (on the load balancer) takes about 102 ms to return each of them to the client, so it adds a consistent 100 ms delay. This is the problem.
The same Java server has another request path that returns normal (non-chunked) responses with Content-Length headers. Both proxies return these responses in 2 ms without problems, and they go through the exact same locations in NginX.
It shouldn't be a network problem, because the delay is observed on both application servers and on both load balancers.
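In case someone wants to reproduce the per-hop comparison, here is a rough Python sketch of how such timings could be collected. It is an illustration, not the tool the numbers above came from; `time_requests` is a hypothetical helper, and the host name, port, and path in the usage comment are placeholders. It opens a new connection for every request, so it does not exercise keep-alive reuse.

```python
import http.client
import time


def time_requests(host, port, path, count=5):
    """Return per-request wall-clock times in milliseconds."""
    timings = []
    for _ in range(count):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.connect()  # connect first so TCP setup is not part of the sample
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # http.client decodes chunked bodies transparently
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        conn.close()
        timings.append(elapsed_ms)
    return timings


# Hypothetical usage: run the same path against each hop and compare.
# print(time_requests("lb1.example.internal", 80, "/chunked-path"))
```

Running it against the Java server, the app-server proxy, and the load balancer in turn, for both the chunked and the Content-Length path, should show at which hop the extra delay appears.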
This makes me think that the chunked encoding is somehow causing the 100 ms delay. But I don't understand why and I don't know how to fix it.
Any clues will be appreciated.