
I've been trying to tune our Ubuntu 14.04 LTS web server instances, which host both the web applications and a reverse-proxying nginx, to handle as many requests per second as possible on the given hardware. It's a c4.2xlarge EC2 instance with 8 vCPUs.

I'm running the following two benchmark tools against it from my office machine (NOT both at the same time):

wrk -c1000 -d2m -t8 --timeout 90 --latency http://api.mysite.com/2/ping
# or
ab -k -n 100000 -c 1000 http://api.mysite.com/2/ping

What I'm seeing is that, according to ss -tan | wc -l, I always max out at about 65.5k connections in TIME-WAIT.
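
For reference, something along these lines gives a per-state breakdown rather than a raw line count (just a sketch using standard ss/awk tooling):

ss -tan | awk 'NR > 1 {states[$1]++} END {for (s in states) print s, states[s]}'
# or count TIME-WAIT sockets only
ss -tan state time-wait | wc -l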

My OS setup is:

  • net.ipv4.ip_local_port_range is set to "15000 65000"
  • /etc/security/limits.conf contains `www-data hard nofile 100000`
  • the /etc/pam.d/common-session* files are updated so that the above limit actually takes effect
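
For completeness, those pieces look roughly like this on disk (a sketch: the sysctl.d file name is just how I happen to organize it, and the pam_limits line is the usual way the common-session files pick up limits.conf):

# /etc/sysctl.d/60-port-range.conf (applied with `sysctl --system`)
net.ipv4.ip_local_port_range = 15000 65000

# /etc/security/limits.conf
www-data hard nofile 100000

# /etc/pam.d/common-session and common-session-noninteractive
session required pam_limits.so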

And the nginx setup is:

  • worker_processes auto; # will result in 8 on this machine

events { worker_connections 8192; multi_accept on; use epoll; }

The upstream block for the API that nginx proxies to is below; it spreads connections across several loopback addresses to get a much larger set of possible TCP 4-tuples, meaning I pretty much never run out of ephemeral ports on the nginx -> app side:

upstream my_api { server 127.0.0.1:3004; server 127.0.0.2:3004; server 127.0.0.3:3004; [...] }
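
Back-of-the-envelope, that gives the following nginx -> app headroom (a sketch; the count of 10 upstream addresses below is only an illustrative assumption, the actual number depends on how many entries are in the list):

# ~50k ephemeral ports per (source IP, destination IP:port) pair
echo $(( 65000 - 15000 ))          # 50000
# multiplied by the number of distinct upstream addresses (assume 10 here)
echo $(( (65000 - 15000) * 10 ))   # 500000 possible nginx -> app 4-tuples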

I experience a similar issue with my m3.large instance, except there I max out at 32k instead of 65k. The m3.large has 2 vCPUs and 7.5 GB of memory, while the c4.2xlarge has 8 vCPUs and 15 GB.

A similar problem is described in this post (Scaling beyond 65k open files (TCP connections)), but it doesn't seem to apply in my case: on my smaller instance vm.max_map_count is 65530, yet it never goes past 32k connections in TIME-WAIT.

At first I thought the limit was just worker_processes * worker_connections, but on the smaller instance I'm still capped at 32k even if I raise worker_connections to 25k per worker, so that's not it.

I'm not sure which knob to tweak at this point; it's not clear to me where these hard limits are coming from. I could use some help here.

Interestingly enough, I don't see connections ultimately being refused by either of these machines as TIME-WAIT reaches this "limit". It's possible the socket queues are filling up behind the scenes and the client simply retries the connection later, which would explain why I'm not seeing any permanent failures.
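
One way to sanity-check that theory would be to watch the listen-queue counters on the server, roughly like this (a sketch):

netstat -s | grep -i -E 'listen|overflow'
# or, with newer iproute2
nstat -az TcpExtListenOverflows TcpExtListenDrops
# per-listener accept-queue usage (Recv-Q) vs. backlog limit (Send-Q)
ss -lnt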

Update:

On a c4.8xlarge instance I can get up to 262k connections in TIME-WAIT with the exact same deployment configuration. Even limiting the number of nginx workers to just 1 doesn't change it. I'm still not sure what the difference is here.

Update 2:

I strongly suspect this has to do with the instances having different net.ipv4.tcp_max_tw_buckets values, which, from what I can tell, match exactly the pattern I'm seeing.
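
For anyone comparing instances: the value can be checked, and if needed pinned explicitly rather than left at the default (which appears to scale with instance memory), roughly like so (the file name and number below are just illustrative):

sysctl net.ipv4.tcp_max_tw_buckets
echo 'net.ipv4.tcp_max_tw_buckets = 262144' | sudo tee /etc/sysctl.d/61-tw-buckets.conf
sudo sysctl --system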

Alexandr Kurilin
  • Have you tried testing *from* multiple machines (or interfaces) simultaneously? Meaning, could the test machine be running out of ports? – h0tw1r3 Oct 27 '15 at 02:08
  • I indeed have. On the smaller instance I always max out at 32769 even when running `wrk` against the server concurrently from two completely different ISPs with their respective IPs. – Alexandr Kurilin Oct 27 '15 at 02:18
  • If they're both EC2 instances, with similar network setup (and same kernel config), I would contact AWS support. Stinks like a VPS issue to me. – h0tw1r3 Oct 27 '15 at 02:39
  • Contacting them, will keep you folks posted. – Alexandr Kurilin Oct 27 '15 at 06:27
  • After looking at the various instances and configs, the AWS folks claim there's nothing on their end that would limit the # of connections. – Alexandr Kurilin Oct 28 '15 at 17:53
  • Why is your upstream on a local loopback address? They should be different hosts. – hookenz Oct 29 '15 at 18:28
  • BTW, to proxy you use two connections. One between the client and the server and one from the server to the upstream. – hookenz Oct 29 '15 at 18:30
  • Reason for that is that at this point nginx is running on the same machine as the applications themselves. – Alexandr Kurilin Oct 29 '15 at 18:36
  • What's the observed behaviour when you hit the limit? Do you get TCP resets from the server or just time-out? – Pedro Perez Oct 30 '15 at 07:33

2 Answers

1

Have a look at the net.ipv4.netfilter.ip_conntrack_max tunable. For more information, you can read this ServerFault post.
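
As a rough sketch, once the conntrack module is loaded the current usage and ceiling can be checked with something like:

sudo modprobe nf_conntrack          # the sysctl keys only appear once the module is loaded
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
# legacy compat name on some older kernels
sysctl net.ipv4.netfilter.ip_conntrack_max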

shodanshok
  • Interestingly enough this doesn't seem to be exposed by default in stock Ubuntu 14.04 installations. Seems like this would fix it: http://serverfault.com/questions/338479/error-net-ipv4-netfilter-ip-conntrack-max-is-an-unknown-key – Alexandr Kurilin Oct 29 '15 at 18:38
0

You are running out of source ports at your source machine.

In order to identify a connection you need: Source IP, Source Port, Destination IP and Destination Port. As Source IP, Destination IP and Destination Port are always the same in your tests, you have only one variable: Source Port. Your TCP/IP stack can't handle more than 64k different source ports (actually a bit less).

Stress-testing from a single point is never a good idea. You might be able to squeeze a bit more out of this by enabling net.ipv4.tcp_tw_recycle to reuse ports in TIME_WAIT state, but it may cause you trouble due to the aggressive port reuse.
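
To put rough numbers on it (a sketch; the exact range on your load-generating machine may differ):

# on the load-generating machine
sysctl net.ipv4.ip_local_port_range     # often "32768 61000" by default
echo $(( 61000 - 32768 + 1 ))           # ~28k usable source ports per (src IP, dst IP:port)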

Pedro Perez
  • Just to clarify: 1. I've done the benchmark tests from multiple ISPs concurrently AND during production hours, when the server is being hit by a few thousand different IPs per second; the results are always exactly the same. 2. recycle won't work for us, as practically all of our users are behind their respective NATs. – Alexandr Kurilin Oct 29 '15 at 18:19
  • Ah! I stand corrected :) – Pedro Perez Oct 30 '15 at 07:32