Exhausting Linux machine TCP socket limit (~70k)?

Question

I am the founder of torservers.net, a non-profit that runs Tor exit nodes. We have a number of machines on Gbit connectivity and multiple IPs, and we seem to be hitting a limit of open TCP sockets across all those machines. We're hovering around ~70k of total TCP connections in total (~10-15k per IP), and Tor is logging "Error binding network socket: Address already in use" like crazy. Is there any solution for this? Does BSD suffer from the same problem?

We run for Tor processes, each of them listening to a different IP. Example:

# NETSTAT=`netstat -nta`
# echo "$NETSTAT" | wc -l
67741
# echo "$NETSTAT" | grep ip1 | wc -l
19886
# echo "$NETSTAT" | grep ip2 | wc -l
15014
# echo "$NETSTAT" | grep ip3 | wc -l
18686
# echo "$NETSTAT" | grep ip4 | wc -l
14109

I have applied the tweaks I could find on the internet:

# cat /etc/sysctl.conf
net.ipv4.ip_forward = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.conf.default.forwarding = 0
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.all.send_redirects = 0
kernel.sysrq = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.core.netdev_max_backlog = 262144
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_orphan_retries = 2
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_fin_timeout = 4
vm.min_free_kbytes = 65536
net.ipv4.netfilter.ip_conntrack_max = 196608
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
net.netfilter.nf_conntrack_checksum = 0
net.netfilter.nf_conntrack_max = 196608
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 15
net.nf_conntrack_max = 196608
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.ip_local_port_range = 1025 65535
net.core.somaxconn = 262144
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_timestamps = 0

# sysctl fs.file-max
fs.file-max = 806854

# ulimit -n
500000

# cat /etc/security/limits.conf
*       soft    nofile 500000
*       hard    nofile 500000

score 5 · Answer 1 · answered Jan 17 '12 at 22:37

5

If there are processes binding to INADDR_ANY, then some systems will try to pick ports only from the range of 49152 to 65535. That could account for your ~15k limit as the range is exactly 16384 ports.

Wikipedia: Ephemeral port

You may be able to expand that range by finding the instructions for your OS here:

The Ephemeral Port Range

answered Jan 17 '12 at 22:37

Seth Noble

356
1
5

For Linux, this would be `net.ipv4.ip_local_port_range = 1025 65535`, which is in the list of sysctl settings already applied. I can also actually see connections from lower ports across all IPs. – mo. Jan 18 '12 at 10:58
Another possibility is that the connections are not dropping cleanly, causing ports to be left in a TIME_WAIT state. How quickly are these connections cycling? In other words, how long does a typical TCP session last? – Seth Noble Jan 18 '12 at 22:17
Thanks. Yes, indeed there is a high number of relatively short-lived TCP sessions. But, `netstat -nta` inludes connections in TIME_WAIT: At the moment, 24k of 67k connections - across all interfaces! - are in TIME_WAIT. Still, I don't see why this should be a restriction? My tweaks also include a very short timeout `net.ipv4.tcp_fin_timeout = 4`. – mo. Jan 19 '12 at 09:38
Wow, that's a lot of TIME_WAITs for a 4 second timeout. I see you have net.ipv4.tcp_tw_reuse set. You could also try net.ipv4.tcp_tw_recycle. http://serverfault.com/questions/342741/what-are-the-ramifications-of-setting-tcp-tw-recycle-reuse-to-1 http://stackoverflow.com/questions/6426253/tcp-tw-reuse-vs-tcp-tw-recycle-which-to-use-or-both *However*, if you have access to the source code I suggest examining how the TCP sessions are being closed. A TIME_WAIT state usually occurs when a connection is closed while data is still in transit or not cleanly closed. – Seth Noble Jan 19 '12 at 17:09
I have already tried net.ipv4.tcp_tw_recycle over a longer time period, it did not help. The high number of TIME_WAITs is to be expected, the Tor service uses a lot of short-lived connections. I don't see why they should pose a problem, and I still don't see where the limitation comes from. The source code of Tor is available, but I am not familiar with it enough to dig into this. – mo. Jan 23 '12 at 12:41
TIME_WAIT is a factor simply because any port in that state is unavailable. If there is a spike in traffic, the number of TIME_WAITs could become so high that there are not enough free ports available for new connections. That would cause a burst of `already in use` errors. But because your connection times are short, samples with `netstat` may not capture that moment. Given all the circumstances, spreading across more nodes/addresses may be the most practical solution. – Seth Noble Jan 25 '12 at 15:41

score 1 · Answer 2 · answered Jan 17 '12 at 14:37

1

This is a limitation of TCP protocol. The port is an unsigned short int (0-65535). The solution is to use different IP addresses.

If the software can not be changed you can use virtualization. Create VMs that are bridged (not NATed) and that use public IPs so they will not be NATed later.

Check with netstat that the listeners use the IP on a interface and not all addresses (0.0.0.0):

sudo netstat -tulnp|grep '0\.0\.0\.0'

answered Jan 17 '12 at 14:37

Mircea Vutcovici

16,706
4
52
80

We already use different IP addresses. Each of the IPs has around 10-15k open connections. – mo. Jan 17 '12 at 14:40
Does `sudo netstat -tulnp|grep '0\.0\.0\.0'` return many results? Can you paste the output of: `sudo netstat -tulnp|grep '0\.0\.0\.0'|wc -l` – Mircea Vutcovici Jan 17 '12 at 14:43
`# netstat -tulnp|grep '0\.0\.0\.0' | wc -l` `56` `# netstat -tulnp| wc -l` `64` – mo. Jan 17 '12 at 14:43
This means that you do not have to many listeners, so the client endpoint sockets are the limitation. – Mircea Vutcovici Jan 17 '12 at 14:54
Can you please run: `netstat -tun|grep '0\.0\.0\.0' | wc -l` – Mircea Vutcovici Jan 17 '12 at 14:55
That one returns zero. I am not sure what you're up to, and I don't think it has anything to do with listening sockets at all. `netstat -tulnp | wc -l` returns `65` total listening sockets at the moment. We run four instances of Tor, each of them listening on a different IP. If I run `netstat -nta` and grep it for the four IPs, they sum up to a total of around ~69k, which is the total number of open sockets at the moment. Tor only uses TCP, that's why I'm talking about "TCP sockets" above. (`netstat -tun | wc -l` is `69891`) – mo. Jan 17 '12 at 15:14
I think that all those 69k sockets are using the same IP - the primary address. They should use all available IPs. – Mircea Vutcovici Jan 17 '12 at 15:29
"If I run netstat -nta and grep it for the four IPs, they *sum up* to a total of around ~69k" (and are each around 10-15k). – mo. Jan 17 '12 at 15:36
I have updated the original post to hopefully better reflect this. – mo. Jan 17 '12 at 15:44
I think this is a problem in the source code of the tor application. When you are receiving those errors, from the server try to connect outside. If you do not receive "Address already in use", then it is a bug in the tor server. – Mircea Vutcovici Jan 17 '12 at 16:18

Exhausting Linux machine TCP socket limit (~70k)?

2 Answers2