How does a load balancer get around the 64k port limit?

Question

When a request comes in, the load balancer reroutes it to one of several available servers. But there are only 64k ports available, so at any given time there can be at most 64k outgoing requests. So how can some websites serve millions of concurrent requests? If someone could clear up this confusion, that would be awesome. Thanks!

@TomTom: that's about incoming connections and this Q is about outgoing connections, for which the answer is quite different. — dave_thompson_085, Jun 03 '18 at 20:13
@dave_thompson_085 That's correct I'm referring to outgoing connections, specifically from the load balancer's perspective. — Jayson Tatum, Jun 03 '18 at 20:28
One possibility would be to simply bind multiple local IPs to the load balancer's network interface. — Joseph Montanaro, Jun 04 '18 at 01:18

score 2 · Answer 1 · answered Jun 03 '18 at 22:35

The already linked How do high traffic sites service more than 65535 TCP connections? and other questions on Stack Overflow explain the 5-tuple and how this (a bit less than) 64K limit is per pair of IP addresses: you get one connection per ephemeral port. This still applies to "load balancer" workloads, since there is an IP software stack on that end too.

Say you run a service on IP 203.0.113.80 port 443, and somehow each of the 1 million IPs in 172.16.0.0/12 individually hits it. The 64K port limit does not matter, because the connections are already unique by IP address. The 172.16.0.1 client could make 64K connections, and so could the 172.16.0.2 client, because these flows are different:

Proto  Source IP     Port  Target IP   Port
TCP    203.0.113.80  443   172.16.0.1  44321
TCP    203.0.113.80  443   172.16.0.2  44321
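
To make the 5-tuple arithmetic concrete, here is a rough sketch in Python. The function name and the 10.0.x.x addresses are made up for illustration. The port range 32768-60999 is the usual Linux default for `net.ipv4.ip_local_port_range`; your system may be configured differently, and real limits (memory, file descriptors, conntrack) usually bite well before this theoretical bound.

```python
def max_outbound_connections(local_ips, backends, port_range=(32768, 60999)):
    """Theoretical upper bound on concurrent TCP connections a host can originate.

    Each connection is a unique (proto, src IP, src port, dst IP, dst port)
    tuple, so the bound is ephemeral ports x local IPs x backend endpoints.
    """
    low, high = port_range
    ephemeral_ports = high - low + 1
    return ephemeral_ports * len(local_ips) * len(backends)


# Hypothetical load balancer: one local IP, one backend endpoint.
one_to_one = max_outbound_connections(["10.0.0.1"], [("10.0.1.10", 8080)])

# Same load balancer, four backend endpoints: each one gets its own port space.
one_to_four = max_outbound_connections(
    ["10.0.0.1"],
    [("10.0.1.10", 8080), ("10.0.1.11", 8080),
     ("10.0.1.12", 8080), ("10.0.1.13", 8080)],
)

print(one_to_one)   # 28232
print(one_to_four)  # 112928
```

The point is only that the limit multiplies with every extra local IP, backend IP, or backend port, which is why a single 64K ceiling rarely applies to the load balancer as a whole.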

In practice, the backend service and the device brokering the connections need tuning and scale-up to get to a million. You need lots of hosts and you tweak their TCP stacks to get that high.

Usually, the only time tens of thousands of connections happen between the same IPs on the same port is when using load testing utilities. Most single IPs don't have nearly enough work to set up thousands of concurrent connections.

I'm referring to outgoing connections the load balancer itself makes to backend servers, not outgoing connections made by clients. — Jayson Tatum, Jun 03 '18 at 22:45
Same thing, 64K connections between a pair of IPs given the number of ephemeral ports. If somehow the backend can sustain more than that, add more IPs to the backend. — John Mahowald, Jun 04 '18 at 12:56

score 2 · Answer 2 · answered Jun 03 '18 at 23:20

It depends on the layer 4 protocol involved.

For UDP and other connectionless transport protocols (UDP-Lite, for example), it just doesn't matter. Because there's no connection, you don't have to worry about a socket being persistently associated with a particular remote host, and therefore don't have to worry about the fact that port numbers are 16-bit integers. You just send messages to the target backend and track where things went. Depending on the load balancing software and how it's configured, there may still be a functional limit of 65536 outstanding requests to backend servers.

For SCTP and other inherently multiplexed connection-oriented transport protocols (TIPC for example, though just about nobody uses that), it really doesn't matter either. You make exactly one connection from the load balancer to each backend server, and just multiplex however many streams you need over that one connection, because the transport protocol supports doing that. The same thing can actually be done with some application layer protocols too, like HTTP/2 (but not HTTP/1.1 or earlier) or the SSH protocol.

For TCP and other single-stream connection-oriented transport protocols, it gets a bit trickier. Some configurations multiplex requests over a set number of persistent connections (keeping the connections persistent saves time, since every new TCP connection costs at least one round trip for the handshake before any data can flow), while others use proprietary extensions at the application layer to multiplex things over a single connection.
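
A minimal sketch of the "set number of persistent connections" idea, in Python. The class and parameter names here are hypothetical and not taken from any particular load balancer; real proxies also handle broken connections, health checks, and timeouts, which this leaves out.

```python
import queue
import socket


class BackendPool:
    """Fixed-size pool of persistent TCP connections to one backend."""

    def __init__(self, address, size, connect=socket.create_connection):
        self._idle = queue.Queue()
        for _ in range(size):
            # Connections are opened once up front and then reused, so the
            # pool never consumes more than `size` local ephemeral ports.
            self._idle.put(connect(address))

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Hand the connection back so the next request can reuse it.
        self._idle.put(conn)
```

A caller would `acquire()` a connection, send one request over it, read the response, and `release()` it. With a pool of, say, 100 connections per backend, the number of ports in use stays at 100 no matter how many client requests pass through.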

For any of the above, there is also the option of using multiple backend networks, either via multiple physical networks, via VLANs, or via some other network multiplexing technology. This approach is most common when the backend servers are virtual machines, but is still widely used in other situations too. By having a separate subnet for each backend system, your 64k port limit goes from being per-frontend to per-backend (that is, instead of your load balancer only allowing 64k active connections to all backend servers, it can handle 64k to each backend server), which pushes the bottleneck to the number of backends and how powerful each of them is.
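
Whether the extra local addresses come from separate subnets or are simply added to one interface, the application side looks roughly the same: pick a source IP before connecting. Here is a small Python sketch under that assumption. `make_connector` is a made-up helper name, and the addresses you pass in must already be configured on the machine.

```python
import itertools
import socket


def make_connector(local_ips):
    """Return a function that connects to a backend, rotating the source IP."""
    sources = itertools.cycle(local_ips)

    def connect(backend, timeout=5.0):
        source_ip = next(sources)
        # Port 0 lets the kernel choose an ephemeral port on that source IP.
        return socket.create_connection(
            backend, timeout=timeout, source_address=(source_ip, 0)
        )

    return connect
```

Each local IP brings its own ephemeral port range, so rotating across N addresses raises the ceiling to roughly N times the single-address limit per backend endpoint. One caveat on Linux: binding before connecting normally reserves the port at bind time, and the `IP_BIND_ADDRESS_NO_PORT` socket option exists to defer that choice until `connect()`.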

You do not need different subnets or VLANs or whatever to multiply the 64K. Adding an IP or a port (listen on 80 and 8080 for example) to the same backend host accomplishes the same thing, in theory. — John Mahowald, Jun 04 '18 at 13:01
The ```proxy_bind``` option from Nginx might work. It uses the ```IP_TRANSPARENT``` socket option and iptables rules. See [IP Transparency and Direct Server Return with NGINX and NGINX Plus as Transparent Proxy](https://www.nginx.com/blog/ip-transparency-direct-server-return-nginx-plus-transparent-proxy/#proxy_bind) for more details. — jiafeng fu, Nov 02 '20 at 10:02
