As far as I know, an L4 load balancer maintains two TCP connections:
- One from the client (front side) to the load balancer.
- A second one from the load balancer to a backend: the LB terminates the first connection, creates a new TCP connection, and changes the IP/port of the TCP packets to forward them to the backend.
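Here is a minimal Python sketch of the two-connection model above, under simplifying assumptions. `Endpoint`, `Segment`, `L4Balancer`, the addresses, and the 40000+ port range are all hypothetical. TCP handshakes, sequence numbers, and return traffic are ignored. The sketch shows one property: the backend is chosen when a new front-side connection first appears. Every later segment on that connection goes to the same backend, with its addresses rewritten and its payload left unread.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


@dataclass(frozen=True)
class Segment:
    src: Endpoint
    dst: Endpoint
    payload: bytes


class L4Balancer:
    """Hypothetical L4 balancer: one backend choice per client TCP connection."""

    def __init__(self, lb_ip, backends, first_port=40000):
        self.lb_ip = lb_ip
        self._backends = itertools.cycle(backends)  # round-robin over backends
        self._ports = itertools.count(first_port)   # hypothetical ephemeral ports
        self._table = {}  # (client, vip) -> (LB-side endpoint, backend)

    def forward(self, seg):
        key = (seg.src, seg.dst)  # identifies the front-side connection
        if key not in self._table:
            # First segment of a new front-side connection: pick a backend and
            # allocate the LB-side endpoint of the back-side connection.
            lb_side = Endpoint(self.lb_ip, next(self._ports))
            self._table[key] = (lb_side, next(self._backends))
        lb_side, backend = self._table[key]
        # Only addresses are rewritten; the payload is passed through unread.
        return Segment(src=lb_side, dst=backend, payload=seg.payload)
```

The lookup key is the connection itself, and nothing in this model inspects the payload. So it cannot tell one request from the next.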
In HTTP/2 and gRPC, the client and server maintain a single long-lived connection. If we use an L4 load balancer, this connection is the first one mentioned above (client to load balancer).
Some articles say that even when multiple backend servers are deployed, once a client sends its first request to one backend, that client-backend pair is kept for all subsequent requests. That means the other backends go unused.
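The pinning described above can be reproduced with a toy simulation. This is a sketch, not how any particular balancer is implemented. `ConnectionLevelBalancer` and `simulate` are hypothetical names, and round-robin is an assumed policy. "HTTP/1.1-style" here crudely means the client spreads its requests over several connections.

```python
from collections import Counter
from itertools import cycle


class ConnectionLevelBalancer:
    """Hypothetical L4-style balancer: the backend is chosen once per TCP connection."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)
        self._pinned = {}  # connection id -> backend

    def route(self, conn_id):
        if conn_id not in self._pinned:
            self._pinned[conn_id] = next(self._next_backend)
        return self._pinned[conn_id]


def simulate(num_connections, num_requests, backends):
    """Send num_requests requests, spread round-robin over num_connections connections."""
    lb = ConnectionLevelBalancer(backends)
    hits = Counter()
    for i in range(num_requests):
        hits[lb.route(i % num_connections)] += 1
    return hits


backends = ["backend-a", "backend-b", "backend-c"]
# HTTP/1.1-style: several connections, so requests spread out (3 per backend).
print(simulate(num_connections=3, num_requests=9, backends=backends))
# HTTP/2-style: one long-lived connection, so backend-a receives all 9 requests.
print(simulate(num_connections=1, num_requests=9, backends=backends))
```

With several connections, the per-connection choice still spreads the load. With one long-lived HTTP/2 connection, a single choice made at connect time decides where every request goes. Other clients opening their own connections may still land on other backends, so the imbalance is per client connection.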
Here is one of those articles: https://blog.bugsnag.com/envoy/
> gRPC uses the performance boosted HTTP/2 protocol. One of the many ways HTTP/2 achieves lower latency than its predecessor is by leveraging a single long-lived TCP connection and to multiplex request/responses across it. This causes a problem for layer 4 (L4) load balancers as they operate at too low a level to be able to make routing decisions based on the type of traffic received. As such, an L4 load balancer, attempting to load balance HTTP/2 traffic, will open a single TCP connection and route all successive traffic to that same long-lived connection, in effect cancelling out the load balancing.
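The quoted article implies that balancing has to happen at a level that can see individual requests. Here is a minimal sketch of that contrast. It assumes a hypothetical `StreamLevelBalancer` that can decode HTTP/2 frames. Each gRPC call travels on its own HTTP/2 stream, and client-initiated streams use odd ids. The balancer therefore picks a backend per stream instead of per connection, while keeping all frames of one stream together.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Frame:
    stream_id: int  # HTTP/2 stream id; one gRPC call uses one stream
    kind: str       # simplified frame type, e.g. "HEADERS" or "DATA"


class StreamLevelBalancer:
    """Hypothetical L7 balancer: picks a backend per HTTP/2 stream, not per connection."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)
        self._by_stream = {}  # stream id -> backend

    def route(self, frame):
        # The first frame of a new stream (a new call) gets a fresh backend;
        # later frames of that stream must follow it to the same backend.
        if frame.stream_id not in self._by_stream:
            self._by_stream[frame.stream_id] = next(self._next_backend)
        return self._by_stream[frame.stream_id]


# Three calls multiplexed on ONE client connection; their frames interleave.
frames = [Frame(1, "HEADERS"), Frame(3, "HEADERS"), Frame(1, "DATA"),
          Frame(5, "HEADERS"), Frame(3, "DATA"), Frame(5, "DATA")]
lb = StreamLevelBalancer(["backend-a", "backend-b", "backend-c"])
routes = [lb.route(f) for f in frames]
# streams 1, 3, 5 land on backend-a, backend-b, backend-c respectively
print(routes)
```

A real L7 proxy such as Envoy would also terminate the client's HTTP/2 connection, keep its own connections to each backend, and re-encode the frames there. This sketch only models the routing decision.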
I am really unclear on this point. Could anybody please explain it in more detail? Much appreciated, thanks!