The key to scaling an HTTP load-balancing layer is to add another layer of lower-level (IP or TCP) load balancing in front of it. This layer can be built entirely with open-source software, although you'll get better results if you have modern routers.
The flows (TCP sessions) should be hashed using headers such as the source/destination IPs and TCP ports to decide which frontend they go to. You also need a mechanism to make sure that when a frontend dies, it stops receiving traffic.
There are various strategies; I'm going to outline a couple that I've used in production on sites serving millions of users, so you can get the idea. It would take too long to explain everything in detail, but I hope this answer gives you enough information and pointers to get started. To implement these solutions you're going to need someone who is really knowledgeable about networking.
Admittedly, what I'm describing here is much harder to implement than what is described in other answers, but this really is the state of the art if you have a high-traffic website with big scalability issues and availability requirements above 99.9%. Provided you already have a network engineer on board, it costs less to set up and run (both in capex and opex) than load-balancer appliances, and it can be scaled further at almost no additional cost (versus buying a new, even more expensive appliance when you outgrow your current model).
First strategy: with a firewall
Presumably you have a couple of routers on which your ISP uplinks are connected. Your ISP provides two links (active/passive, using VRRP). On your routers, you also use VRRP, and you route the traffic going to your public network to a firewall. The firewalls (FW 1 and FW 2 below) are also active/passive; they filter the traffic and send each flow to a healthy frontend server (your HTTP load balancers, FE 1 and FE 2 below).
+--------------+ +--------------+
| ISP router A | | ISP router B |
+--------------+ +--------------+
| |
==#======================#== (public network)
| |
+---------------+ +---------------+
| Your router A | | Your router B |
+---------------+ +---------------+
| |
==#=====#==========#=====#== (RFC 1918 private network)
| | | |
+------+ +------+ +------+ +------+
| FW 1 | | FE 1 | | FE 2 | | FW 2 |
+------+ +------+ +------+ +------+
The goal is to have a flow look like this:
1. The ISP routes traffic going to your IPs to your active router.
2. Your routers route the traffic to a VIP that uses an RFC 1918 address. This VIP is owned by the active firewall, much like VRRP. If you use OpenBSD for your firewall needs, you can use CARP, a patent-free alternative to VRRP/HSRP (see the sketch after this list).
3. Your firewall applies the filter (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address").
4. Your firewall also acts as a router and forwards the packets to a healthy frontend.
5. Your frontend terminates the TCP connection.
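As a rough sketch of steps 2 and 3 on OpenBSD (all interface names and addresses below are made up for illustration), the CARP VIP and the packet filter could look something like this:

# /etc/hostname.carp0 on both firewalls: the private next-hop VIP
# on the router-facing interface (em1 here). vhid and password must
# match on both boxes; the backup adds a higher advskew so it only
# takes over when the master dies.
inet 10.1.2.1 255.255.255.0 10.1.2.255 vhid 1 carpdev em1 pass mysecret

# /etc/pf.conf, step 3: only let HTTP(S) through to the public VIP.
ext_if="em1"
vip="1.2.3.4"
block in on $ext_if
pass in on $ext_if inet proto tcp from any to $vip port { 80, 443 }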
Now the magic happens in steps 4 and 5, so let's look at them in more detail.
Your firewall knows the list of frontends (FE 1 and FE 2), and it will pick one of them based on a particular aspect of the flow (e.g. by hashing the source IP and port, among other headers). But it also needs to make sure that it's forwarding traffic to a healthy frontend, otherwise you will blackhole traffic. If you use OpenBSD, for instance, you can use relayd. What relayd does is simple: it health-checks all your frontends (e.g. by sending them a probe HTTP request), and whenever a frontend is healthy it adds it to a table that the firewall uses to select the next hop for the packets of a given flow. If a frontend fails its health checks, it is removed from the table and no packets are sent to it anymore. When forwarding a packet to a frontend, all the firewall does is swap the destination MAC address of the packet to that of the chosen frontend.
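If you want to see what relayd currently thinks of your frontends, relayctl will show the health-check status of each table and host:

relayctl show summary   # all redirects, tables, and hosts at a glance
relayctl show hosts     # per-host health-check state (up/down)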
In step 5, the packets from the user are received by your load balancer (be it Varnish, nginx, or whatever). At this point, the packet is still destined to your public IP address, so you need to alias your VIP(s) on the loopback interface. This is called DSR (Direct Server Return), because your frontends terminate the TCP connection and the firewall in between only sees simplex traffic (incoming packets only). Your router will route outgoing packets directly back to the ISP's routers. This is especially good for HTTP traffic because requests tend to be smaller than responses, sometimes significantly so. Just to be clear: this isn't an OpenBSD-specific thing; it's widely used on high-traffic websites.
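A minimal sketch of the loopback aliasing, reusing the example VIP from the sample config below; the Linux variant also needs the usual ARP sysctls so the frontend doesn't answer ARP for the VIP on its real NICs:

# OpenBSD frontend:
ifconfig lo0 alias 1.2.3.4 netmask 255.255.255.255

# Linux frontend equivalent:
ip addr add 1.2.3.4/32 dev lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2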
Gotchas:
- End users will connect directly to your frontend servers because you use DSR. Maybe that was already the case, but if not, you need to make sure they're adequately secured.
- If you use OpenBSD, beware that the kernel is single-threaded, so the performance of a single CPU core will limit the throughput of each firewall. Depending on your NIC type and the packet rate you're seeing, this may be a problem. There are ways to mitigate it (more on this below).
Second strategy: without a firewall
This strategy is more efficient but harder to set up because it depends more on the specifics of the routers you have. The idea is to bypass the firewalls above and have the routers do all the work the firewalls were doing.
You'll need routers that support per-port L3/L4 ACLs, BGP and ECMP, and Policy Based Routing (PBR). Only high-end routers support these features, and they often carry extra licensing fees for BGP. This is typically still cheaper than hardware load balancers, and it is also far easier to scale. The good thing about these high-end routers is that they tend to operate at line rate (i.e. they can always max out the link, even on 10GbE interfaces, because all the decisions they make are done in hardware by ASICs).
On the ports on which you have your ISP uplinks, apply the ACL that used to be on the firewall (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address"). Then have each one of your frontends maintain a BGP session with your router. You can use the excellent OpenBGPD (if your frontends are on OpenBSD) or Quagga. Your router will ECMP the traffic to the frontends that are healthy (because they're maintaining their BGP sessions). The router will also route the traffic out appropriately using PBR.
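As an illustration, a minimal OpenBGPD config on a frontend could look like the following (the AS numbers and addresses are hypothetical, and the router-side config depends entirely on your vendor):

# /etc/bgpd.conf on a frontend
AS 65001
router-id 10.1.2.101

# Announce the public VIP. A local watchdog should stop bgpd when
# the HTTP service is unhealthy, which withdraws the route and takes
# this box out of the router's ECMP group.
network 1.2.3.4/32

neighbor 10.1.2.1 {
        remote-as 65000
        descr "router A"
}

The frontends still alias the VIP on their loopback, exactly as in the DSR setup above, so they accept the traffic the router sends them.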
Refinements
- With the firewall-pair solution, it's nice if you can synchronize the TCP states across the firewalls, so that when one firewall fails, everything fails over smoothly to the other one. You can achieve this with pfsync (see the sketch after this list).
- Bear in mind that pfsync will typically double the packet rate on your firewalls.
- HTTP is a stateless protocol, so it's not the end of the world if you reset all the connections during a firewall failover because you don't use pfsync.
- If you outgrow a single firewall, you can use ECMP on your router to route your traffic to more than one pair of firewalls.
- If you use more than one pair of firewalls, you might as well make them all active/active. You can achieve this by having the firewalls maintain a BGP session with the routers, much like the frontends need to maintain one in the second design (without firewalls).
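A sketch of the pfsync refinement on OpenBSD (interface name hypothetical): use a dedicated link between the two firewalls if you can, since pfsync traffic is neither encrypted nor authenticated by default.

# /etc/hostname.pfsync0 on both firewalls; em2 is a dedicated
# crossover link between the two boxes.
up syncdev em2

# /etc/pf.conf: let the state-sync traffic through on that link.
pass quick on em2 proto pfsync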
Sample relayd config

See also the HOWTO at https://calomel.org/relayd.html
vip="1.2.3.4" # Your public IP address
# (you can have more than one, but don't need to)
fe1="10.1.2.101"
fe2="10.1.2.102"
fe3="10.1.2.103"
fe4="10.1.2.104" # You can have any number of frontends.
int_if="em0"
table <fe> { $fe1 retry 2, $fe2 retry 2, $fe3 retry 2, $fe4 retry 2 }
table <fallback> { 127.0.0.1 }
redirect webtraffic {
    listen on $vip port 80
    session timeout 60
    route to <fe> check http "/healthcheck.html" digest "(the sha1sum of healthcheck.html)" interface $int_if
}
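Assuming this lives in /etc/relayd.conf on the firewalls, you can sanity-check the config and enable the daemon like so (rcctl is available on reasonably recent OpenBSD releases):

relayd -n             # parse the config and report errors
rcctl enable relayd   # start at boot
rcctl start relayd
relayctl show hosts   # verify the frontends pass their health checks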