34

I often see web app architectures with a SLB / reverse-proxy in front of a bunch of app servers.

What happens when the number of connections to the SLB requires too many resources for a single SLB to handle effectively? For a concrete yet over-the-top example, consider 2 million persistent HTTP connections. Clearly a single SLB cannot handle this.

What is the recommended configuration for scaling out a SLB?

Is it typical to create a group/cluster of LBs? If so, how is client load spread out between the group of LBs?

z8000
  • z8000, Can you say what software load-balancer you are using? Also, if possible what algorithm/protocol it uses for load-balancing. – Martin May 11 '11 at 14:27
  • I don't have a preference. I updated the question to be clearer. – z8000 May 11 '11 at 14:31
  • It's not clear to me why a load balancer intrinsically cannot handle 2 million persistent HTTP connections. – womble Jul 09 '11 at 07:31

6 Answers

28

OK, there is already an accepted answer, but there is something to add. The most common 'classical' ways of scaling the load balancer tier are (in no particular order):

  • DNS Round Robin to publish multiple IP addresses for the domain. For each IP address, implement a highly available server pair (2 servers cooperating on keeping one IP address working at all times.) Each IP corresponds to one load balancer cluster, either using appliances or servers with load balancing software. Scale horizontally by adding more load balancer pairs as needed (a minimal zone-file sketch follows this list).

  • Routing or firewall tweaks to spread load to multiple load balancers. Have the front router or front firewall spread the incoming connections to several IP addresses (each representing one load balancer pair) by hashing the source IP address, having multiple equal-cost routes to the load balancers, or similar.

  • A layer of IP level load balancers in front of a layer of HTTP level load balancers. IP-layer load balancing can be implemented in ASICs / silicon, and can be wicked fast for some things. Thus a single IP load balancer pair can often 'keep up' with several HTTP/HTTPS level load balancers, and provide multi-gigabit performance levels while keeping the architecture nice and simple.
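
A minimal zone-file sketch of the DNS Round Robin approach (domain and addresses are made up; each A record is the VIP of one load balancer pair):

www.example.com.   300  IN  A  203.0.113.10   ; VIP of load balancer pair 1
www.example.com.   300  IN  A  203.0.113.20   ; VIP of load balancer pair 2

Clients spread themselves roughly evenly across the records, and extra capacity is added by publishing another record for a new pair.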

Going completely in-depth on the different ways of doing the above would require a very long answer. But in general, it is not that hard to scale the load balancer tier; it is far harder to scale the application server tier and especially the database tier.

Whether you choose an appliance form factor (F5, Cisco, A10) or a generic server (Windows / Linux + software) matters less. The major considerations when scaling out the load balancer layer are:

  • Stateful versus stateless. Do you absolutely need sticky sessions, or can you live without them? Not keeping state makes everything simpler (a minimal HAProxy sketch follows this list).
  • 'Hardware' (ASICs) versus 'software' (general-purpose servers) for load balancing. Each has its pros and cons; see the HAProxy overview documentation for a good discussion.
  • L3/4 (IP / TCP) load balancing versus L7 (HTTP) load balancing. Again, pros and cons; the HAProxy documentation provides a good overview.
  • SSL termination: where to do it, on the web nodes or on the load balancer.
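
A minimal HAProxy sketch of an L7 tier with cookie-based sticky sessions (names, addresses and the certificate path are assumptions; terminating SSL on the balancer needs HAProxy 1.5 or later):

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend www
    bind :80
    # bind :443 ssl crt /etc/haproxy/site.pem   # uncomment to terminate SSL here instead of on the web nodes
    default_backend app

backend app
    balance roundrobin
    cookie SRV insert indirect nocache
    server app1 10.0.0.11:8080 check cookie app1
    server app2 10.0.0.12:8080 check cookie app2

Dropping the cookie keywords gives a stateless configuration, which is what makes adding more balancers in parallel straightforward.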

Generally, you don't need to worry about this before your website gets very large -- a single modern server running e.g. nginx will handle tens of thousands of plain HTTP requests per second. So don't optimize prematurely; don't deal with this before you have to.

  • You don't actually *need* each IP address to be highly available using DNS RR. Browsers will, in general, fall back to another IP if available when they cannot connect. But if you have public web services, you will need HA for each IP address, as many web services libraries will not handle fail-over to other IPs automatically. – rmalayter Jul 08 '11 at 16:43
18

Load balancers can't easily be scaled by other load balancers since there will inherently be a single load balancer somewhere in the chain maintaining the connections. That said, balancers such as LVS or HAProxy have absurd capacity in the Gbps range. Once you get beyond the capabilities of a single load balancer (software, hardware, whatever), then you'll need to move into other techniques such as round robin DNS.

Hyppy
  • Right! Having the single LB is the "problem". I agree that throughput wouldn't be a problem generally. But I am concerned about other resources such as RAM, which in my case is limited. There's only so many connections that can be hosted on a single SLB before RAM runs out. – z8000 May 11 '11 at 14:49
  • HAProxy can handle about 20k-60k active sessions per GB of RAM. I believe LVS can do much more, since the retained session data is smaller. If you run out of RAM, either upgrade it or build another load balancer fronted by a round-robin DNS system. – Hyppy May 11 '11 at 14:58
  • 1
    "Load balancers can't easily be scaled by other load balancers" -- actually, a single ASIC based L4 load balancer can often be placed in front of a couple of L7 HTTP based load balancers with excellent results. The same basic principle applies for software-only implementations, for example Linux LVS in front of nignx. –  May 12 '11 at 15:06
9

The key to scaling an HTTP load balancing layer is to add another layer of lower-level (IP or TCP) load balancing first. This layer can be built entirely with open-source software, although you'll get better results if you have modern routers.

The flows (TCP sessions) should be hashed using headers such as source/destination IP and TCP ports, to decide which frontend they go to. You also need a mechanism to make sure that when a frontend dies, it stops getting used.

There are various strategies; I'm going to outline a couple that I've used in production on sites serving millions of users, so you can get the idea. It would be too long to explain everything in detail, but I hope this answer will give you enough information/pointers to get started. In order to implement these solutions you're going to need someone who is really knowledgeable about networking.

Admittedly what I'm describing here is much harder to implement than what is described in other answers, but this is really the state of the art if you have a high-traffic website with big scalability issues and availability requirements over 99.9%. Provided you already have a network engineer on board, it costs less to set up and run (both in capex and opex) than load balancer appliances, and it can be scaled further at almost no additional cost (versus buying a new, even more expensive appliance when you outgrow your current model).

First strategy: with a firewall

Presumably you have a couple of routers to which your ISP uplinks are connected. Your ISP provides 2 links (active/passive, using VRRP). On your routers, you also use VRRP, and you route the traffic going to your public network to a firewall. The firewalls (FW 1 and FW 2 below) are also active/passive; they filter the traffic and send each flow to a healthy frontend server (your HTTP load balancers, FE 1 and FE 2 below).

      +--------------+       +--------------+
      | ISP router A |       | ISP router B |
      +--------------+       +--------------+
             |                      |
           ==#======================#==   (public network)
             |                      |
      +---------------+     +---------------+
      | Your router A |     | Your router B |
      +---------------+     +---------------+
             |                      |
           ==#=====#==========#=====#==   (RFC 1918 private network)
             |     |          |     |
       +------+ +------+  +------+ +------+
       | FW 1 | | FE 1 |  | FE 2 | | FW 2 |
       +------+ +------+  +------+ +------+

The goal is to have a flow look like this:

  1. ISP routes traffic going to your IPs to your active router.
  2. Your routers route the traffic to a VIP that uses an RFC 1918 address. This VIP is owned by the active firewall, much like VRRP. If you use OpenBSD for your firewall needs, then you can use CARP, a patent-free alternative to VRRP/HSRP.
  3. Your firewall applies the filter (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address").
  4. Your firewall also acts as a router and forwards the packets to a healthy frontend.
  5. Your frontend terminates the TCP connection.
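
For step 2, a minimal sketch of how the shared VIP could be held by the active firewall with CARP on OpenBSD (interface name, address and password are assumptions):

# On FW 1; FW 2 runs the same commands with a higher advskew so it stays passive
ifconfig carp0 create
ifconfig carp0 vhid 1 pass somesecret carpdev em1 advskew 0 10.1.2.254 netmask 255.255.255.0

Whichever firewall currently owns vhid 1 answers for 10.1.2.254, so your routers always have a working next hop.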

Now the magic happens in steps 4 and 5, so let's see in more detail what they do.

Your firewall knows the list of frontends (FE 1 and FE 2), and it will pick one of them based on a particular aspect of the flow (e.g. by hashing the source IP and port, among other headers). But it also needs to make sure that it's forwarding traffic to a healthy frontend, otherwise you will blackhole traffic. If you use OpenBSD, for instance, you can use relayd. What relayd does is simple: it health-checks all your frontends (e.g. by sending them a probe HTTP request), and whenever a frontend is healthy it adds it to a table that the firewall uses to select the next hop of the packets of a given flow. If a frontend fails health checks, it is removed from the table and no packets are sent to it anymore. When forwarding a packet to a frontend, all the firewall does is swap the destination MAC address of the packet to that of the chosen frontend.

In step 5, the packets from the user are received by your load balancer (be it Varnish, nginx, or whatever). At this point, the packet is still destined to your public IP address so you need to alias your VIP(s) on the loopback interface. This is called DSR (Direct Server Return), because your frontends terminate the TCP connection and the firewall in between only sees simplex traffic (only incoming packets). Your router will route outgoing packets directly back to the ISP's routers. This is especially good for HTTP traffic because requests tend to be smaller than responses, sometimes significantly so. Just to be clear: this isn't an OpenBSD-specific thing and it is widely used on high-traffic websites.
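
A minimal sketch of the loopback alias for DSR (the VIP is assumed to be 1.2.3.4, as in the relayd sample further down):

# On each frontend running OpenBSD
ifconfig lo0 alias 1.2.3.4 netmask 255.255.255.255

# Rough Linux equivalent, plus sysctls so the frontend doesn't answer ARP for the VIP
ip addr add 1.2.3.4/32 dev lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2

With this in place, a frontend accepts packets addressed to the VIP and sends its replies straight out of its default route, without going back through the firewall.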

Gotchas:

  • End users will directly connect to your frontend servers because you use DSR. Maybe it was already the case, but if it was not, you need to make sure they're adequately secured.
  • If you use OpenBSD, beware that the kernel is single threaded so the performance of a single CPU core will limit the throughput of a firewall. It might be a problem depending on your type of NIC and the packet rate you're seeing. There are ways to solve this problem (more on this below).

Second strategy: without a firewall

This strategy is more efficient but harder to set up because it depends more on the specifics of the routers you have. The idea is to bypass the firewalls above and have the routers do all the work the firewalls were doing.

You'll need routers that support per-port L3/L4 ACLs, BGP and ECMP, and Policy Based Routing (PBR). Only high-end routers support these features, and they often have extra licensing fees to use BGP. This is typically still cheaper than hardware load balancers, and is also far easier to scale. The good thing about these high-end routers is that they tend to be line-rate (e.g. they can always max out the link, even on 10GbE interfaces, because all the decisions they make are done in hardware by ASICs).

On the ports on which you have your ISP uplinks, apply the ACL that used to be on the firewall (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address"). Then have each one of your frontends maintain a BGP session with your router. You can use the excellent OpenBGPD (if your frontends are on OpenBSD) or Quagga. Your router will ECMP the traffic to the frontends that are healthy (because they're maintaining their BGP sessions). The router will also route the traffic out appropriately using PBR.
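
A hedged sketch of the frontend side with Quagga (ASNs, router ID and neighbor address are made up; the announced VIP is the one already aliased on loopback):

! /etc/quagga/bgpd.conf on one frontend
router bgp 65001
 bgp router-id 10.1.2.101
 neighbor 10.1.2.1 remote-as 65000
 network 1.2.3.4/32

On the router side you enable BGP multipath (on many platforms something like 'maximum-paths 8' under the BGP configuration) so that every frontend with an established session gets an equal share of the flows. When a frontend dies, its BGP session drops and the router stops sending it traffic.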

Refinements

  • With the firewall pair solution, it's nice if you can synchronize the TCP states across the firewalls, so that when one firewall fails, everything fails over smoothly to the other one. You can achieve this with pfsync (a minimal sketch follows this list).
    • Bear in mind that pfsync will typically double the packet rate on your firewalls.
    • HTTP is a stateless protocol, so it's not the end of the world if you reset all the connections during a firewall failover because you don't use pfsync.
  • If you outgrow a single firewall, you can use ECMP on your router to route your traffic to more than one pair of firewalls.
  • If you use more than one pair of firewall, you might as well make them all active/active. You can achieve this by having the firewalls maintain a BGP session with the routers, much like the frontends need to maintain one in the 2nd design without firewalls.
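
A minimal OpenBSD sketch of the pfsync part (interface names and addresses are assumptions; em2 is a dedicated link between the two firewalls):

# On each firewall
ifconfig em2 inet 10.10.10.1 netmask 255.255.255.252 up   # 10.10.10.2 on the peer
ifconfig pfsync0 syncdev em2 up

pf state-table updates are then replicated to the peer, so established flows survive a failover.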

Sample relayd config

See also HOWTO at https://calomel.org/relayd.html

vip="1.2.3.4"  # Your public IP address
               # (you can have more than one, but don't need to)
fe1="10.1.2.101"
fe2="10.1.2.102"
fe3="10.1.2.103"
fe4="10.1.2.104"  # You can have any number of frontends.
int_if="em0"
table <fe> { $fe1 retry 2, $fe2 retry 2, $fe3 retry 2, $fe4 retry 2 }
table <fallback> { 127.0.0.1 }

redirect webtraffic {
        listen on $vip port 80
        session timeout 60
        route to <fe> check http "/healthcheck.html" digest "(the sha1sum of healthcheck.html)" interface $int_if
}
tsuna
2

Personally I'd go with simpler, less configurable hardware load balancers at that point - things like Cisco's ACE/ASAs, Foundry ServerIrons, maybe even Zeus ZXTMs (a software LB that's designed for very heavy loads).

Chopper3
  • In other words scale _up_? Such a LB will still be maxed out at some number of connections (etc.). What then? That's really my question. Thanks! – z8000 May 11 '11 at 14:34
  • 1
    Really big sites just use lots of heavy-duty LBs running under some form of DNS round-robin - it's good enough for the moment for most and can handle hundreds of millions of connections. That said, there is the question of why so many connections need to remain open, of course... – Chopper3 May 11 '11 at 14:37
  • Is that _internal_ RRDNS you mean? Neat, I didn't think of that. Re: open connections... I'm exploring options for an app that requires sending updates to connected clients over time as events occur somewhere on the backend. I am torn between a custom TCP server or lots of open HTTP connections behind a SLB. Thank you for your comments. – z8000 May 11 '11 at 14:41
  • I would think it would have to be external RRDNS. For instance, Twitter.com would use RRDNS to resolve & distribute requests to one of many large LBs which would then distribute the load to the servers. – Robert May 11 '11 at 14:58
  • Yes Robert, you're right, for instance we use Cisco GSS boxes to do site-by-site RR. – Chopper3 May 11 '11 at 15:11
1

Perhaps instead of constantly keeping so many open connections to send replies, code your application in such a way that clients periodically poll your servers as often as necessary?

Does whatever you are doing actually require a response this very millisecond, or can a client wait 15-20 seconds until the next polling period?

Mxx
0

A typical approach would be to create a cluster large enough to handle the required load and to use an SLB that can do deterministic load-balancing (for the case of persistent connections).

Something like CARP uses a hash of the requesting IP to determine which backend web server will handle the request. This is deterministic, but not very useful if there is a firewall or NAT in front of your load balancer.
You may also find something like IPVS useful if you are running on Linux (a minimal sketch follows).
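
A minimal IPVS sketch (addresses are assumptions): the source-hashing scheduler gives deterministic per-client balancing, and -g uses direct routing so responses bypass the balancer.

ipvsadm -A -t 203.0.113.10:80 -s sh                 # virtual service, source-hashing scheduler
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.11:80 -g    # real server 1, direct routing
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -g    # real server 2, direct routing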

Martin
  • What you claim about carp is so far from how it works, I wouldn't know where to begin! +-0 for mentioning IPVS. – 3molo May 12 '11 at 11:02
  • @3molo...huh? see net.inet.carp.arpbalance at http://www.linux.com/archive/feed/35482 "..CARP source-hashes the originating IP of a request. The hash is then used to select a virtual host from the available pool to handle the request." – Paul Jul 08 '11 at 20:35