
I am trying to get my head around load balancing as a way to ensure availability and redundancy, keeping users happy when things go wrong, rather than load balancing for the sake of offering blistering speed to millions of users.

We're on a budget and trying to stick to technologies with plenty of knowledge available, so running Apache on Ubuntu VPSes seems like the strategy until some famous search engine acquires us (Saturday irony included, please note).

At least to me, it's a complete jungle of solutions out there. Apache's own mod_proxy and HAProxy are two that we found in a quick Google search, but having zero experience with load balancing, I have no idea what would be appropriate for our situation, or what to look for when choosing a solution to address our availability concerns.

What is the best option for us? What should we do to keep availability high whilst staying inside our budget?

Industrial
    Btw, please don't implement "redundancy" by using two virtual machines running on the same server. That's just stupid. (I'm not saying that was your plan) – Earlz Mar 06 '11 at 00:03
  • Perhaps use 3 or 4 dedicated IPs and servers (VPSes) behind your load balancer. It creates the impression of speed, but in truth it is not about speed: the load balancer chooses which link to send users to if one is down (for example, because too many users are accessing it). –  Mar 06 '11 at 01:11
  • @Earlz - Nope, that wasn't the plan. I actually wanted to spread out the VMs as far apart (geographically) as possible, so they won't even be in the same data centre – Industrial Mar 07 '11 at 17:35
  • @Fernando Costa Hi! Not sure what you mean really, do you mind writing an answer and explaining your concept a bit further? – Industrial Mar 07 '11 at 17:35
  • Bounty is ON! Looking forward to more thoughts on this – Industrial Mar 09 '11 at 08:53
  • What _is_ your budget? Is this just a static site? Do you require a database? How much downtime can you tolerate? If you have dynamic pages, what language are they using? – hobodave Mar 10 '11 at 02:44
  • Hi Hobodave. We'll run PHP and a separate MongoDB replication set for the database. We'll flip every coin to cut costs as this is a startup with little capital involved, so we're trying to get as much availability as possible for our money... – Industrial Mar 10 '11 at 08:54

8 Answers


HAProxy is a good solution. The config is fairly straightforward.

You'll need another VPS instance to sit in front of at least two other VPSes, so for load balancing/failover you need a minimum of three VPSes.
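
To make the "fairly straightforward" claim concrete, here is a minimal haproxy.cfg sketch for one balancer in front of two backends. The IP addresses, names, and the /health URL are placeholders, not anything from your setup:

# /etc/haproxy/haproxy.cfg -- minimal sketch; all IPs and names are placeholders
global
    maxconn 2048

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend webfarm

backend webfarm
    balance roundrobin
    option httpchk GET /health              # assumes each backend exposes a health page
    cookie SRV insert indirect nocache      # stickiness, relevant to point 3 below
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2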

A few other things to think about:

  1. SSL termination. If you use HTTPS, that connection should terminate at the load balancer; behind the load balancer, traffic can pass over an unencrypted connection.

  2. File storage. If a user uploads an image, where does it go? Does it just sit on one machine? You need some way to share files instantly between machines - you could use Amazon's S3 service to store all your static files, or you could have another VPS act as a file server, but I would recommend S3 because it's redundant and insanely cheap.

  3. Session info. Each machine in your load balancer config needs to be able to access the user's session info, because you never know which machine they will hit (see the php.ini sketch after this list for one way to share sessions).

  4. DB - do you have a separate DB server? If you only have one machine right now, how will you make sure your new machine has access to the DB server - and if it's a separate VPS DB server, how redundant is that? It doesn't necessarily make sense to have highly available web front ends and a single point of failure in one DB server; now you need to consider DB replication and slave promotion as well.
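
On point 3, one common approach for PHP (which you mention in the comments) is to point the session handler at a shared memcached instance. This is a sketch assuming the PHP memcache extension is installed; the address is a placeholder:

; php.ini fragment -- keep sessions in a shared memcached so every
; backend behind the balancer sees the same session data
session.save_handler = memcache
session.save_path = "tcp://10.0.0.5:11211"   ; placeholder address

Sticky sessions at the balancer (the cookie lines in the haproxy.cfg sketch above) are the other usual answer, at the cost of less even balancing.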

So I've been in your shoes; that's the trouble with taking a website that does a few hundred hits a day to a real operation. It gets complex quickly. Hope that gave you some food for thought :)

bonez

The solution I use, and one that can easily be implemented with VPSes, is the following:

  • DNS is round-robined across 6 different valid IP addresses.
  • I have 3 load balancers with identical configuration, using corosync/pacemaker to distribute the 6 IP addresses evenly (so each machine gets 2 addresses).
  • Each of the load balancers runs an nginx + Varnish configuration. Nginx deals with receiving the connections, doing rewrites, and some static serving, and passes requests back to Varnish, which does the load balancing and caching (a minimal VCL sketch follows this list).
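
Here is the VCL sketch mentioned above, using Varnish 2.x-era syntax; the hosts and the probe URL are placeholders. It defines a round-robin director with health probes, so Varnish stops sending traffic to a dead backend:

# default.vcl -- round-robin director with backend health probes (sketch)
backend app1 {
    .host = "10.0.0.21"; .port = "80";
    .probe = { .url = "/health"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }
}
backend app2 {
    .host = "10.0.0.22"; .port = "80";
    .probe = { .url = "/health"; .interval = 5s; .timeout = 1s; .window = 5; .threshold = 3; }
}

director webapps round-robin {
    { .backend = app1; }
    { .backend = app2; }
}

sub vcl_recv {
    set req.backend = webapps;
}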

This architecture has the following advantages, in my biased opinion:

  1. corosync/pacemaker will redistribute the IP addresses if one of the LBs fails (see the crm sketch after this list).
  2. nginx can be used to terminate SSL and to serve certain types of files directly from the filesystem or NFS without going through the cache (big videos, audio, or other large files).
  3. Varnish is a very good load balancer, supporting weights and backend health checking, and it does an outstanding job as a reverse proxy.
  4. If more LBs are needed to handle the traffic, just add more machines to the cluster and the IP addresses will be rebalanced across all of them. You can even do it automatically (adding and removing load balancers). That's why I use 6 IPs for 3 machines: to leave some room for growth.
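
The crm sketch promised in point 1: each floating address is a plain IPaddr2 resource, and Pacemaker spreads them across the nodes and moves them when a node dies. The addresses are placeholders, and ip3 through ip6 follow the same pattern:

# crm configure fragment -- floating IPs as IPaddr2 resources (sketch)
primitive ip1 ocf:heartbeat:IPaddr2 params ip=203.0.113.10 cidr_netmask=24 op monitor interval=10s
primitive ip2 ocf:heartbeat:IPaddr2 params ip=203.0.113.11 cidr_netmask=24 op monitor interval=10s
# ...and so on up to ip6; if a load balancer fails, its addresses
# are simply restarted on the surviving cluster nodes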

In your case, having physically separated VPSes is a good idea, but it makes the IP sharing more difficult. The objective is a fault-tolerant, redundant system, and some load balancing/HA configurations end up undermining that by adding a single point of failure (like a single load balancer that receives all the traffic).

I also know you asked about Apache, but these days we have specific tools better suited to the job (like nginx and Varnish). Leave Apache to run the application on the backend and front it with other tools (not that Apache can't do good load balancing or reverse proxying; it's just a question of offloading different parts of the job to more services, so each part can do its share well).

coredump
  • Hi again Coredump. How many machines would be needed at a minimum to accomplish this in a real-world scenario? – Industrial Mar 11 '11 at 15:51
  • You need at least 2 VPSes to make it work at a bare minimum. Both VPSes can run nginx+varnish without much problem. The two VPSes MUST be on different hosts, if possible with different power supplies and with network arriving from different switches, so if one side fails you still have the other. – coredump Mar 11 '11 at 16:36
  • Hi again. Thanks for the reply. I will try to read through the howtos and guides on setting this up, try it out in a virtual environment on my LAN, and see how failover is handled. For the moment it definitely appears that this solution is the best in the long run, even if it will give me some grey hairs before it's working as intended... – Industrial Mar 12 '11 at 14:13
  • @Industrial That's the best way to learn :) Start by assembling a load balancer with nginx+varnish, then worry about the cluster part. – coredump Mar 12 '11 at 15:34

My vote is for Linux Virtual Server as the load balancer. This makes the LVS director a single point of failure as well as a bottleneck, but

  1. The bottleneck is not, in my experience, a problem; the LVS redirection step happens down at layer 3/4 and is extremely (computationally) cheap.
  2. The single point of failure should be dealt with by having a second director, with the two controlled by Linux HA.

Cost can be kept down by having the first director be on the same machine as the first LVS node, and the second director on the same machine as the second LVS node. Third and subsequent nodes are pure nodes, with no LVS or HA implications.

This also leaves you free to run any web server software you like, as the redirection's taking place below the application layer.
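
For a flavour of how small the director config is, here is an ipvsadm sketch; the VIP and real-server addresses are placeholders, and direct routing also needs the VIP configured on the real servers' loopback:

# create the virtual service on the director, round-robin scheduling
ipvsadm -A -t 203.0.113.10:80 -s rr
# add two real servers in direct-routing (gatewaying) mode
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.11:80 -g
ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.12:80 -g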

MadHatter
  • Hi MadHatter. This is a solution I've never heard of before. Need to read up on it! – Industrial Mar 10 '11 at 08:57
  • Works well for me, feel free to come back with questions! – MadHatter Mar 10 '11 at 17:09
  • At my place of work we use LVS extensively for load balancing and, once configured, I've never seen a director have problems. As MadHatter says, the load balancing itself is not resource intensive. We use LVS in combination with pulse and piranha to provide the failover mechanism and a web interface to edit the config. It's definitely worth a look. – Will Mar 10 '11 at 20:04

How about this chain?

round-robin DNS > HAProxy on both machines > nginx to split off static files > Apache

Possibly also use ucarp or heartbeat to ensure HAProxy always answers (a ucarp sketch follows). Stunnel could sit in front of HAProxy if you need SSL too.
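
The ucarp side is a single daemon invocation on each HAProxy box. This is a sketch: the interface, password, addresses, and script paths are placeholders, and the up/down scripts just add or remove the virtual IP:

# master and backup both run this; whoever wins CARP holds 10.0.0.100
ucarp -i eth0 -s 10.0.0.11 -v 1 -p sharedsecret -a 10.0.0.100 \
      --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh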

JamesRyan

You may want to consider using proper clustering software. Red Hat's (or CentOS's) Cluster Suite, or Oracle's Clusterware, can be used to set up active-passive clusters, restart services, and fail over between nodes when there are serious issues. This is essentially what you're looking for.

All of these cluster solutions are included in the respective OS licenses, so you're probably cool on cost. They do require some manner of shared storage: either an NFS mount, or a physical disk accessed by both nodes with a clustered file system. An example of the latter would be SAN disks with multiple-host access allowed, formatted with OCFS2 or GFS. I believe you can use VMware shared disks for this.

The cluster software is used to define 'services' that run on nodes all the time, or only when that node is 'active'. The nodes communicate via heartbeats, and also monitor those services. They can restart a service if they notice a failure, and reboot a node if it can't be fixed.

You would basically configure a single 'shared' IP address that traffic is directed to. Then Apache, and any other necessary services, are defined as well and run only on the active server. The shared disk holds all your web content, any uploaded files, and your Apache configuration directories (httpd.conf, etc.).
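
With Red Hat Cluster Suite's rgmanager, that service definition is roughly a cluster.conf fragment like the following. This is a sketch only: the address, device, and paths are placeholders, and attribute details vary by release:

<!-- /etc/cluster/cluster.conf fragment (sketch) -->
<rm>
  <service name="web" autostart="1" recovery="relocate">
    <ip address="203.0.113.10" monitor_link="1"/>
    <fs name="webfs" device="/dev/sdb1" mountpoint="/var/www" fstype="ext3"/>
    <apache name="httpd" server_root="/etc/httpd" config_file="conf/httpd.conf"/>
  </service>
</rm>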

In my experience, this works incredibly well.

  • There's no need for DNS round robin, or any other single-point-of-failure load balancer -- everything hits one IP/FQDN.
  • User uploaded files go into that shared storage, and thus don't care if your machine fails over.
  • Developers upload content to that single IP/FQDN with zero additional training, and it's always up to date if it fails over.
  • The administrator can take the offline machine, patch the heck out of it, reboot, etc., then fail the active node over, so an upgrade involves minimal downtime.
  • That now out-of-date node can be kept unpatched for a while, making a fail-back an equally easy process. (Quicker than VMWare snapshots)
  • Changes to Apache's configuration are shared, so nothing weird happens during a failover because an admin forgot to make changes on the offline box.


--Christopher Karel


Optimal load balancing can be very expensive and complicated. Basic load balancing should simply ensure that each server is servicing roughly the same number of hits at any time.

The simplest load-balancing method is to provide multiple A records in DNS. By default, the name server hands out the addresses in rotating (round-robin) order, which results in users being relatively evenly distributed across the servers. This works well for stateless sites; a somewhat more involved method is required when you have a stateful site.
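
In a BIND-style zone file this is nothing more than repeated A records for the same name; the addresses are placeholders, and a low TTL makes it quicker to pull a dead server out:

; zone file fragment -- round-robin A records (sketch)
www    300  IN  A  203.0.113.11
www    300  IN  A  203.0.113.12
www    300  IN  A  203.0.113.13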

To handle stateful requirements, you can use redirects. Give each web server an alternate name such as www1, www2, www3, etc., and redirect the initial www connection to the host's alternate name. You may end up with bookmark issues this way, but the bookmarks should be evenly dispersed across the servers.
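
A sketch of that redirect with Apache's mod_rewrite; example.com and the www2 name are placeholders, and each server would substitute its own alternate name:

# bounce requests that arrive on the generic www name over to this
# host's own alternate name, so the session stays on one machine
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www2.example.com$1 [R=302,L]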

Alternatively, using a different path to indicate which server is handling the stateful session would allow proxying a session that has switched hosts back to its original server. This may be a problem when the session for a failed server arrives at the server that has taken over from it; however, barring clustering software, the state will be missing anyway. Due to browser caching, you may not see many sessions changing servers.

Failover can be handled by configuring a server to take over the IP address of a failed server. This minimizes downtime if a server fails, but without clustering software, stateful sessions will still be lost.

Without such failover, users will experience a delay until their browser fails over to the next IP address.

Using RESTful services rather than stateful sessions should do away with clustering issues on the front end. Clustering issues on the storage side would still apply.

Even with load balancers in front of the servers, you will likely have round-robin DNS in front of them to ensure all your load balancers get utilized. They add another layer to your design, with additional complexity and another point of failure; however, they can also provide some security features.

The best solution will depend on the relevant requirements.

Implementing image servers to serve up content like images, CSS files, and other static content can ease the load on the application servers.

BillThor

I generally use a pair of identical OpenBSD machines:

  • Use relayd for the load balancing, webserver monitoring, and handling of a failed webserver
  • Use CARP for high availability of the load balancers themselves.

OpenBSD is light, stable, and quite secure - Perfect for network services.
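
The CARP side of this can be a one-line interface config on each balancer. This is a sketch: the vhid, password, and parent interface are placeholders, and the address matches the intralbaddr used in the relayd.conf below:

# /etc/hostname.carp0 -- both balancers share 1.1.1.100; the master answers
inet 1.1.1.100 255.255.255.0 NONE vhid 1 carpdev em0 pass mysecret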

To start, I recommend a layer-3 setup; it avoids complications in the firewall (PF) setup. Here is an example /etc/relayd.conf file that shows the setup of a simple relay load balancer with monitoring of the backend webservers:

# $OpenBSD: relayd.conf,v 1.13 2008/03/03 16:58:41 reyk Exp $
#
# Macros
#

# The production internal load balanced address
intralbaddr="1.1.1.100"

# The interface on this load balancer with the alias for the intralbaddr address
intralbint="carp0"

# The list of web/app servers serving weblbaddress
intra1="1.1.1.90"
intra2="1.1.1.91"

# Global Options
#
# interval 10
timeout 1000
# prefork 5

log updates

# The "relaylb" interface group is assigned to the intralbint carp interface
# The following forces a demotion in carp if relayd stops
demote relaylb

#
# Each table will be mapped to a pf table.
#
table <intrahosts> { $intra1 $intra2 }

# Assumes local webserver that can provide a sorry page
table <fallback> { 127.0.0.1 }

#
# Relay and protocol for HTTP layer 7 loadbalancing and SSL acceleration
#
http protocol httprelay {
        return error
        header append "$REMOTE_ADDR" to "X-Forwarded-For"
        header append "$SERVER_ADDR:$SERVER_PORT" to "X-Forwarded-By"
        # header change "Connection" to "close"

        # Various TCP performance options
        tcp { nodelay, sack, socket buffer 65536, backlog 128 }

#       ssl { no sslv2, sslv3, tlsv1, ciphers HIGH }
#       ssl session cache disable
}

relay intra-httprelay {
        listen on $intralbaddr port 80
        protocol httprelay

        # Forward to hosts in the intrahosts table using a src/dst hash
        # The example shows use of a page with dynamic content to provide
        # application aware site checking.  This page should return a 200 on success,
        # including database or appserver connection, and a 500 or other on failure
        forward to <intrahosts> port http mode loadbalance \
                check http "/nlbcheck.asp" code 200

}
Paul Doom
  • Hi Paul, Thanks for your hands-on example! Have you been happy with the reliability of your solution? – Industrial Mar 13 '11 at 13:16
  • Very happy. I have used OpenBSD for all sorts of network duties (firewalls, DNS servers, web servers, load balancers, etc) for about 12 years now and the consistent quality of every release has been amazing. Once it is set up, it just runs. Period. – Paul Doom Mar 13 '11 at 18:51

Have you given EC2 with Cloud Foundry, or maybe Elastic Beanstalk, or just plain old AWS Auto Scaling a thought? I have been using that and it scales pretty well, and being elastic, it can scale up/down without any human intervention.

Given that you say you have zero experience with load balancing, I would suggest these options as they require minimal brain "frying" to get up and running.

It might be a better use of your time.

  • The StackOverflow family of sites used `pound` until quite recently, when, I believe, they implemented nginx. Note that nginx could be implemented to replace Apache, or just as a frontend to Apache. – Michael Dillon Mar 06 '11 at 05:21
  • Hi Ankur. Thanks for your reply. Amazon sure is an option we have considered; however, there seems to be as much negative as positive feedback on EC2 when it comes to building business-critical apps on it... – Industrial Mar 07 '11 at 17:36