
How do I set up Nginx on two machines so that, if the primary machine crashes, there is automatic failover to the second machine? And what are the best practices here?

A lot seems to have been written already about how to do failover for the backend servers that nginx routes to. That is not what I am asking about.

user782220

4 Answers


We use Corosync and Pacemaker for an active/passive nginx cluster in our environment, and it works quite well.

Here are a few key points to keep in mind, based on my recent experience.

  • Join the cluster nodes using Corosync (/etc/corosync/corosync.conf), not Pacemaker. When I joined them with the latter, I had frequent problems such as split brain in my environment.
  • The default way to configure the cluster is the pcs command, but you can use crm instead, which some people prefer. Depending on your OS, you may have to install crmsh from the SUSE repositories; this is what I use on Red Hat based distros:

    wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo -nd -O /etc/yum.repos.d/crmsh.repo && yum install crmsh
    
  • After installation, make sure to disable STONITH and quorum, and set resource stickiness as well. These commands should work fine for a two-node cluster:

    pcs property set stonith-enabled=false
    pcs property set no-quorum-policy=ignore
    pcs resource defaults resource-stickiness="INFINITY"
    pcs resource defaults migration-threshold="1"
    
  • To sync files between the active and passive nodes, you can use tools like rsync or unison. For block-level sync (for example, /etc/nginx on its own filesystem), you can use DRBD, which can easily be added as a resource in the cluster.

  • Lastly, make sure to group all resources and order them using commands like pcs constraint colocation and pcs constraint order.
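
A minimal sketch of the resource, constraint and file-sync steps from the list above. It assumes a floating IP of 192.168.0.120/24, resource names `ClusterIP` and `WebServer`, and a passive node called `web2`; all of these are placeholders. It uses the stock `ocf:heartbeat:IPaddr2` and `ocf:heartbeat:nginx` resource agents, so check `pcs resource describe ocf:heartbeat:nginx` for the parameters your version supports.

```shell
# Floating IP that clients connect to; it moves to whichever node is active.
# 192.168.0.120/24 is a placeholder address.
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.168.0.120 cidr_netmask=24 op monitor interval=30s

# nginx itself, started and monitored by the cluster rather than by init.
pcs resource create WebServer ocf:heartbeat:nginx \
    configfile=/etc/nginx/nginx.conf op monitor interval=1min

# Keep nginx on the same node as the IP...
pcs constraint colocation add WebServer with ClusterIP INFINITY
# ...and bring the IP up before nginx starts (stop runs in reverse order).
pcs constraint order ClusterIP then WebServer

# Copy the nginx config to the passive node ("web2" is a placeholder).
rsync -av --delete /etc/nginx/ web2:/etc/nginx/

# Show which node is currently running the resources.
pcs status
```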

You can easily find all of the above information with a search; here are a few links to get you started:

  1. Active/passive cluster to set up Apache (use an nginx resource instead of apache)
  2. Ensure Resources Run on the Same Host
  3. Ensure Resources Start and Stop in Order
vikas027
  • Your answer suggests a workable solution to the question is available via another website. The Stack Exchange family of Q&A websites generally frowns on this type of answer because other websites may move, get deleted, or changed. Please read [How do I write a good answer?](http://serverfault.com/help/how-to-answer) and consider revising your answer to include the steps required to resolve the issue. – Paul Nov 14 '15 at 22:10
  • @Paul My apologies, I didn't know this. I've put some details in the answer now. – vikas027 Nov 15 '15 at 00:43
  • Much better. Note that to get code blocks to work properly in Markdown list items, you have to put 8 spaces in front of the lines. This will be much more readable than using the backtick character. – Paul Nov 15 '15 at 00:55

For simple failures, using the Heartbeat daemon to move the IP address to the backup machine is an excellent solution. See Cameron Miller's HOWTO for this. Many other guides can be found by searching for "heartbeat nginx".
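
A minimal sketch of a classic (v1-style) Heartbeat setup for this. It assumes two nodes named `web1` and `web2` (as reported by `uname -n`), a shared address of 192.168.0.120/24 on `eth0`, an nginx init script named `nginx`, and a placeholder secret; all of these are assumptions to adapt. Run the same thing on both nodes.

```shell
# /etc/ha.d/ha.cf: heartbeat every 2 s, declare the peer dead after 30 s,
# broadcast heartbeats on eth0, move resources back when web1 recovers.
cat > /etc/ha.d/ha.cf <<'EOF'
keepalive 2
deadtime 30
bcast eth0
auto_failback on
node web1
node web2
EOF

# /etc/ha.d/haresources: web1 is the preferred owner. Heartbeat brings up
# the IP with the IPaddr agent, then runs the "nginx" init script.
cat > /etc/ha.d/haresources <<'EOF'
web1 IPaddr::192.168.0.120/24/eth0 nginx
EOF

# /etc/ha.d/authkeys: shared secret for signing heartbeat messages;
# Heartbeat expects this file to be readable by root only.
cat > /etc/ha.d/authkeys <<'EOF'
auth 1
1 sha1 ChangeThisSecret
EOF
chmod 600 /etc/ha.d/authkeys
```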

Jeff Ferland

Take a look at my failover cluster written in POSIX shell: https://github.com/nackstein/back-to-work/. You will need 3 nodes to have quorum (other cluster software uses 2 nodes plus shared storage, or a STONITH approach where each node tries to kill the other, but this needs specialized hardware).
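
The arithmetic behind the three-node requirement, as a small POSIX sh sketch. The function names are hypothetical and not part of back-to-work. A group of nodes may keep running services only if it sees a strict majority of the cluster; with two nodes, a network split leaves each side with one of two votes, which is not a majority, while with three nodes one side of any split still holds two.

```shell
#!/bin/sh
# quorum_size TOTAL: votes needed for a strict majority of TOTAL nodes.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum VISIBLE TOTAL: succeeds if a partition that can see VISIBLE
# of TOTAL nodes holds a majority and may therefore run the services.
has_quorum() {
    [ "$1" -ge "$(quorum_size "$2")" ]
}

# A 2-node cluster split in half vs. a 3-node cluster split 2/1.
for split in "1 2" "2 3" "1 3"; do
    visible=${split% *}
    total=${split#* }
    if has_quorum "$visible" "$total"; then
        echo "$visible of $total nodes: quorum"
    else
        echo "$visible of $total nodes: no quorum"
    fi
done
```

This prints `1 of 2 nodes: no quorum`, `2 of 3 nodes: quorum` and `1 of 3 nodes: no quorum`, which is why a two-node cluster needs fencing or shared storage to break the tie.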

You can set up 3 nodes with SSH access that will only be used as lock servers (quorum servers). Then choose which 2 of those will hold the virtual IP and run the HTTP server. In case of failure, back-to-work will switch over the virtual IP and the HTTP server. If you need help setting everything up, contact me by email (see the code for the address).

Luigi

You have plenty of choices.

Cheap and effective: use DNS round robin. Nearly all the big players use it (although not only for failover). Here's an example:

    $ host amazon.com
    amazon.com has address 176.32.98.166
    amazon.com has address 205.251.242.54
    amazon.com has address 176.32.103.205
    amazon.com mail is handled by 5 amazon-smtp.amazon.com.

In this case, it's the browser that handles failover. It's actually quite effective, and all you have to do is configure your DNS entries.
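
A rough sketch of that client-side behaviour in POSIX sh: take every address the name resolves to and try them in turn until one accepts a connection. `first_reachable` and `http_probe` are hypothetical helpers, and `http_probe` assumes `curl` is installed. The connect timeout matters: a refused connection fails immediately, but a host that silently drops packets costs the client the whole timeout before it moves on.

```shell
#!/bin/sh
# first_reachable PROBE ADDR...: print the first address for which the
# PROBE command succeeds, mimicking a client walking a list of A records.
first_reachable() {
    probe=$1
    shift
    for addr in "$@"; do
        if "$probe" "$addr"; then
            printf '%s\n' "$addr"
            return 0
        fi
    done
    return 1    # every address failed
}

# http_probe ADDR: succeeds if an HTTP request to ADDR gets any response
# within a 2-second connect timeout (HTTP error codes still count).
http_probe() {
    curl --silent --output /dev/null --connect-timeout 2 "http://$1/"
}
```

For example, `first_reachable http_probe $(dig +short www.example.com A)` prints the first address that answers. Real browsers implement their own retry logic, so treat this only as an illustration of the idea.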

Another option, not limited to HTTP/SMTP, is a hardware load balancer such as F5's BIG-IP.

There is also a plethora of other solutions, too many to list here, but they are easy to find with a search.

Fredi
  • https://en.wikipedia.org/wiki/Round-robin_DNS "Round robin DNS should not solely be relied upon for service availability. If a service at one of the addresses in the list fails, the DNS will continue to hand out that address and clients will still attempt to reach the inoperable service." The big players use a lot more than round robin - each of those IPs is likely to be many redundant anycasted load balancers. – ceejayoz Nov 14 '15 at 19:27
  • Sure, and I did say that RRDNS is used for more than just this (CDN / GeoIP, I know). But it's an option which is cheap and works 99% of the time. For the real enterprise stuff, I gave F5 as an example. – Fredi Nov 14 '15 at 19:29
  • It doesn't work 99% of the time. If one of the two servers fails, roughly half of users will not be able to access the system because they'll be hitting the broken server's IP. – ceejayoz Nov 14 '15 at 19:30
  • @ceejayoz, for any desktop browser that's not true. Failover happens without the user even noticing it. Been there, done that. You can have problems with SOAP / web services when the client calling you is an application. – Fredi Nov 14 '15 at 19:32
  • That doesn't appear to be the case at all. ServerFault users have tested this. http://serverfault.com/questions/349964/dns-round-robin-do-browsers-stick-to-one-ip-as-long-as-it-is-online http://serverfault.com/questions/327708/how-browsers-handle-multiple-ips – ceejayoz Nov 14 '15 at 19:33
  • See the edit in that post ;-) There is a difference between a refused connection and a dropped SYN. – Fredi Nov 14 '15 at 19:35