I have two HA load balancers (hollywood and wolfman) running Corosync and Pacemaker. The eth1 interfaces are connected to the WAN, and the eth0 interfaces to the LAN, using a virtual IP as the gateway for the back end servers. The eth1 IP of hollywood is xxx.xxx.195.45, and the eth1 IP of wolfman is xxx.xxx.195.46. The bindnetaddr in Corosync is xxx.xxx.195.32, the same as the WAN's network address, and the Corosync port is the default 5405.
The relevant iptables rules on both servers are:
*filter
--flush
:INPUT DROP
--append INPUT --protocol udp --destination-port 5404 --jump ACCEPT
--append INPUT --protocol udp --destination-port 5405 --jump ACCEPT
This setup seems to work fine, but initially I added --in-interface eth1 and --source xxx.xxx.195.46 to wolfman, and --source xxx.xxx.195.45 to hollywood. Most of the time this seemed to work, but rebooting the passive balancer sometimes killed communication between the load balancers, writing these errors to syslog:
[TOTEM ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
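For reference, here is a rough sketch of the restricted rules described above, shown for wolfman with the source address exactly as stated. Where the two extra options sit within each rule is an assumption; hollywood had the same rules with --source xxx.xxx.195.45 instead.

```text
*filter
--flush
:INPUT DROP
# Assumed placement: interface and source match added to both Corosync rules
--append INPUT --in-interface eth1 --source xxx.xxx.195.46 --protocol udp --destination-port 5404 --jump ACCEPT
--append INPUT --in-interface eth1 --source xxx.xxx.195.46 --protocol udp --destination-port 5405 --jump ACCEPT
```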
So it seems that either my simplistic belief that all the Corosync traffic is directly between the two load balancers over eth1 is wrong, or that something else is causing a problem.
I'd like to lock ports 5404 and 5405 down in iptables to just the cluster. What do I need to do to make this happen?
Edit: corosync.conf as requested. This is all default Ubuntu other than the bindnetaddr.
# Please read the openais.conf.5 manual page
totem {
    version: 2
    # How long before declaring a token lost (ms)
    token: 3000
    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10
    # How long to wait for join messages in the membership protocol (ms)
    join: 60
    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600
    # Turn off the virtual synchrony filter
    vsftype: none
    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20
    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes
    # Disable encryption
    secauth: off
    # How many threads to use for encryption/decryption
    threads: 0
    # Optionally assign a fixed node id (integer)
    # nodeid: 1234
    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: none
    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: xxx.xxx.195.32
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
amf {
    mode: disabled
}
service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}