
I want to implement a "floating" IP between two nodes using pcs + corosync + pacemaker. I've read dozens of howtos and the ClusterLabs documentation, but it seems I've done something wrong. Please help.

What I want is the following: the floating IP and its route SRC address start on node1. If node1 loses network connectivity to node2, node1 should immediately remove the floating IP and restore its plain default route, and node2 should bring both up instead. And vice versa when node1 comes back. The static IPs must stay intact in any case.

node1 static 192.168.80.21/24

node2 static 192.168.80.22/24

floating IP 192.168.80.23/24

gateway 192.168.80.1/24
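
To spell out what I expect the two resources to do on whichever node is active, the manual equivalent would be roughly this (just a sketch; I'm assuming ens192 is the LAN interface on both nodes):

# ip addr add 192.168.80.23/24 dev ens192     (the floating IP part)
# ip route replace default via 192.168.80.1 dev ens192 src 192.168.80.23     (the route SRC part)

On failover the surviving node should end up in the same state, and the failed node should fall back to its static address as the route source.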

Clean Debian 10 64-bit with the latest updates, stock Pacemaker packages, no third-party or custom software.

/etc/hosts on both nodes:

127.0.0.1       localhost.localdomain localhost
192.168.80.21   node1
192.168.80.22   node2

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# ip r
default via 192.168.80.1 dev ens192
10.10.10.2 dev gre_node2 proto kernel scope link src 10.10.10.1 (doesn't matter this time, I think)
192.168.80.0/24 dev ens192 scope link

The pcsd service is started; corosync and pacemaker are enabled in systemd but stopped at this point.
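
(In systemctl terms that should be roughly equivalent to this - not necessarily the exact commands I typed:)

# systemctl enable --now pcsd
# systemctl enable corosync pacemaker     (enabled, not started yet)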

Both nodes can ping each other; the network and firewall work properly.

The following is done on node1:

# pcs status
Error: cluster is not currently running on this node


# pcs cluster destroy
Shutting down pacemaker/corosync services...
Killing any remaining services...
Removing all cluster configuration files...


# pcs host auth node1 node2
Username: hacluster
Password:
node2: Authorized
node1: Authorized


# pcs cluster setup my_cluster node1 node2 --force
No addresses specified for host 'node1', using 'node1'
No addresses specified for host 'node2', using 'node2'
Destroying cluster on hosts: 'node1', 'node2'...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'node1', 'node2'
node1: successful removal of the file 'pcsd settings'
node2: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'node1', 'node2'
node2: successful distribution of the file 'corosync authkey'
node2: successful distribution of the file 'pacemaker authkey'
node1: successful distribution of the file 'corosync authkey'
node1: successful distribution of the file 'pacemaker authkey'
Synchronizing pcsd SSL certificates on nodes 'node1', 'node2'...
node1: Success
node2: Success
Sending 'corosync.conf' to 'node1', 'node2'
node1: successful distribution of the file 'corosync.conf'
node2: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

# pcs cluster start node1 node2 - OK
node2: Starting Cluster...
node1: Starting Cluster...


# pcs property set stonith-enabled=false - OK 

# pcs property set no-quorum-policy=ignore - OK

# pcs status (on both nodes):

Cluster name: my_cluster
Stack: corosync
Current DC: node1 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Tue Mar 15 13:15:03 2022
Last change: Tue Mar 15 13:15:00 2022 by root via cibadmin on node1

2 nodes configured
0 resources configured

Online: [ node1 node2 ]

No resources


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

And now is where my troubles begin - adding the floating IP and route SRC resources to the cluster:

# pcs resource create virtip ocf:heartbeat:IPaddr2 ip=192.168.80.23 cidr_netmask=24 op monitor interval=30s
# pcs resource create virtsrc ocf:heartbeat:IPsrcaddr ipaddress=192.168.80.23 cidr_netmask=24 op monitor interval=30
# pcs constraint colocation add virtip with virtsrc
# pcs constraint order virtip then virtsrc
Adding virtip virtsrc (kind: Mandatory) (Options: first-action=start then-action=start)
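
(To double-check what got created, the resource parameters and constraints can be listed - pcs 0.10 syntax, output omitted here:)

# pcs resource show virtip
# pcs resource show virtsrc
# pcs constraint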

# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node1 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Tue Mar 15 13:17:34 2022
Last change: Tue Mar 15 13:17:07 2022 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 virtip (ocf::heartbeat:IPaddr2):       Started node1
 virtsrc        (ocf::heartbeat:IPsrcaddr):     Started node1

Failed Resource Actions:
* virtsrc_start_0 on node2 'not installed' (5): call=10, status=complete, exitreason='We are not serving [192.168.80.23], hence can not make it a preferred source address',
    last-rc-change='Tue Mar 15 13:16:47 2022', queued=0ms, exec=21ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

node1:~# ip r
default via 192.168.80.1 dev ens192 src 192.168.80.23 onlink
10.10.10.2 dev gre_node2 proto kernel scope link src 10.10.10.1
192.168.80.0/24 dev ens192 scope link src 192.168.80.23

node2:~# ip r
default via 192.168.80.1 dev ens192 onlink
10.10.10.1 dev gre_node1 proto kernel scope link src 10.10.10.2
192.168.80.0/24 dev ens192 proto kernel scope link src 192.168.80.22

node2:~# ping 192.168.80.23
PING 192.168.80.23 (192.168.80.23) 56(84) bytes of data.
64 bytes from 192.168.80.23: icmp_seq=1 ttl=64 time=0.154 ms
^C

Seems OK, but now let's emulate a network failure, and havoc ensues:

node1:~#  ip link set ens192 down; sleep 60; ip link set ens192 up
root@node1:~# ip r
10.10.10.2 dev gre_node2 proto kernel scope link src 10.10.10.1
192.168.80.0/24 dev ens192 proto kernel scope link src 192.168.80.21
root@node1:~# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node1 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Tue Mar 15 13:32:56 2022
Last change: Tue Mar 15 13:17:07 2022 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 virtip (ocf::heartbeat:IPaddr2):       Started node1
 virtsrc        (ocf::heartbeat:IPsrcaddr):     FAILED node1 (blocked)

Failed Resource Actions:
* virtsrc_start_0 on node2 'not installed' (5): call=10, status=complete, exitreason='We are not serving [192.168.80.23], hence can not make it a preferred source address',
    last-rc-change='Tue Mar 15 13:16:47 2022', queued=0ms, exec=21ms
* virtsrc_stop_0 on node1 'unknown error' (1): call=15, status=complete, exitreason='no default route exists',
    last-rc-change='Tue Mar 15 13:31:26 2022', queued=0ms, exec=24ms
* virtip_monitor_30000 on node1 'unknown error' (1): call=7, status=complete, exitreason='[findif] failed',
    last-rc-change='Tue Mar 15 13:30:36 2022', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


root@node2:~# ip r
10.10.10.1 dev gre_node1 proto kernel scope link src 10.10.10.2
192.168.80.0/24 dev ens192 proto kernel scope link src 192.168.80.22
root@node2:~# pcs status
Cluster name: my_cluster
Stack: corosync
Current DC: node1 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Tue Mar 15 13:34:03 2022
Last change: Tue Mar 15 13:17:07 2022 by root via cibadmin on node1

2 nodes configured
2 resources configured

Online: [ node1 node2 ]

Full list of resources:

 virtip (ocf::heartbeat:IPaddr2):       Started node1
 virtsrc        (ocf::heartbeat:IPsrcaddr):     FAILED node1 (blocked)

Failed Resource Actions:
* virtsrc_start_0 on node2 'not installed' (5): call=10, status=complete, exitreason='We are not serving [192.168.80.23], hence can not make it a preferred source address',
    last-rc-change='Tue Mar 15 13:16:47 2022', queued=0ms, exec=21ms
* virtsrc_stop_0 on node1 'unknown error' (1): call=15, status=complete, exitreason='no default route exists',
    last-rc-change='Tue Mar 15 13:31:26 2022', queued=0ms, exec=24ms
* virtip_monitor_30000 on node1 'unknown error' (1): call=7, status=complete, exitreason='[findif] failed',
    last-rc-change='Tue Mar 15 13:30:36 2022', queued=0ms, exec=0ms

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
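
One more note in case it matters: as far as I understand, the numbers in the failed actions - 'not installed' (5), 'unknown error' (1) - are OCF return codes (OCF_ERR_INSTALLED and OCF_ERR_GENERIC). If I've got the calling convention right, the agent can also be probed by hand like this (just a sketch, assuming the stock agent path on Debian):

# OCF_ROOT=/usr/lib/ocf OCF_RESKEY_ipaddress=192.168.80.23 OCF_RESKEY_cidr_netmask=24 \
    /usr/lib/ocf/resource.d/heartbeat/IPsrcaddr monitor; echo $?
(expected codes: 0 = running, 7 = not running, 5 = not installed, 1 = generic error)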

What's wrong? How can I make it work properly?
