Heartbeat node that was kicked out doesn't rejoin virtual IP service

Question

We have a 2-node heartbeat cluster that serves a virtual IP. Previously, due to an error, the network interface on node1 died, which resulted in the cluster kicking node1 from the virtual IP party.

Now that we have fixed the interface, node1 still does not rejoin the virtual IP party. Setting node2 to standby does not trigger failover to node1.

I am unfamiliar with heartbeat. Is there a configuration/command anywhere that allows me to reverse/configure/un-blacklist this?

score 0 · Accepted Answer · answered Jun 17 '19 at 10:21

After some digging, it turns out that the failcount had hit its limit (the resource's migration-threshold) during the network interface debacle. Hence, the resource refused to migrate back to the node, even after it was working again. I could view the failcount for each resource with:

pcs resource failcount show <resource_id> [node]

Source:

$ pcs resource help

To solve it, I ran this:

crm_resource --cleanup

That cleared all the failcounts for my resources (see https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-failure-handling.html). Failover now works as expected.
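
For anyone who wants to repeat the check-then-clean sequence, it could be wrapped in a small shell helper. This is only a sketch under a few assumptions: `reset_failcount`, `run`, `DRY_RUN` and the resource name `my_vip` are hypothetical names I made up for illustration, and the `pcs` syntax is the one printed by `pcs resource help` on a pcs 0.9-era install, so it may differ on newer versions. With `DRY_RUN=1` the helper only prints the commands, so they can be reviewed before anything touches the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): show a resource's
# failcount, clear its failure history, then print a one-shot status.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_failcount() {
    resource_id=$1   # placeholder: the ID of the virtual IP resource

    # 1. Inspect the current failcount for the resource.
    run pcs resource failcount show "$resource_id"

    # 2. Clear the failure history for this one resource only.
    #    (Plain "crm_resource --cleanup" clears it for all resources.)
    run crm_resource --cleanup --resource "$resource_id"

    # 3. Print the cluster status once, to confirm the result.
    run crm_mon -1
}
```

Usage would look like `DRY_RUN=1 reset_failcount my_vip` to preview the commands, then `reset_failcount my_vip` on a cluster node to actually run them.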

you are not using a heartbeat cluster, but pacemaker – c4f4t0r Jun 17 '19 at 10:28
