8

We use heartbeat for High Availability. I'd like to add an additional IP address to the heartbeat cluster, but I don't want to do a full restart of the cluster in the process. Is there a signal I can send to heartbeat that would prompt it to re-parse the "haresources" file and act on it? heartbeat -r does not appear to do the trick.

Peter Grace

3 Answers

6

The problem was that I didn't wait long enough after executing "heartbeat -r" (the command the init.d script runs when you call "service heartbeat reload"). After a few minutes, the IP showed up on the interface as expected.
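Since the reload takes a while to act, it can help to poll for the address rather than check once. The following is only a minimal sketch: the function name `wait_for_ip`, the `ADDR_CMD` override, and the default timeout and interval are all assumptions made for illustration, and the answer gives no figure for the delay beyond "a few minutes".

```shell
#!/bin/sh
# Poll until an IPv4 address appears on any interface, or give up.
# Usage: wait_for_ip ADDRESS [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# ADDR_CMD is a hypothetical override for the listing command, so the
# check can be tried without touching a real cluster.
wait_for_ip() {
    ip_addr=$1
    timeout=${2:-300}   # assumed default: five minutes
    interval=${3:-5}    # assumed default: check every five seconds
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # "ip -o -4 addr show" prints one line per address, e.g.
        # "2: eth0    inet 1.2.3.4/24 brd ...", so match " ADDRESS/".
        if ${ADDR_CMD:-ip -o -4 addr show} | grep -qF " $ip_addr/"; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

With that defined, something like `service heartbeat reload && wait_for_ip xx.xx.xx.xx` would report success only once the address is actually up.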

Peter Grace
  • Heartbeat applies the change itself eh? That actually has a very low suck quotient! If you find out how long it takes let us know :-) – voretaq7 Feb 17 '12 at 19:23
  • I realized after reading this comment that it was rather misleading; I nuked the entire answer and re-wrote it. – Peter Grace Feb 17 '12 at 20:43
  • mmh, that is more sensible -- You have to trigger the reload, but it isn't instant. (And it's more deterministic, which makes me happy.) – voretaq7 Feb 17 '12 at 20:44
2

You don't need to reload Heartbeat at all. Simply add the new IPaddr resource to your haresources file, something like this:

IPaddr::xx.xx.xx.xx

and then start it:

/etc/ha.d/resource.d/IPaddr xx.xx.xx.xx start

Of course, you should make sure to issue the IPaddr start on the active node. You should now be able to send and receive traffic on the newly added IP address.
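One way to avoid starting an address that never made it into haresources is to check the file first. This is a rough sketch, not part of Heartbeat: `has_ip_resource` is a made-up helper, the default path `/etc/ha.d/haresources` is assumed from the `/etc/ha.d/resource.d` path above, and it only recognises entries written as `IPaddr::...` or `IPaddr2::...`.

```shell
#!/bin/sh
# Succeed if ADDRESS is listed as an IPaddr/IPaddr2 resource.
# Usage: has_ip_resource ADDRESS [HARESOURCES_PATH]
has_ip_resource() {
    awk -v ip="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, "::")   # "IPaddr::addr/mask/iface"
                if (n >= 2 && (parts[1] == "IPaddr" || parts[1] == "IPaddr2")) {
                    split(parts[2], a, "/")  # drop any /netmask/interface suffix
                    if (a[1] == ip) found = 1
                }
            }
        }
        END { exit !found }
    ' "${2:-/etc/ha.d/haresources}"
}
```

On the active node this could be combined as `has_ip_resource xx.xx.xx.xx && /etc/ha.d/resource.d/IPaddr xx.xx.xx.xx start`.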

Kendall
  • I'm going to hold off on accepting my own answer as correct, since even though what I did worked, your suggestion sounds considerably more elegant. I want to try it out, but if it works, upvote and accepted answer shall be yours. – Peter Grace Feb 20 '12 at 14:20
  • OK, here's the deal. I tried this and, lo and behold, it worked! The problem is that doing this without reloading heartbeat would leave the cluster in an inconsistent state. I checked the source, and there are only three places where heartbeat reparses the haresources file, and all three are during a requested restart. As such, if the cluster were to fail over and fail back, the IP you put in haresources and manually instantiated with IPaddr start would not be recreated in the failover. Feel free to prove me wrong, but it appears this method is dangerous to rely upon. – Peter Grace Feb 20 '12 at 17:28
  • Quite right, Heartbeat does not keep the configuration files (e.g. haresources) in sync for you; you have to devise your own method. In my environment, we typically use unison for this, and it seems to work well. The haresources file is not cached, and is thus read anew when it needs to be read. Any entries in haresources _will_ be started on restart events (or events that cause haresources to be read); this includes failover. – Kendall Feb 20 '12 at 17:39
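Because keeping haresources identical on both nodes is left to the administrator, a quick comparison before relying on failover may be worthwhile. The sketch below assumes you have already fetched the peer's copy to a local path by whatever means you use; `haresources_in_sync` is a hypothetical helper name.

```shell
#!/bin/sh
# Compare the local haresources with a copy taken from the peer node.
# Usage: haresources_in_sync LOCAL_FILE PEER_COPY
# Returns 0 if the files are identical; otherwise prints a diff and returns 1.
haresources_in_sync() {
    if cmp -s "$1" "$2"; then
        return 0
    fi
    diff -u "$1" "$2" >&2
    return 1
}
```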
0

Heartbeat only has to be restarted on the secondary machine, which avoids any downtime related to resource management.

In this case, the primary node detects that the secondary machine is 'dead' and forces a 'failover', which reloads the resources file and starts the missing resources.

The logs are quite explicit when doing this:

May  9 12:10:40 gw2 heartbeat: [3684]: info: Received shutdown notice from 'gw1'.
May  9 12:10:40 gw2 heartbeat: [3684]: info: Resources being acquired from gw1.
May  9 12:10:40 gw2 heartbeat: [26469]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  9 12:10:40 gw2 harc[26469]: info: Running /etc/ha.d//rc.d/status status
May  9 12:10:40 gw2 mach_down[26521]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
May  9 12:10:40 gw2 mach_down[26521]: info: mach_down takeover complete for node gw1.
May  9 12:10:40 gw2 heartbeat: [3684]: info: mach_down takeover complete.
May  9 12:10:40 gw2 heartbeat: [3684]: debug: StartNextRemoteRscReq(): child count 1
May  9 12:10:40 gw2 IPaddr2[26520]: INFO:  Running OK
May  9 12:10:40 gw2 IPaddr2[26640]: INFO:  Running OK
May  9 12:10:40 gw2 IPaddr2[26725]: INFO:  Running OK
May  9 12:10:40 gw2 IPaddr2[26805]: INFO:  Running OK
May  9 12:10:40 gw2 IPaddr2[26890]: INFO:  Resource is stopped
May  9 12:10:40 gw2 heartbeat: [26470]: info: Local Resource acquisition completed.
May  9 12:10:40 gw2 heartbeat: [3684]: debug: StartNextRemoteRscReq(): child count 1
May  9 12:10:40 gw2 heartbeat: [26953]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  9 12:10:40 gw2 harc[26953]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
May  9 12:10:40 gw2 ip-request-resp[26953]: received ip-request-resp IPaddr2::1.2.3.4 OK yes
May  9 12:10:40 gw2 ResourceManager[26976]: info: Acquiring resource group: gw2 IPaddr2::1.2.3.4
May  9 12:10:40 gw2 IPaddr2[27006]: INFO:  Resource is stopped
May  9 12:10:40 gw2 ResourceManager[26976]: info: Running /etc/ha.d/resource.d/IPaddr2 1.2.3.4 start
May  9 12:10:40 gw2 IPaddr2[27115]: INFO: ip -f inet addr add 1.2.3.4/24 brd 1.2.3.255 dev brwan
May  9 12:10:40 gw2 IPaddr2[27115]: INFO: ip link set brwan up
May  9 12:10:40 gw2 IPaddr2[27115]: INFO: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-1.2.3.4 brwan 1.2.3.4 auto not_used not_used
May  9 12:10:40 gw2 IPaddr2[27091]: INFO:  Success

May  9 12:10:47 gw2 heartbeat: [3684]: WARN: node gw1: is dead
May  9 12:10:47 gw2 heartbeat: [3684]: info: Dead node gw1 gave up resources.
May  9 12:10:47 gw2 heartbeat: [3684]: info: Link gw1:eth0 dead.

May  9 12:10:59 gw2 heartbeat: [3684]: info: Heartbeat restart on node gw1
May  9 12:10:59 gw2 heartbeat: [3684]: info: Link gw1:eth0 up.
May  9 12:10:59 gw2 heartbeat: [3684]: info: Status update for node gw1: status init
May  9 12:10:59 gw2 heartbeat: [3684]: info: Status update for node gw1: status up
May  9 12:10:59 gw2 heartbeat: [3684]: debug: StartNextRemoteRscReq(): child count 1
May  9 12:10:59 gw2 heartbeat: [28604]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  9 12:10:59 gw2 heartbeat: [3684]: debug: get_delnodelist: delnodelist= 
May  9 12:10:59 gw2 harc[28604]: info: Running /etc/ha.d//rc.d/status status
May  9 12:10:59 gw2 heartbeat: [3684]: info: Status update for node gw1: status active
May  9 12:10:59 gw2 heartbeat: [3684]: debug: StartNextRemoteRscReq(): child count 1
May  9 12:10:59 gw2 heartbeat: [28619]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  9 12:10:59 gw2 harc[28619]: info: Running /etc/ha.d//rc.d/status status
May  9 12:10:59 gw2 heartbeat: [28634]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  9 12:10:59 gw2 harc[28634]: info: Running /etc/ha.d//rc.d/status status
May  9 12:11:00 gw2 heartbeat: [3684]: info: remote resource transition completed.
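
To pick out which resources were actually started during such a takeover, the ResourceManager "Running ... start" lines can be filtered from the log. This is a small sketch based only on the line format shown above; `started_resources` is a hypothetical name, and where the log lives depends on your ha.cf settings.

```shell
#!/bin/sh
# Read heartbeat log lines on stdin and print "<agent> <arguments>" for
# every resource that ResourceManager started, e.g. "IPaddr2 1.2.3.4".
started_resources() {
    sed -n 's|.*ResourceManager\[[0-9]*]: info: Running /etc/ha\.d/resource\.d/\(.*\) start$|\1|p'
}
```

Feeding it the excerpt above would leave only the newly added IPaddr2 entry, which confirms that the address from haresources was picked up.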