How do you monitor whether Pacemaker is still working, i.e. that all nodes are online and none of them is in standby or offline/down?

Monitoring the services isn't the problem; that can be done directly. But I'm still not sure whether I should monitor the status of the crm and, if so, how to do it.

Comradin
  • There's some curses-based management command. I'd check to see what options are available on that command, if it'll just return with an exit code, etc., or at least parseable text. I assume you want to see nodes that are online/idle/whatever. – cjc May 07 '12 at 10:22
  • http://exchange.nagios.org/directory/Plugins/Clustering-and-High-2DAvailability/Check-CRM/details or write a Nagios plugin to parse `crm_mon -1` results. – quanta Jun 30 '12 at 05:12
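A minimal sketch of the "parse `crm_mon -1`" idea from the comment above, not a finished plugin. The sample text, the function names, and the mapping of states to Nagios exit codes are all assumptions made for illustration. The text layout of `crm_mon -1` differs between Pacemaker versions, so check the patterns against what your own cluster prints.

```python
import re
import subprocess

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed sample of `crm_mon -1` text output; real output varies by version.
SAMPLE = """\
Current DC: node1 - partition with quorum
3 Nodes configured

Node node3: standby
Online: [ node1 ]
OFFLINE: [ node2 ]
"""


def parse_crm_mon(text):
    """Extract node states and quorum from one-shot crm_mon text output."""
    state = {"online": [], "offline": [], "standby": [], "quorum": True}
    for raw in text.splitlines():
        # Newer versions prefix list items with "* "; drop that if present.
        line = raw.strip().lstrip("* ").strip()
        if "partition WITHOUT quorum" in line:
            state["quorum"] = False
        group = re.match(r"(Online|OFFLINE):\s*\[(.*?)\]", line)
        if group:
            key = "online" if group.group(1) == "Online" else "offline"
            state[key].extend(group.group(2).split())
            continue
        single = re.match(r"Node\s+(\S+?)(?:\s+\(\S+\))?:\s*(.+)", line)
        if single:
            name, status = single.group(1), single.group(2).lower()
            if "offline" in status or "unclean" in status:
                state["offline"].append(name)
            elif "standby" in status:
                state["standby"].append(name)
    return state


def nagios_status(state):
    """Map the parsed state to a (exit code, message) pair (assumed policy)."""
    if not state["quorum"]:
        return CRITICAL, "CRITICAL: partition WITHOUT quorum"
    if state["offline"]:
        return CRITICAL, "CRITICAL: offline nodes: " + ", ".join(state["offline"])
    if state["standby"]:
        return WARNING, "WARNING: standby nodes: " + ", ".join(state["standby"])
    if not state["online"]:
        return UNKNOWN, "UNKNOWN: no online nodes found in output"
    return OK, "OK: online nodes: " + ", ".join(state["online"])


def run_check():
    """Run `crm_mon -1` once and return the Nagios exit code."""
    try:
        out = subprocess.run(
            ["crm_mon", "-1"], capture_output=True, text=True,
            timeout=10, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"UNKNOWN: could not run crm_mon: {exc}")
        return UNKNOWN
    code, message = nagios_status(parse_crm_mon(out))
    print(message)
    return code
```

To use it as a check, call `sys.exit(run_check())` from a small wrapper script on one of the cluster nodes. Treating standby as a warning rather than critical is a judgment call; adjust it to whatever your on-call policy needs.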

1 Answer

By default, if the crm has a hissy fit you'll know about it, because the machine reboots. We run a Nagios check at work that does all sorts of checks on the Pacemaker config in general (that is-managed-default isn't false, that no resources have a non-zero failcount, all that kind of thing). I don't know where we got it from, but presumably it's floating around the 'tubes somewhere.
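As a rough illustration of the failcount part of such a check (this is not the check mentioned above, just a sketch): `crm_mon -1 -f` includes fail counts in its one-shot output, and a script can flag any resource whose `fail-count` is non-zero. The sample text, the resource name `p_mysql`, and the function name are made up for the example, and the exact layout of the migration summary is an assumption that varies between Pacemaker versions.

```python
import re

# Assumed excerpt of `crm_mon -1 -f` output; the layout varies by version.
SAMPLE = """\
Migration summary:
* Node node1:
   p_mysql: migration-threshold=3 fail-count=2 last-failure='Mon May  7 10:00:00 2012'
* Node node2:
"""

# A node heading such as "* Node node1:" (or "* Node: node1:").
NODE_RE = re.compile(r"^\s*\*?\s*Node:?\s+(\S+?):\s*$")
# A resource line carrying a fail-count=N (or INFINITY) token.
FAIL_RE = re.compile(r"^\s*(?:\*\s*)?(\S+):\s.*\bfail-count=(\d+|INFINITY)")


def failed_resources(text):
    """Return (node, resource, count) tuples for every non-zero fail count."""
    node = None
    failures = []
    for line in text.splitlines():
        node_match = NODE_RE.match(line)
        if node_match:
            node = node_match.group(1)
            continue
        fail_match = FAIL_RE.match(line)
        if fail_match and fail_match.group(2) != "0":
            failures.append((node, fail_match.group(1), fail_match.group(2)))
    return failures


for node, resource, count in failed_resources(SAMPLE):
    print(f"WARNING: {resource} on {node} has fail-count={count}")
```

In a real check you would feed it the output of `crm_mon -1 -f` instead of `SAMPLE` and exit non-zero when the list is not empty.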

womble
  • Our service provider runs MySQL master-master nodes with basic Heartbeat failover. We're on the mailing list for Heartbeat messages, but we've also set up a Nagios check that compares the MAC for the HA IP with the MAC for the standby master's IP. If they match, then we've missed an email and the IP has floated to the standby. – cjc May 07 '12 at 10:24
  • To be blunt, if you care which machine on a cluster a service is running, ur doin it rong. – womble May 07 '12 at 10:39
  • We were running long-running queries on the secondary master. It was a long-ago cost-saving decision to do that, instead of a proper reporting slave. – cjc May 07 '12 at 10:42
  • You'd be better off adding a couple of lines of code to whatever runs the reporting queries to detect the failure and try again. – womble May 07 '12 at 10:47
  • @womble, I'm not concerned with the services. I'm just interested in whether Pacemaker thinks all nodes are still fine, as opposed to one node being in standby or offline or, worst case, a split-brain having happened. – Comradin May 07 '12 at 14:03
  • So monitor that, then -- it's not hard to get that information out of Pacemaker. – womble May 08 '12 at 10:27
  • You can use crm_mon to send email: http://beekhof.net/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch04.htm – c4f4t0r Oct 03 '14 at 14:53