
I've been using the DataStax OpsCenter (Enterprise) 5.0.0 REST API with DSE 4.5.1 (Cassandra 2.0.9) to query the /$cluster_name/nodes endpoint. I read the last_seen field, which is the number of seconds since each node was last seen, to check whether I can detect down nodes via the OpsCenter API.

For nodes that are up, last_seen is 0. When the Cassandra process is shut down, last_seen begins incrementing, and it resets to 0 when the process comes back up.
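As a rough illustration of the check described above, here is a minimal sketch that flags nodes whose last_seen is above a threshold. It assumes the /$cluster_name/nodes response is a JSON list of objects carrying a node_ip key alongside last_seen; that shape and the sample payload are assumptions for illustration, not captured OpsCenter output.

```python
import json


def stale_nodes(nodes, max_age=0):
    """Return {node_ip: last_seen} for nodes not seen within max_age seconds."""
    stale = {}
    for node in nodes:
        last_seen = node.get("last_seen")
        # Skip entries without a last_seen value rather than guessing.
        if last_seen is not None and last_seen > max_age:
            stale[node.get("node_ip")] = last_seen
    return stale


# Hypothetical payload; the real response contains many more fields.
sample = json.loads("""[
  {"node_ip": "10.0.0.1", "last_seen": 0},
  {"node_ip": "10.0.0.2", "last_seen": 42}
]""")

print(stale_nodes(sample))
```

Keeping the function separate from the HTTP call means the threshold logic can be exercised without a running OpsCenter.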

However, I've noticed that if DataStax OpsCenter is itself down when a node goes down, then once OpsCenter comes back up last_seen remains 0, meaning you can't tell whether a node is up or whether it went down and was never seen.

The same problem occurs if the DataStax OpsCenter Agent goes down: if the Cassandra process goes down afterwards, last_seen also remains 0.

I've tested this on my older 3.2.2 OpsCenter (Community) with DSC Cassandra 2.0.1 and it seems to still detect the Cassandra process is down and start incrementing last_seen even when the DataStax OpsCenter Agent is shut down first.

Now for robustness I actually use nodetool's view of Cassandra node availability, but surely the DataStax OpsCenter method should be more robust in differentiating between up and down nodes/processes?
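For comparison, the nodetool-based check can be sketched as below. It parses text in the style of `nodetool status`, where each node line starts with a two-letter code (U/D for up/down, then N/L/J/M for the state). The sample text is made up for illustration, and the exact column layout can differ between Cassandra versions, so treat the pattern as an assumption.

```python
import re

# First letter: Up/Down. Second letter: Normal/Leaving/Joining/Moving.
STATUS_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def down_nodes(nodetool_output):
    """Return the addresses of nodes that nodetool status reports as down."""
    down = []
    for line in nodetool_output.splitlines():
        match = STATUS_LINE.match(line)
        if match and match.group(1) == "D":
            down.append(match.group(3))
    return down


# Hypothetical output, trimmed to the columns that matter here.
sample_output = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID  Rack
UN  10.0.0.1  47.6 KB   256     50.0%  aaaa     rack1
DN  10.0.0.2  51.2 KB   256     50.0%  bbbb     rack1
"""

print(down_nodes(sample_output))
```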

Is this a bug in DataStax OpsCenter?

Is there another way of determining if a node is up/down via the DataStax OpsCenter REST API?

P.S. I've written a lot of Cassandra and DataStax OpsCenter Nagios plugins, which is how I came across this. You can find them on my GitHub: https://github.com/harisekhon/nagios-plugins

Andrew Schulman
Apologies for the super delayed response here, but I've created an internal ticket to fix the last_seen issue. FYI, it's OPSC-3776. Thanks for posting about the Nagios plugins. I plan to take a look soon. – mbulman Oct 27 '14 at 00:33

1 Answer


From purely empirical observation, the API ".../nodes/<node_ip>/mode" returns null if the node is unavailable; otherwise it returns the string "normal" (or presumably some other appropriate string). To get all node statuses together, interpret the JSON from "nodes/all/mode".
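A minimal sketch of interpreting that response, assuming "nodes/all/mode" returns a JSON object keyed by node IP with either a mode string or null as the value. That shape is my assumption from the observation above, and the sample payload is hypothetical.

```python
import json


def split_by_mode(modes):
    """Split node IPs into (up, down) lists; a null mode is treated as down."""
    up, down = [], []
    for ip, mode in sorted(modes.items()):
        # JSON null becomes None in Python; any string counts as reachable.
        (down if mode is None else up).append(ip)
    return up, down


# Hypothetical payload, not captured from a real OpsCenter.
sample = json.loads('{"10.0.0.1": "normal", "10.0.0.2": null}')

up, down = split_by_mode(sample)
print("up:", up, "down:", down)
```

Any non-null string is counted as up here; a stricter check could accept only "normal" if other modes turn out to signal trouble.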

Note that I have seen a bit of lag (a few seconds) between when the "mode" API changes and when OpsCenter shows the node status change, but that might just be my environment; YMMV.