I have set up Sensu with the API and Server running on one instance and RabbitMQ running on a separate instance. This works quite well for us; however, if the Server or API loses connectivity to RabbitMQ, the Sensu Server does not send any notifications. I would expect the Server to send out a "No keep-alive sent from client in over 120 seconds" notification for each client in this scenario. As it stands with our setup, if RabbitMQ fails (or the connection to it fails), all monitoring fails silently.

How can Sensu be configured to send notifications when the Server or API processes lose connectivity to the transport (RabbitMQ)? More generally, what are the best practices for monitoring the monitoring software?

Brian

1 Answer

I have a similar setup, with Sensu Server, API, and Uchiwa on one cluster tier, a cluster of RabbitMQ nodes, and a master/slave setup of Redis.

My understanding is that all client messages go onto the queue for processing. If the queue is unavailable, the Server process cannot reach it either, so it cannot see that its own client process is unable to reach the queue.

The way I've solved it (which makes sense given my company and environment) is to run multiple Sensu clusters, one for each environment. Each cluster watches key availability points of the other Redis cluster, typically by hitting the load-balancer endpoints of the opposite cluster's components.
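To illustrate the cross-cluster idea, here is a minimal sketch of a check that one cluster could run against a load-balancer endpoint of the other. It is not the check used in the setup described above: the function names, the file name `check_peer.py`, and the host and port in the usage note are all hypothetical. The only convention it relies on is that Sensu checks report status through Nagios-style exit codes (0 for OK, 2 for critical, 3 for unknown).

```python
"""Probe a TCP endpoint and report a Nagios-style status, as Sensu checks do."""
import socket

# Exit statuses understood by Sensu (Nagios plugin convention).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection and return (status, message)."""
    try:
        # create_connection resolves the host and connects, honouring the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK: {host}:{port} accepted a connection"
    except OSError as exc:
        # Covers refused connections, timeouts and DNS resolution failures.
        return CRITICAL, f"CRITICAL: {host}:{port} unreachable ({exc})"


def main(argv):
    """Parse [program, host, port], print one status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN: usage: check_peer.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[2])
    except ValueError:
        print(f"UNKNOWN: port must be an integer, got {argv[2]!r}")
        return UNKNOWN
    if not 0 < port < 65536:
        print(f"UNKNOWN: port {port} is out of range")
        return UNKNOWN
    status, message = probe(argv[1], port)
    print(message)
    return status

# To use this as a check script, end the file with:
#     import sys; sys.exit(main(sys.argv))
```

With that last line added, a check command such as `check_peer.py rabbitmq-lb.other-env.example 5672` (a made-up host name, with 5672 being the default AMQP port) would go critical whenever the other cluster's RabbitMQ endpoint stops accepting connections. A plain TCP connect only shows that something is listening; a deeper check would need to speak the service's protocol.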

Another way you might solve this is to install a tiny RabbitMQ instance on your Sensu Server instance, one that the Server process knows about and that the Sensu Client running on that same instance communicates with. (This would depend on Sensu Server being able to watch multiple queues.)

I have been happy with the setup that we have, as it provides us with a reasonable assurance that our monitoring system is at least as available as the things that it's watching. If you have the capacity to spin up multiple clusters, I would absolutely encourage that. (I recommend this regardless of the monitoring product used.) If not, but you have engineering time, I would suggest investigating whether the additional local RabbitMQ is possible.

gWaldo