5

I would like to use my Nagios monitoring to check whether every node is using the current catalog version provided by the puppetmaster.

In my situation, there are:

  • puppetmaster
  • host1
  • host2
  • hostX

I would like to create an NRPE plugin on host1, host2, and hostX that will:

  • Check the current catalog version on the host
  • Check the current catalog version prepared for the node on the puppetmaster
  • Raise a warning if the two differ

Problems:

  1. To check the catalog version on the puppetmaster I can run a /production/catalog API query, but it is very expensive (CPU), because the catalog has to be compiled every time I ask for it.
  2. I can't see any option for checking the current catalog version on the node. I tried puppet catalog, but it was not very helpful.

So my question is: how can I monitor the health of Puppet agents and be notified if any host is using an old catalog? Does this make any sense?

Tomasz Olszewski
  • Why not simply monitor whether the nodes are properly checking in? I believe the catalog "version" number reported by the server changes between requests, regardless of whether the catalog has actually changed. – Shane Madden Jan 05 '13 at 20:09
  • The catalog version is constant between requests. It changes only when the catalog changes (you can check this with --test --noop). Your suggestion is good, but I would like to check it on the node side, not on the puppetmaster. – Tomasz Olszewski Jan 10 '13 at 13:40

3 Answers

6

I wrote a simple check_puppet NRPE script that does most of what you want. It's based on R.I. Pienaar's original, which did more than I needed. In both cases we parse /var/lib/puppet/state/last_run_summary.yaml to see the state of the last agent run.

I don't see the advantage of using a third piece of software to compare the catalog versions between the master and agent when a normal agent run should provide enough data to alert properly.
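
To illustrate the idea, here is a minimal sketch that maps an already-parsed last_run_summary.yaml to a Nagios exit code. It is not the check_puppet script mentioned above. The function names, the age thresholds, and the assumption that the summary contains `time.last_run`, `resources.failed`, and `events.failure` keys are illustrative, and loading the file assumes PyYAML is installed on the node.

```python
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Path taken from the answer; it may differ between Puppet versions.
SUMMARY_PATH = "/var/lib/puppet/state/last_run_summary.yaml"


def evaluate(summary, now, warn_age=3600, crit_age=7200):
    """Return (exit_code, message) for a parsed summary dict.

    warn_age and crit_age are hypothetical thresholds in seconds.
    """
    try:
        last_run = int(summary["time"]["last_run"])
    except (KeyError, TypeError, ValueError):
        return UNKNOWN, "UNKNOWN: no last_run timestamp in summary"

    # Assumed key names; missing sections are treated as zero failures.
    failed = int((summary.get("resources") or {}).get("failed", 0))
    failures = int((summary.get("events") or {}).get("failure", 0))
    age = int(now - last_run)

    if failed or failures:
        return CRITICAL, "CRITICAL: %d failed resources, %d failure events" % (
            failed, failures)
    if age >= crit_age:
        return CRITICAL, "CRITICAL: last run %d seconds ago" % age
    if age >= warn_age:
        return WARNING, "WARNING: last run %d seconds ago" % age
    return OK, "OK: last run %d seconds ago" % age


def main():
    """Load the summary file and return a Nagios exit code."""
    import yaml  # PyYAML, assumed to be available

    try:
        with open(SUMMARY_PATH) as handle:
            summary = yaml.safe_load(handle)
    except (IOError, OSError, yaml.YAMLError) as exc:
        print("UNKNOWN: cannot read %s: %s" % (SUMMARY_PATH, exc))
        return UNKNOWN
    code, message = evaluate(summary or {}, time.time())
    print(message)
    return code
```

To use it as a plugin, the script would end with `sys.exit(main())` so that NRPE receives the exit code. Keeping `evaluate` separate from file loading makes the thresholds easy to test without a Puppet installation.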

Ramin
  • Thanks for your reply. The main problem with /var/lib/puppet/state/last_run_summary.yaml is that it changes even when the puppet agent is run with --test and --noop. But I believe it's the only possible way to check the catalog version. – Tomasz Olszewski Jan 10 '13 at 13:33
  • There is so much useful information in last_run_summary.yaml that I now think comparing catalogs with the puppetmaster is unnecessary. – Tomasz Olszewski Jan 10 '13 at 14:29
  • I suppose this script should be extended a little to handle cases where puppet was stopped manually. – ipeacocks Jul 11 '16 at 10:36
1

Here's what we do:

In our setting we have a wrapper script around calling puppet agent --test that also checks for some environment settings like existence of a "stopper file" that allows logged-in admins to disable automation temporarily.

In the wrapper script we touch a state file (/var/state/puppet-run) every time the puppet agent exits with status code 0.
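
A wrapper along these lines could be sketched as follows. This is not our actual script: the stopper file path and the function names are hypothetical, and only the state file path comes from the description above.

```python
import os
import subprocess

# Hypothetical stopper file; admins create it to pause automation.
STOPPER_FILE = "/etc/puppet/stop-automation"
# State file path as described in the answer.
STATE_FILE = "/var/state/puppet-run"
AGENT_COMMAND = ["puppet", "agent", "--test"]


def touch(path):
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        os.utime(path, None)


def run_agent(command=AGENT_COMMAND, stopper=STOPPER_FILE, state=STATE_FILE):
    """Run the agent unless the stopper file exists.

    Touches the state file only when the agent exits with status 0.
    Returns the agent's exit status, or None if the run was skipped.
    """
    if os.path.exists(stopper):
        return None
    status = subprocess.call(command)
    if status == 0:
        touch(state)
    return status
```

One caveat: in Puppet versions where --test implies --detailed-exitcodes, an exit status of 2 means changes were applied successfully. The sketch follows the description and treats only 0 as success, so you may want to accept 2 as well.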

We then track the age of this file to determine whether it's older than, e.g., 1.5 times the interval between puppet runs.
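
The age check itself could look roughly like this. The 30-minute run interval, the exit-code mapping, and the function name are assumptions for illustration; only the state file path and the 1.5 factor come from the description above.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

STATE_FILE = "/var/state/puppet-run"
RUN_INTERVAL = 30 * 60  # assumed seconds between puppet runs


def check_state_file(path=STATE_FILE, interval=RUN_INTERVAL, factor=1.5,
                     now=None):
    """Return (exit_code, message) based on the age of the state file."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        # No successful run has ever been recorded on this host.
        return CRITICAL, "CRITICAL: %s is missing" % path

    limit = factor * interval
    if age > limit:
        return WARNING, "WARNING: last successful run %d seconds ago (limit %d)" % (
            age, limit)
    return OK, "OK: last successful run %d seconds ago" % age
```

Passing `now` explicitly is only there to make the function testable; a plugin would call `check_state_file()` with the defaults, print the message, and exit with the returned code.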

Theuni
  • Your solution is great when the puppet agent is not used in daemon mode. I think that approach is better than the daemon in some situations, but I really like the ability to kick the agent from the puppetmaster, and that's why I prefer daemon mode. – Tomasz Olszewski Jan 10 '13 at 13:37
0

This exact problem inspired me to build this as a service so people could just move on and not have to build their own monitoring tool. Check out http://cronitor.io -- one monitor is free and there are paid plans for business.

Encoderer