
I'm trying to debug a caching issue with Puppet on RedHat 7. My versions are at the bottom of this question.

Below is an excerpt from my site.pp manifest. This is all fine and the Nagios check is installed on the foo.example.com node.

node 'foo.example.com' {

  nagios::service {'my_database':
    check_command => 'check_tcp_nrpe!3306',
    service_description => 'My Database',
  }

}

Now, if I add another nagios::service check in site.pp it also gets picked up by,

puppet agent --noop --test

but if I remove the same nagios::service declaration and run the agent again, the agent still sees it. These are dry runs, so I don't understand why anything would be cached. This has happened in many different scenarios across multiple manifests. If I remove puppetdb and run the agent, puppetdb is re-created and everything goes back to normal for a while.

Any suggestions on where to look before I go down the route of upgrading puppet, or re-installing the latest version? I'm not sure what other information to provide, so please let me know if there's something that might help.

My versions,

puppetlabs-release-7-12.noarch
puppet-server-3.8.6-1.el7.noarch
puppetdb-terminus-2.3.8-1.el7.noarch
puppet-3.8.6-1.el7.noarch
puppetdb-2.3.8-1.el7.noarch

Update 1

Below is the output from running # puppet agent --noop --test,

# puppet agent --noop --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for foo.example.com
Info: Applying configuration version '1522355276'
.
.
.
Notice: /Stage[main]/Nagios::Server/Nagios::Service_file[/etc/nagios/conf.d/services/foo-my_database_nagios_service.cfg]/File[/etc/nagios/conf.d/services/foo-my_database_nagios_service.cfg]/ensure: current_value absent, should be present (noop)
.
.
.
Notice: Finished catalog run in 21.10 seconds

The notice saying that this file should be present is bogus.

All I did was add,

nagios::service {'my_database':
    check_command => 'check_tcp_nrpe!3306',
    service_description => 'My Database',
}

run the agent, then removed it, and ran the agent again. Every time I run the agent it still thinks that check should be present even though it's not defined in any of my manifests.

Update 2

These are the steps I use to remove the cached item. After running these steps it no longer tries to add that my_database check.

cd /var/lib/puppetdb
sudo mv db db.`date +%F` # create a backup
sudo systemctl restart puppetmaster
sudo systemctl restart puppetdb
wsams
  • We need the output of your Puppet run. Could you throw it in Pastebin? It's probably too much to show inline. If in fact caching is involved, it will say near the top that it is using a cached catalog. – Aaron Copley Mar 29 '18 at 21:47
  • @AaronCopley I provided the head and tail of the output. The rest of the notices are similar, but if you think they'll be useful I'll try to post a de-identified full dump, but I don't think there's anything else useful in there. See "Update 1" in the description. – wsams Mar 29 '18 at 22:03
  • I didn't even think of PuppetDB. Please feel free to put your solution (Update 2 section) in an answer and mark it as accepted. This helps the Serverfault software know that this question has been solved. – Aaron Copley Apr 06 '18 at 12:39

1 Answer

This issue turned out to be related to what I was doing in "Update 2". When puppetdb was deleted, it lost track of all its resources. Once puppet agent --test --noop had been run on all of our servers, it knew where to find the resources again and everything could be found in the catalog.

Basically, once puppetdb is deleted you should run puppet agent --test --noop on all the hosts.
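As a rough sketch of that last step, the loop below runs the same command on each host over SSH. The host list is hypothetical (only foo.example.com comes from the question), and it assumes key-based SSH access and passwordless sudo on every node, which may not match your setup. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical host list: replace with your own inventory.
# Only foo.example.com appears in the question; bar.example.com is made up.
HOSTS="foo.example.com bar.example.com"

# RUNNER is the command used to reach each host.
# It defaults to "echo ssh", which only prints the commands (a dry run).
# Set RUNNER=ssh in the environment to actually execute them.
RUNNER="${RUNNER:-echo ssh}"

run_noop_everywhere() {
    # $HOSTS and $RUNNER are left unquoted on purpose so they split into words.
    for host in $HOSTS; do
        $RUNNER "$host" sudo puppet agent --test --noop
    done
}

run_noop_everywhere
```

Once the printed commands look right, running the script with RUNNER=ssh set in the environment should perform the real agent runs, one host at a time.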

wsams