11

I'm starting a new project and considering using Ansible or Salt for deployment automation and, perhaps, more sophisticated orchestration (server management and federation).

With Salt I'm wondering if there's any integration between it and Graphite or Zenoss or Ganglia ... using the Salt 0mq connections to relay the data from the Salt "minions" to the monitoring/graphing database/collectors.

Has anyone else looked at this?

quanta
  • 50,327
  • 19
  • 152
  • 213
Jim Dennis
  • 807
  • 1
  • 10
  • 22
  • Can you explain what you're looking to do in greater detail, please? What type of interrogation do you need? – jamieb Mar 07 '13 at 07:04
  • 3
    There's a new project called [Salmon](http://lincolnloop.com/blog/2013/jun/14/introducing-salmon/) that aims to be a full-blown monitoring system using Salt as its data collection mechanism and message transport. It does use Whisper as its database, so you could conceivably integrate it into Graphite if you really wanted to. – jgoldschrafe Jun 17 '13 at 12:16

5 Answers

9

I have been using salt-stack for over 6 months now to manage 40+ nodes.

In my current setup I use:

  • Icinga as the monitoring server
  • NRPE for executing the checks on the nodes
  • Graphite to store the data collected from the collectd nodes
  • collectd for collecting and pushing metrics to Graphite
  • gdash for a nice dashboard to visualize the Graphite metrics
  • and finally salt-stack to roll out the configs for NRPE / collectd on each node (see the sketch below)

All of this runs under CentOS 6.x.
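
To give an idea of what that rollout looks like, here is a minimal sketch of a Salt state for collectd (the paths and layout are only an illustration, not my exact production states):

# /srv/salt/collectd/init.sls -- illustrative sketch only
collectd:
  pkg.installed: []
  service.running:
    - enable: True
    - watch:
      - file: /etc/collectd.conf

/etc/collectd.conf:
  file.managed:
    - source: salt://collectd/collectd.conf
    - user: root
    - group: root
    - mode: 644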

My experience so far is that salt-stack is good for rolling everything out, but as a long-running daemon on the nodes it is not stable.

I often have problems with minions not reaching the master, or with memory bloat in the salt-minion process. There is an easy workaround: restart the salt-minions every 24 hours or once a week.
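
For example, a simple cron entry along these lines does it (a sketch only; adjust to your init system):

# /etc/cron.d/restart-salt-minion -- illustrative workaround
0 4 * * * root /sbin/service salt-minion restart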

But this instability makes salt-minion unsuitable for collecting data over the 0mq framework.

My current setup runs reliably. I can roll out changes pretty quickly with salt-stack, and collectd on the nodes does the trick.

chifiebre
  • 141
  • 3
  • I did not _want_ to upvote this, but honesty and decency forced me to do it. They are certainly aware of the awesome possibility of providing a generalized transport for metrics. I already do some of this via salt-mine. – Dan Garthwaite Oct 23 '13 at 18:58
  • Why collectd over [py]statsd? – Dan Garthwaite Oct 23 '13 at 19:00
4

I think neither Salt nor Ansible was created for that task, and I don't think they can be used for that purpose directly.

I have been using Salt for several months and I haven't noticed the options or functions you want (in the configs or in the documentation). But I think you could "add" your requirements yourself, since Salt is written in Python - if that is an option for you.

The easiest way is to have Salt install collectd, which can collect data about the system (and has connectors to Graphite).
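
For instance, something along these lines (just a sketch; the `collectd` state name here is hypothetical):

salt '*' pkg.install collectd
salt '*' state.sls collectd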

EDIT: I found a project which implements monitoring using Salt - Salmon - take a look.

spinus
  • 214
  • 1
  • 4
3

You may want to take a look at Sensu; it's a pluggable monitoring solution with lots of community plugins, including Graphite integration among others.

However, Sensu uses a different message queue to deliver messages, RabbitMQ. Some coding work would be needed if you wanted to bridge or swap one of the two transports, since RabbitMQ speaks AMQP while Salt's 0mq transport does not, so they are not drop-in replacements for each other.

Giovanni Toraldo
  • 2,557
  • 18
  • 27
2

I recommend you look into two things:

  • Salt Mine - http://docs.saltstack.com/topics/mine/
  • Salt Events - http://docs.saltstack.com/topics/event/index.html

If you combine these with your own returner configured to store results in Graphite (or any of the other systems you listed), you could conceivably use Salt to handle top-down 'probing' and bottom-up 'eventing'. I can't comment on the effectiveness of such a system, but in principle the possibility is there.
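
As a rough sketch of what such a returner could look like (the module name, Carbon host, and metric naming below are my own assumptions, not anything shipped with Salt), a minimal custom returner pushing flat numeric results to Graphite's plaintext port might be:

# _returners/graphite_return.py -- illustrative sketch only
'''Push flat numeric job results to Graphite over the plaintext protocol.'''
import socket
import time

CARBON_HOST = 'graphite.example.com'  # assumption: your carbon-cache host
CARBON_PORT = 2003

def returner(ret):
    '''Called with the job return dict: ret['id'], ret['fun'], ret['return'], ...'''
    data = ret['return']
    if not isinstance(data, dict):
        return
    sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
    now = int(time.time())
    prefix = 'salt.%s.%s' % (ret['id'].replace('.', '_'),
                             ret['fun'].replace('.', '_'))
    # Only flat numeric values map cleanly onto Graphite metrics
    for key, value in data.items():
        if isinstance(value, (int, float)):
            line = '%s.%s %s %d\n' % (prefix, key, value, now)
            sock.sendall(line.encode())
    sock.close()

You would drop it into the master's _returners directory, sync it with saltutil.sync_returners, and add --return graphite_return to the jobs whose results you want stored. Salt also ships a carbon returner that is worth checking before rolling your own.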

Techdragon
  • 121
  • 2
  • The as-yet unrealized feature of salt is that it is a secure star topology event bus. I use salt mine to run and store check_mk_agent, and a check_mk on the nagios server pulls it from the mine. – Dan Garthwaite Jul 08 '14 at 18:20
2

I outlined my journey to sub-second-per-host nagios monitoring via the salt-mine and check_mk here: http://garthwaite.org/saltmine_check_mk_agent.html

The article walks through weeks of on and off tinkering to get it all working. I'll summarize the solution:

Create a custom check_mk module for all minions:

#!/usr/bin/env python
''' Support for running check_mk_agent over salt '''
import os
import salt.utils
from salt.exceptions import SaltException

def __virtual__():
    ''' Only load the module if check_mk_agent is installed '''
    if os.path.exists('/usr/bin/check_mk_agent'):
        return 'check_mk'
    return False

def agent():
    ''' Return the output of check_mk_agent '''
    return __salt__['cmd.run']('/usr/bin/check_mk_agent')
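
The check_mk.agent function also has to be exposed to the mine on each minion; one way (a sketch here, the linked article has the exact setup) is a mine_functions entry in the minion config:

# /etc/salt/minion.d/mine.conf
mine_functions:
  check_mk.agent: []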

Set the minions' mine interval to one minute:

salt '*' file.append /etc/salt/minion.d/mine.conf "mine_interval: 1"

Configure the monitoring server to pull all the minions' check_mk_agent output into a single JSON file, then configure check_mk to query that file instead of making any network queries. This is all accomplished with the following script on the monitoring minion:

#!/usr/bin/env python
import sys
import json
import fcntl

DATAFILE="/dev/shm/cmk.json"
NAG_UID = 105
NAG_GID = 107

def do_update():
    '''Pull every minion's check_mk.agent output from the salt mine and
    cache it as JSON in shared memory for the Nagios/check_mk user.'''
    import os
    import salt.client

    caller = salt.client.Caller()
    data = caller.function('mine.get', '*', 'check_mk.agent')

    # Exclusive lock so readers never see a half-written data file
    lockfile = open(DATAFILE+".lock", "w")
    fcntl.flock(lockfile, fcntl.LOCK_EX)

    datafile = open(DATAFILE, "w")
    datafile.write(json.dumps(data))

    # Make the cache readable by the Nagios user
    for f in (DATAFILE, DATAFILE+".lock"):
        os.chmod(f, 0644)
        os.chown(f, NAG_UID, NAG_GID)

def get_agent(minion):
    '''Return the cached check_mk_agent output for a single minion.'''
    # Shared lock: many readers may run at once, but not during an update
    lockfile = open(DATAFILE+".lock", "w")
    fcntl.flock(lockfile, fcntl.LOCK_SH)

    data = json.load(file(DATAFILE))
    return data[minion]

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print "Usage: mine_agent.py --update | <minion id>"
    elif sys.argv[1] in ['--update', '-u']:
        do_update()
    else:
        minion = sys.argv[1]
        print get_agent(minion)

Update every minute:

$ cat /etc/cron.d/retrieve_mined_minion_data
*/1 * * * * root /etc/check_mk/mine_agent.py --update

Finally: Change the datasource for all nagios targets in /etc/check_mk/main.mk:

datasource_programs = [
  ( '/etc/check_mk/mine_agent.py <HOST>', ['mine'], ALL_HOSTS ),
]
Dan Garthwaite
  • 2,922
  • 18
  • 29
  • too bad mine_interval is a global config not per mine_function, I have some heavy mine functions which may not do well if set to a minute. – jagguli Nov 20 '17 at 23:33