
I've got an Elasticsearch/Logstash/Kibana instance running, which I'm merrily stuffing with syslogs from a variety of hosts.

Having built it to scale - with multiple logstash syslogd listeners, and multiple ES nodes - it's doing quite nicely for collating logging across a large portfolio of servers.

There's just one problem I'm having at the moment - grouping hosts. I can get datasets for host groupings based on a variety of criteria from my config database - physical location, 'service', 'customer', etc.

And I'd really like to be able to add these as filter criteria in my elasticsearch database - ideally in a way I can use in Kibana without needing to do much modification.

Currently I'm thinking in terms of either:

  • a custom logstash filter that looks up the hostname in a data dump and adds tags (service/customer/location is really all I need).
  • Trying to add a parent/child relationship for a 'host' document.
  • using 'percolator' to cross reference (somehow?)
  • a 'script' field?
  • Some sort of dirty hack involving a cron job to update records with metadata post-ingest.

But I'm wondering if anyone's already tackled this, and is able to suggest a sensible approach?

Sobrique

2 Answers


Having done a bit of digging, the solution I finally decided upon was to use the logstash 'translate' filter plugin (logstash-filter-translate).

This takes a YAML file of key-value pairs, and lets you rewrite your incoming log entry based on it.

So:

translate {
    # syslog-derived field holding the originating hostname
    field            => "logsource"
    # field added to the event, populated from the dictionary
    destination      => "host_group"
    dictionary_path  => "/logstash/host_groups.dict"
}

The dictionary file itself is a rather simple list:

hostname : group
hostname2 : group

At the moment, it's static-ish - rebuilt and fetched via cron. I'm intending to push towards etcd and confd for a more adaptive solution.

This means that events are already 'tagged' as they enter elasticsearch. And because my logstash engines are distributed and autonomous, running off a 'cached' list is desirable anyway - my host lists don't change fast enough for that to be a problem.
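For what it's worth, here's a slightly fuller sketch of the same filter, assuming the version of filter-translate you have installed supports the fallback and refresh_interval options (option names have shifted between plugin releases, so check the docs for your version):

translate {
    field            => "logsource"
    destination      => "host_group"
    dictionary_path  => "/logstash/host_groups.dict"
    # re-read the dictionary file periodically, so an externally rebuilt copy gets picked up
    refresh_interval => 300
    # hypothetical default group for hosts that aren't in the dictionary yet
    fallback         => "ungrouped"
}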

Sobrique

You say you use Logstash's syslog input plugin as a target for your hosts' local syslog daemons.

Assuming that each host's syslog daemon is also writing the log entries to files, you could use Filebeat to push those files to LS, adding the right tags at the source. Depending on the number of hosts you have, this task might be non-trivial.

Other options are, simplest to most complex:

  1. Write a whack of if ... else if ... else ... statements in your LS config to capture each host and add the appropriate tags with the mutate filter (a minimal sketch follows this list).
    This works, but means changing your config each time you add/remove a host/service/customer. Having each host in its own config file simplifies things a little, but it still means restarting LS each time.

  2. Use the elasticsearch filter to query a document in ES that has the tags you want, and add them to the events you're processing.
    The query would have to be fairly well crafted, but this might work (see the second sketch after this list). You'd need to create documents of a specific type, probably in a dedicated index, for each host so that your data is always there.

  3. Write a custom filter plugin to pull the data you need from some other source.
    A few times, I've thought about writing a Redis filter plugin to perform lookups for log sources which can't be modified and only provide numerical references to certain entities, but for which we'd like names for ease of searching. I don't know how involved this would be, but it should be doable.
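To make option 1 concrete, here's a minimal sketch - the hostnames and group/customer names are made up for illustration, and it only uses conditionals plus the mutate filter, so it should work on any LS version:

if [logsource] == "web01" or [logsource] == "web02" {
    mutate { add_field => { "host_group" => "webservers" "customer" => "acme" } }
} else if [logsource] == "db01" {
    mutate { add_field => { "host_group" => "databases" "customer" => "acme" } }
} else {
    # flag anything we haven't classified yet
    mutate { add_tag => [ "ungrouped_host" ] }
}

And a sketch of option 2, assuming the elasticsearch filter in your release supports the hosts/index/query/fields settings shown here (treat the option names as approximate and check the docs for your plugin version). The host_metadata index and its field names are hypothetical - you'd populate one document per host from your config database:

elasticsearch {
    hosts  => [ "es-node:9200" ]
    index  => "host_metadata"
    # look up the document whose hostname matches this event's logsource
    query  => "hostname:%{[logsource]}"
    # copy fields from the matched document onto the current event
    fields => { "host_group" => "host_group" "customer" => "customer" }
}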

GregL
  • I tried 2 - feeding an elasticsearch index and trying to run a query, but I was having trouble keeping up with the input rate - I couldn't figure out how you'd run a query to update another index within ES... or do you mean pull it out and stuff it back in again with a script? – Sobrique Jan 05 '16 at 20:20