The setup

I am collecting statistics from Varnish with Logstash, which is configured to increment statsd counters based on the vhost in the server logs and the result code. I also have carbon creating whisper archives for graphite.

I'm reading logs from varnishncsa, which is configured to append the vhost and the request disposition (hit or miss) to the standard log format:

VARNISHNCSA_LOG_FORMAT="%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %{Host}i %{Varnish:hitmiss}x"

My logstash shipper config looks like this:

input {
  file {
    path => "/var/log/varnish/varnishncsa.log"
    type => varnish
  }
}

filter {
  grok {
    type => varnish
    pattern => "%{COMBINEDAPACHELOG} %{NOTSPACE:vhost} %{WORD:varnish_handling}"
    pattern => "%{COMBINEDAPACHELOG}"
  }

  mutate {
    rename => [ 'response', 'status' ]
  }
}

output {
  statsd {
    type => varnish
    host => "my-statsd-host"
    port => 8125
    sender => "%{@fields.vhost}"
    increment => "varnish.response.%{@fields.status}"
    increment => "varnish.handling.%{@fields.varnish_handling}"
  }
}

The problem

Hundreds of distinct counters are being created by carbon due to variations in the domain entered into users' browsers. So, for example, I have

www_mywebsite_net    <-- an alias

Obviously these are then missed by my graphs, which only look at statistics under the vhost's canonical name.

What I'd like is for some canonicalising process to happen beforehand. I can write a script to take a 'raw' domain and spit out a 'real' vhost name, but I'm not sure how to integrate that. Do I put it in the logstash config, or in statsd, or carbon? Could I do something with carbon's storage aggregation feature?
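To make the idea concrete, here is a minimal sketch of what such a canonicalising script might look like in Python. The alias table and the function name canonical_vhost are hypothetical placeholders, not part of any of the tools above, and the mapping shown is only an assumed example. Where to hook it in is still the open question.

```python
# Hypothetical alias table: keys are lowercased raw hostnames, values are the
# canonical vhost names. The entry below is an assumed example only.
ALIASES = {
    "www.mywebsite.net": "mywebsite.net",
}


def canonical_vhost(raw_host, aliases=ALIASES):
    """Map a raw Host header value to a canonical vhost name."""
    host = raw_host.strip().lower()
    # Drop an explicit port such as ":80"; bracketed IPv6 literals without a
    # port are left alone because their last segment is not purely numeric.
    head, sep, tail = host.rpartition(":")
    if sep and tail.isdigit():
        host = head
    # A fully qualified name may carry a trailing dot.
    host = host.rstrip(".")
    # Unknown hosts pass through unchanged (apart from the normalisation above).
    return aliases.get(host, host)
```

Lowercasing and port stripping happen before the lookup, so the table only needs one entry per alias rather than one per spelling.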

Update: I've worked around the worst cases by running carbon's aggregator daemon in front of the cache, and adding rules to rewrite-rules.conf. However, there's very little documentation for that file, and I can't do more powerful things like smash everything down to lowercase.

You can lowercase a field with the mutate filter:

filter {
  mutate {
    lowercase => [ "fieldname" ]
  }
}

Logstash 1.1.13 Docs

Jan M.

  • Thanks for that -- that takes care of instances where the case is the only problem. However I'd also like to normalise aliases of vhosts (see third line in my example). – Flup Jul 05 '13 at 12:58