
The setup

I am collecting statistics from Varnish with Logstash, which is configured to increment statsd counters based on the vhost in the server logs and the result code. I also have carbon creating whisper archives for graphite.

I'm reading logs from varnishncsa which is configured to add vhost and request disposition to the standard logs:

VARNISHNCSA_LOG_FORMAT="%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\" %{Host}i %{Varnish:hitmiss}x"

My logstash shipper config looks like this:

input {
  file {
    path => "/var/log/varnish/varnishncsa.log"
    type => varnish
  }
}

filter {
  grok {
    type => varnish
    pattern => "%{COMBINEDAPACHELOG} %{NOTSPACE:vhost} %{WORD:varnish_handling}"
    pattern => "%{COMBINEDAPACHELOG}"
  }

  mutate {
    rename => [ 'response', 'status' ]
  }
}

output {
  statsd {
    type => varnish
    host => "my-statsd-host"
    port => 8125
    sender => "%{@fields.vhost}"
    increment => "varnish.response.%{@fields.status}"
    increment => "varnish.handling.%{@fields.varnish_handling}"
  }
}

The problem

Hundreds of distinct counters are being created by carbon due to variations in the domain entered into users' browsers. So, for example, I have

www_mywebsite_com
WWW_MyWebsite_Com
www_mywebsite_net    <-- an alias
...etc...

Obviously these are then missed by my graphs, which only look at statistics under the vhost's canonical name.

What I'd like is for some canonicalising process to happen beforehand. I can write a script to take a 'raw' domain and spit out a 'real' vhost name, but I'm not sure how to integrate that. Do I put it in the logstash config, or in statsd, or carbon? Could I do something with carbon's storage aggregation feature?
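For concreteness, the kind of canonicalising script meant here could look roughly like the sketch below. It is a minimal illustration only: the function name `canonical_vhost` and the `ALIASES` table are hypothetical, the alias entry is taken from the example names above, and the dot-to-underscore step assumes the metric names should end up looking like `www_mywebsite_com`. It does not answer where in the pipeline such a step should run.

```python
# Hypothetical alias table: each known alias maps to its canonical vhost.
# The single entry is illustrative, based on the example names above.
ALIASES = {
    "www.mywebsite.net": "www.mywebsite.com",
}


def canonical_vhost(raw_host):
    """Return a graphite-friendly canonical name for a raw Host header value."""
    # Normalise case and surrounding whitespace.
    host = raw_host.strip().lower()
    # Drop an explicit port such as ":80" (IPv6 literals are not handled here).
    host = host.split(":", 1)[0]
    # Drop the trailing dot of a fully qualified name.
    host = host.rstrip(".")
    # Map known aliases onto the canonical vhost; unknown hosts pass through.
    host = ALIASES.get(host, host)
    # Dots separate path components in graphite, so replace them.
    return host.replace(".", "_")
```

With this table, `WWW.MyWebsite.Com` and `www.mywebsite.net` would both come out as `www_mywebsite_com`, while a host that is not listed is only lowercased and cleaned up.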

Update: I've worked around the worst cases by running carbon's aggregator daemon in front of the cache, and adding rules to rewrite-rules.conf. However, there's very little documentation for that file, and I can't do more powerful things like smash everything down to lowercase.

HopelessN00b
Flup

1 Answer


You can lowercase a field with the mutate filter:

filter {
  mutate {
    lowercase => [ "fieldname" ]
  }
}

Logstash 1.1.13 Docs

Cheers, Jan

Jan M.
  • Thanks for that -- that takes care of instances where the case is the only problem. However I'd also like to normalise aliases of vhosts (see third line in my example). – Flup Jul 05 '13 at 12:58