
I've set up Graphite and statsd and both are running well. I'm using example-client.py from graphite/examples to measure load values, and that works fine.
I started testing statsd, and at first it seemed OK because it generated some graphs, but now they don't look right.

First, this is my storage-schema.conf:

priority = 100
pattern = .*
retentions = 1m:395d

I'm using this command to send data to statsd:

echo 'ssh.invalid_users:1|c' | nc -w 1 -u localhost 8126

It executes; I click Update Graph in the Graphite web interface and it draws a line, but when I hit Update again the line disappears, like this1 and this2.
If I execute the previous command 5 times, the graph line reaches 2 and that value actually gets saved. If I then run the same command twice, the line reaches 2 and disappears again.
I can't find what I have misconfigured.

The intended use is this:

tail -n 0 -f /var/log/auth.log | grep --line-buffered "Invalid user" | while read line; do echo "ssh.invalid_users:1|c" | nc -w 1 -u localhost 8126; done

EDIT:
On a fresh system I reinstalled the latest versions of graphite, carbon, nodejs and statsd, and it's acting the same.
While tailing /opt/graphite/storage/log/carbon-cache/carbon-cache-a/query.log I get:

cache query for "stats_counts.ssh.invalid_users" returned 0 values
cache query for "stats.ssh.invalid_users" returned 0 values

whenever I press Update in the webapp. I noticed that it will randomly say "returned 1 values" while drawing the lines, but then it reverts to "returned 0 values" and the lines disappear.

  • I have noticed that if you copy the image link for the graph in Graphite and apply `?format=raw` or `?format=json`, it is a lot easier to debug why your graph looks the way it does. – pkhamre Oct 03 '12 at 12:19
  • And, are you looking at the stat from statsd, or the stat counter, which is the actual number of increments in the flush period? Take a look at my blog post about statsd and graphite to understand more - http://blog.pkhamre.com/2012/07/24/understanding-statsd-and-graphite/ – pkhamre Oct 03 '12 at 12:22
  • Thanks pkhamre, I have already read your site. I am looking at both: **stats/ssh/invalid_user** and **stats_counts/ssh/invalid_users**. Both show the same behavior. [Before](http://def.info.tm/1.png) [After](http://def.info.tm/2.png). The only thing that differs between the two links is **salt=1349268570.215** for the 1st and **salt=1349268579.416** for the 2nd. – w00t Oct 03 '12 at 13:06
  • I reinstalled using the latest versions of graphite, carbon, nodejs, statsd. Same behavior. – w00t Oct 05 '12 at 09:15

1 Answer


The problem is the storage-schema retention, retentions = 1m:395d, which is taken from the graphite wiki (http://graphite.wikidot.com/installation).

I had to use retentions = 10:2160,60:10080,600:262974 or something similar, which takes into account that values arrive every 10 seconds (statsd's default flush interval).
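
For reference, in this bare-number form each retention entry is secondsPerPoint:numberOfPoints, so the schema above works out to roughly:

retentions = 10:2160,60:10080,600:262974
# 10 s  x 2160 points   = 6 hours at 10-second resolution
# 60 s  x 10080 points  = 7 days at 1-minute resolution
# 600 s x 262974 points = about 5 years at 10-minute resolution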

Also, even though I restarted graphite after changing storage-schema.conf, I had to use a different metric name, because the whisper file for the old metric keeps the retention it was created with (and I can reproduce this).
So instead of echo 'ssh.invalid_users:1|c' I had to use
echo 'ssh.invalid_userstest2:1|c'.
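
Alternatively, the existing whisper files can reportedly be resized in place instead of switching to a new metric name (see the whisper-resize comment below; I have not tried this). The paths here assume the default /opt/graphite storage layout:

whisper-info.py /opt/graphite/storage/whisper/stats_counts/ssh/invalid_users.wsp
# shows the retentions the file was actually created with
whisper-resize.py /opt/graphite/storage/whisper/stats_counts/ssh/invalid_users.wsp 10:2160 60:10080 600:262974
# rewrites the file with the new retention periods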

  • When changing retentions you need to resize the whisper files. I have not tried this. https://github.com/graphite-project/whisper/blob/master/bin/whisper-resize.py – pkhamre Nov 12 '12 at 10:09
  • It's due to statsd flushing every 10 seconds by default. You could also change that time (config.flushInterval) if you don't want the finer resolution in your whisper files; see the config sketch after these comments. – Darrell Mozingo Jan 11 '13 at 10:56
  • I came across the same behaviour. I'm still wondering why if the flush interval is smaller than the smallest retention period, then data gets lost. Even if the aggregation is set to `sum`... This "feels" like a bug actually. However, as you pointed out, it's easy to work around it by matching the retention with your flush interval. – Yoav Aner Jul 30 '13 at 06:25
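
Regarding the flushInterval comment above: a minimal statsd config sketch for matching the flush to a 1-minute retention could look like the one below (host and ports are just the usual defaults and may need adjusting; flushInterval is in milliseconds):

{
  graphiteHost: "localhost",
  graphitePort: 2003,
  port: 8125,
  flushInterval: 60000  // flush every 60 s instead of the default 10 s
}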