
I'm scraping some metrics (openstack cinder volume sizes) every 15 minutes, and the results produce a discontinuous graph, like this:

[Screenshot: a graph showing short segments of data separated by gaps between scrape intervals]

(That's the result of the simple query `cinder_volume_size_gb`.)

The metrics "exist" for about five minutes, but then disappear until the next scrape interval. What configuration settings would influence this behavior?

larsks

1 Answer


To the title question - yes, it is documented at https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness:

> If no sample is found (by default) 5 minutes before a sampling timestamp, no value is returned for that time series at this point in time. This effectively means that time series "disappear" from graphs at times where their latest collected sample is older than 5 minutes or after they are marked stale.

To the other question - it is less well documented, but there is a command-line option that changes this default:

 --query.lookback-delta=5m  The maximum lookback duration for retrieving metrics during expression evaluations.
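As a sketch, you could raise the lookback window past your 15-minute scrape interval when starting the server (the `20m` value here is illustrative, not a recommendation; pick anything longer than your scrape interval):

```shell
# Start Prometheus with a lookback window longer than the 15m scrape
# interval, so series no longer vanish between scrapes.
# (20m is an example value; prometheus.yml is the usual config file name.)
prometheus --config.file=prometheus.yml --query.lookback-delta=20m
```

Alternatively, you can bridge the gaps on the query side without touching server flags, e.g. `max_over_time(cinder_volume_size_gb[20m])`, which returns the highest sample seen in the last 20 minutes at each evaluation step. Note that a longer lookback (either way) also delays how quickly genuinely deleted series disappear from graphs.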
  • Thanks! I did eventually find the `staleness` docs, but I hadn't come across the `query.lookback` option. It looks like it may be better to solve this on the query side (e.g. using `max_over_time(...)`). – larsks Mar 03 '20 at 12:51
  • FYI, if you have a metric that had a value saved but the process shut down at some point and the value never gets updated, you still end up with stale values. There's some documentation discussing a TTL, but for unknown reasons they decided against implementing one. – dtc Apr 12 '22 at 21:36