Prometheus, group_left, and "on" vs "ignoring"

Question

In Issue #2204, one of the Prometheus developers says:

...in principle you should be favouring ignoring over on to produce generic shareable rules...

I'm confused how the use of ignoring would lead to more generic rules. For example, consider a situation where we have one "info" metric for a device and several statistics, like this:

device_info{id="1", owner="coyote", project="acme"}
device_rx_bytes{id="1"}
device_tx_bytes{id="1"}
device_rx_errors{id="1"}
device_tx_errors{id="1"}

If I want to get the receive rate by project, I would need to correlate the device_rx_bytes metric with the corresponding device_info metric. To me this smells like a SQL join, and I would write:

rate(device_rx_bytes[5m]) * on(id) group_left(project) device_info

This seems "generic" in the sense that it only makes assumptions about the label used for the grouping (id) and the label we want to propagate to our results (project). If I understand the ignoring operator correctly, the corresponding expression would be more complex because I would need to list out all labels from the right-hand side that don't exist in the left-hand side. Something like:

rate(device_rx_bytes[5m]) * ignoring(owner, project) group_left(project) device_info

Is that correct? And if it is, why is ignoring preferred over on (not just in the quote above, but in various documentation and examples as well)?

score 1 · Accepted Answer · answered Sep 05 '20 at 21:27

I think the key word in that comment is shareable, in other words reusable, rules. With ignoring you usually preserve more labels than with on, so the resulting rule keeps more of its original labels intact and can be reused in more scenarios.

Imagine these time series:

instance_cpu_time_ns{app="lion", proc="web", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="elephant", proc="worker", rev="34d0f99", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="turtle", proc="api", rev="4d3a513", env="prod", job="cluster-manager"}
instance_cpu_time_ns{app="fox", proc="widget", rev="4d3a513", env="prod", job="cluster-manager"}
...

For a one-to-one match, a query with ignoring(rev) keeps all the other labels (app, proc, env, job) in the result, whereas the same query with on(app) keeps only app.
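
To make the difference concrete, here is a minimal Python sketch that models which labels survive on the result of a one-to-one binary operation. It is only an illustration, not Prometheus code: the function name result_labels is hypothetical, and the metric name is assumed to have been dropped already.

```python
def result_labels(series_labels, modifier, names):
    """Model the labels kept on the result of a one-to-one binary operation.

    on(names)       -> only the listed labels survive
    ignoring(names) -> every label except the listed ones survives
    """
    if modifier == "on":
        return {k: v for k, v in series_labels.items() if k in names}
    if modifier == "ignoring":
        return {k: v for k, v in series_labels.items() if k not in names}
    raise ValueError(f"unknown modifier: {modifier}")


# Labels of the first example series (metric name omitted).
lion = {"app": "lion", "proc": "web", "rev": "34d0f99",
        "env": "prod", "job": "cluster-manager"}

kept_on = result_labels(lion, "on", {"app"})
kept_ignoring = result_labels(lion, "ignoring", {"rev"})
print(kept_on)
print(kept_ignoring)
```

In this model the on(app) result carries a single label, while the ignoring(rev) result still carries app, proc, env and job, which is what makes the second form easier to reuse in later rules.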

But on and ignoring give identical results when their label lists are complementary, that is, when ignoring lists exactly the labels that on leaves out, as in the example you mention.
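
The equivalence for the example in the question can be checked with a small Python model of a group_left join. This is a sketch under simplifying assumptions: match_key and group_left are hypothetical helper names, metric names and sample values are left out, and each series on the "many" side is assumed to have at most one partner.

```python
def match_key(labels, modifier, names):
    """Return the label subset that two series are matched on."""
    if modifier == "on":
        kept = {k: v for k, v in labels.items() if k in names}
    else:  # "ignoring"
        kept = {k: v for k, v in labels.items() if k not in names}
    return frozenset(kept.items())


def group_left(many, one, modifier, names, include):
    """Join each 'many' series to its match on the 'one' side.

    The result keeps the labels of the 'many' side plus the labels
    named in `include`, copied from the matched 'one' side series.
    """
    index = {match_key(s, modifier, names): s for s in one}
    out = []
    for s in many:
        partner = index.get(match_key(s, modifier, names))
        if partner is not None:
            merged = dict(s)
            merged.update({k: partner[k] for k in include if k in partner})
            out.append(merged)
    return out


# Label sets from the question: device_rx_bytes and device_info.
rx = [{"id": "1"}]
info = [{"id": "1", "owner": "coyote", "project": "acme"}]

with_on = group_left(rx, info, "on", {"id"}, ["project"])
with_ignoring = group_left(rx, info, "ignoring", {"owner", "project"}, ["project"])
print(with_on)
print(with_ignoring)
```

Both calls produce the same single series, because ignoring(owner, project) leaves id as the only label to match on, which is exactly what on(id) asks for.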
