I have set up 5 alerts in my Prometheus configuration. 3 of them work as expected, but 2 are never triggered. I am confused and need some help.

The 2 rules that never fire are:

alert: CriticalDiskSpace
expr: node_filesystem_free{filesystem!~"^/run(/|$)",fstype!~"tmpfs",job="{{
  $labels.job }}"} / node_filesystem_size{job="{{ $labels.job }}"} <
  0.25
for: 4m
labels:
  severity: critical
annotations:
  description: '{{ $labels.instance }} of job {{ $labels.job }} has less than 25%
    space remaining.'
  summary: Instance {{ $labels.instance }} - Critical disk space usage

alert: CriticalCPULoad
expr: (100
  * (1 - avg by(instance) (irate(node_cpu{job="{{ $labels.job }}",mode="idle"}[2m]))))
  > 75
for: 2m
labels:
  severity: critical
annotations:
  description: '{{ $labels.instance }} of job {{ $labels.job }} has Critical CPU load
    for more than 2 minutes.'
  summary: Instance {{ $labels.instance }} - Critical CPU load

When I run the expressions manually in Prometheus, I get the correct values. For example, for the disk space rule, I have a test instance whose root filesystem is 79% full, so the alert should fire.

Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       50G   40G   11G  79% /

node_filesystem_free{filesystem!~"^/run(/|$)",fstype!~"tmpfs",fstype!~"rootfs", job="ec2_eu_west_1_discovery"} / node_filesystem_size{job="ec2_eu_west_1_discovery"} < 0.25

Prometheus returns the expected value:

Element:
{device="/dev/xvda1",fstype="xfs",instance="Grafana Test",job="ec2_eu_west_1_discovery",mountpoint="/"}
Value: 
0.21932882130469517
Peter

1 Answer

I have found a way to make the rule fire.

So, if I change the expression from this:

node_filesystem_free{filesystem!~"^/run(/|$)",fstype!~"tmpfs",job="{{
  $labels.job }}"} / node_filesystem_size{job="{{ $labels.job }}"} <
  0.25

to this:

node_filesystem_free{filesystem!~"^/run(/|$)",fstype!~"tmpfs"} / node_filesystem_size < 0.25

I get an alert. What I still need to understand is why filtering by job works in the expression browser, while {job="{{ $labels.job }}"} does not work in the rules.yml file.
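
A likely explanation: Prometheus expands {{ ... }} templates only in a rule's labels and annotations, not in expr. In the rule file, job="{{ $labels.job }}" is therefore treated as a literal string that no series matches, whereas in the expression browser I typed a real job name. Below is a minimal sketch of both rules with the job matcher removed, written in rules.yml form. The group name node_alerts is a placeholder, and adding job to the by() clause is my own assumption, made so that {{ $labels.job }} is still available to the annotations after aggregation.

```yaml
groups:
  - name: node_alerts   # hypothetical group name, not from my real config
    rules:
      - alert: CriticalDiskSpace
        # No job matcher here: templates are not expanded inside expr.
        expr: node_filesystem_free{filesystem!~"^/run(/|$)",fstype!~"tmpfs"} / node_filesystem_size < 0.25
        for: 4m
        labels:
          severity: critical
        annotations:
          # Templates are fine here; job and instance come from the result series.
          description: '{{ $labels.instance }} of job {{ $labels.job }} has less than 25% space remaining.'
          summary: 'Instance {{ $labels.instance }} - Critical disk space usage'

      - alert: CriticalCPULoad
        # Assumption: aggregate by (instance, job) so the job label survives
        # the avg and can be used in the description below.
        expr: 100 * (1 - avg by(instance, job) (irate(node_cpu{mode="idle"}[2m]))) > 75
        for: 2m
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.instance }} of job {{ $labels.job }} has Critical CPU load for more than 2 minutes.'
          summary: 'Instance {{ $labels.instance }} - Critical CPU load'
```

To restrict a rule to a single job, a literal value such as job="ec2_eu_west_1_discovery" can go in the selector, exactly as in the manual query.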

Peter