0

On a K8s cluster I have Prometheus Operator and AlertManager running.

I have this alert to catch incidents when a critical pod is down:

 - alert: KubernetesContainerMission-gslNotRunning
    expr: kube_pod_status_ready{condition="false", pod=~"mission-gsl.*"} == 1 OR on() vector(0)
    for: 5m
    labels:
      severity: warning
      environment: PRODUCTION_ENV
    annotations:
      summary: CUSTOM mission-gsl pod not running for more than 5min (instance {{ $labels.instance }})
      description: "mission-gsl pod not running for more than 5min"

This deployment gets automatically rolling-restarted on a schedule every hour, and stays down for maybe 30sec during the process.

I would expect the alert to not fire, since I'm stipulating a 5min down period, yet it does.

What am I missing?

GI D
  • 1

0 Answers0