Grafana false positive SNMP down

Question

NOTE: I also have Nagios running on another server that reports bandwidth warnings and up/down status. Not a single switch is alerting from this, only Grafana.

Grafana version 1.14.1

I was receiving alerts every minute reporting all switches as down.

The metrics portion of the dashboard is:

up{instance="192.168.20.20",job="snmp"} <--- same for all 12 switches that are polled

I was able to log in to the switch during these reported "outages." No other services were showing interruption (e.g. servers connected to those switches). I have yet to see something like this, and I'm trying to figure out how I can troubleshoot. If there is not actually a problem, what would cause this false positive?

Grafana runs in a Docker container, and I cannot seem to find anything in /var/log/grafana/grafana.log.* related to switches.

Any ideas on where I could glean some info to debug this?

score 0 · Answer 1 · answered Jun 21 '21 at 12:04

Grafana is just a visualization tool: it displays whatever the data source returns. And as you can see, it is doing that job very well.

Two things to check:

It may be a problem with your data source. Check whether the data is actually there.
If you collect metrics with a script or daemon, check that too.
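
One way to check the data source directly, bypassing Grafana, is sketched below. It assumes the data source is a Prometheus server (the up{job="snmp"} expression suggests this, but the question does not say so) reachable at a hypothetical address, http://localhost:9090. The helper names build_query_url, down_instances and fetch are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of the Prometheus server behind the Grafana data source.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def down_instances(payload):
    """Return the instances whose sample value is 0 in an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    down = []
    for series in payload["data"]["result"]:
        # Each instant-vector sample is a [timestamp, "value"] pair.
        _timestamp, value = series["value"]
        if float(value) == 0.0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return sorted(down)


def fetch(url, timeout=10):
    """Fetch and decode the JSON body returned by the query endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Same selector as the dashboard, without pinning a single instance.
    url = build_query_url(PROMETHEUS_URL, 'up{job="snmp"}')
    print(down_instances(fetch(url)))
```

If this prints the switch addresses while the switches are reachable, the zeros are already in the data source and Grafana is only displaying them, so the place to look would be the collection side rather than the Grafana logs.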


Eugene


It's not doing its job very well if the switch is up, but it's reporting down. I have no scripts crawling metrics. – DevOpsSauce Jun 22 '21 at 15:01
