Questions tagged [system-monitoring]

Questions regarding system monitoring - Nagios, Icinga, Spiceworks, Munin, Zabbix and more.

158 questions
41
votes
4 answers

Find out which process is changing a file

I'm trying to find a reliable way of finding which process on my machine is changing a configuration file (/etc/hosts to be specific). I know I can use lsof /etc/hosts to find out what processes currently have the file open, but this doesn't help…
robbles
  • 513
  • 1
  • 4
  • 5
22
votes
6 answers

How to find out the number of time series stored in Prometheus LevelDB

I'm responsible for maintaining the Prometheus servers in our company. The metrics, however, are provided by the teams. Is there a way to find out the number of time series stored in the Prometheus database? We are using the default LevelDB data…
Tobias Wiesenthal
  • 363
  • 1
  • 2
  • 7
20
votes
9 answers

Script to automatically test if a web site is available

I'm a lone web developer with my own CentOS VPS hosting a few small web sites for my clients. Today I discovered my httpd service had stopped (for no apparent reason, but that's another thread). I restarted it but now I need to find a way that I…
Xoundboy
  • 593
  • 1
  • 9
  • 20
13
votes
3 answers

Alternative to etsy/statsd

Is there any alternative to Etsy's statsd? Maybe even a complete dashboard-like solution? My research only found proprietary SaaS solutions. For those who do not know: statsd is a daemon which collects app and system metrics via UDP and sends them…
d135-1r43
  • 411
  • 4
  • 13
12
votes
2 answers

16TB Volumes and SNMP On Windows

As volumes larger than 16TB became more common, it was recognized that the 32-bit value used to report disk size and usage within the standard "HOST-RESOURCES" MIB in SNMP was not large enough to report the proper disk size. Net-SNMP seems to have…
Univ426
  • 2,139
  • 14
  • 26
11
votes
4 answers

How can you distinguish between a crash and a reboot on RHEL7?

Is there a way to determine whether a RHEL7 server was rebooted via systemctl (or reboot / shutdown aliases), or whether the server crashed? Pre-systemd this was fairly easy to determine with last -x runlevel, but with RHEL7 it's not so clear.
kwb
  • 173
  • 1
  • 8
10
votes
1 answer

Best way to monitor Windows server?

I'm working at a company that provides our small business clients with IT support. One of my tasks is to perform service checks, which include checking the event viewer for critical errors/warnings as well as the DHCP and DNS management consoles. The…
10
votes
4 answers

Monitoring Dell/HP Servers Running ESXi (Free)

What are you all doing to monitor ESXi servers that run the free edition? With the lack of SNMP support, it seems fairly limited to me. What I'd like to be able to do is get some type of alert when a drive or other hardware fails. I've seen a few…
9
votes
2 answers

How to monitor power supply status using ipmitool on Linux/Solaris?

ipmitool differs considerably between Solaris and Linux. How can I use ipmitool on these servers (on Sun, IBM, and other hardware) to detect the power supply status?
vrnjain
  • 91
  • 1
  • 1
  • 3
7
votes
4 answers

SNMP service security tab is missing - Windows Server 2012 R2 - DC

I have to configure the security settings for the SNMP service on a Windows Server, but they are missing! Here are the facts: OS: Windows Server 2012 R2. I installed the SNMP feature and I believe that I already configured the service (but I forgot…
7
votes
2 answers

Agentless monitoring: how does it work? Advantages over traditional monitoring?

How does agentless monitoring work? From what I understood (or not), it seems this is accomplished by logging into the node-being-monitored from a central server and uploading-then-running scripts on it? What are the major differences between…
sysadmin04
  • 71
  • 2
7
votes
2 answers

Load average is greater than the number of EC2 Compute Units

On an EC2 m1.large, with an AVG CPU Utilization graph such as this: how is it possible that the load average is greater than the number of EC2 Compute Units (4)? cat /proc/loadavg 5.78 5.57 5.44 1/188 9388
Drew
  • 205
  • 2
  • 7
7
votes
2 answers

Green-IT: How do you deal with powered-off systems in your system monitoring?

Many of you probably have completed or are contemplating Green-IT projects with the goal of powering off idle or unneeded systems when demand for computer resources is low. How did you handle this situation in your system monitoring? I'm especially…
knweiss
  • 3,955
  • 23
  • 20
7
votes
1 answer

Intermittent munin-cron error “There is nothing to do here, since there are no nodes with any plugins”

We've installed munin monitoring on one of our servers. Generally it seems to be working well but occasionally, 4 times in 2 months to be exact, munin-cron has generated the following error: [FATAL] There is nothing to do here, since there are no…
scarba05
  • 333
  • 6
  • 15
6
votes
1 answer

Nagios OK notification at the beginning of the availability period

Using Nagios 4.3, I'm monitoring an application which starts just before business hours and shuts down at the end of the day. I've configured the notification period for it to start 3 minutes after the application is slated to launch. I would like…
Isac Casapu
  • 235
  • 1
  • 10