Performance monitoring on Linux/Unix

Question

I run a few Windows servers as well as Linux (Debian and Ubuntu) and AIX servers.

I would like to continuously monitor performance on these systems, both to identify bottlenecks easily and to get an overview of general activity on the servers.

On Windows, I use Windows Performance Monitor (perfmon) for this. I set up these counters:

For bottlenecks:

Processor utilization : System\Processor Queue Length
Memory utilization : Memory\Pages Input/Sec
Disk Utilization : PhysicalDisk\Current Disk Queue Length\driveletter
Network problems: Network Interface\Output Queue Length\nic name

For general activity:

Processor utilization : Processor\% Processor Time\_Total
Memory utilization : Process\Working Set\_Total (or per specific process)
Memory utilization : Memory\Available MBytes
Disk Utilization : PhysicalDisk\Bytes/sec\_Total (or per process)
Network Utilization : Network Interface\Bytes Total/Sec\nic name

(More information on the choice of these counters on: http://itcookbook.net/blog/windows-perfmon-top-ten-counters )

This works really well. It allows me to look in one place and identify most common bottlenecks.

So my question is, how can I do something equivalent (or just very similar) on Linux servers?

I have looked a bit at nmon (http://www.ibm.com/developerworks/aix/library/au-analyze_aix/), which is a free performance monitoring tool developed for AIX but also available for Linux. However, I am not sure whether nmon lets me set up the counters above. Maybe Linux and AIX do not allow monitoring these exact same measures. If so, which ones should I choose, and why?

If nmon is not the tool to use for this, then what do you recommend?

score 2 · Answer 1 · answered Jan 16 '12 at 09:25


Looking at basic system metrics does not give a good indication of performance. It can indicate how performance is constrained - but if you want to measure the performance of your applications then you really need to look at real transactions.

Regardless, there is no end of tools for measuring performance. I use Nagios. It's a bit lacking in trending / capacity management but is amazingly flexible in reporting, escalation, fault isolation and in letting you add custom scripts (which you'll need if you want to measure your transactions). Certainly there are probes available to cover all the metrics you've listed for both MS Windows and Linux.
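As an illustration of the "custom scripts" point, here is a minimal sketch of what such a probe could look like. It is not taken from the answer: the function names, the thresholds and the choice of the 1-minute load average are assumptions made for the example. What it relies on is the usual Nagios plugin convention of exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN and optional performance data after a `|` in the output line.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style probe: alert on the 1-minute load average."""

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value, warn, crit):
    """Map a measured value onto a Nagios state using upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def read_load1(path="/proc/loadavg"):
    """Return the 1-minute load average (first field of /proc/loadavg)."""
    with open(path) as f:
        return float(f.read().split()[0])


def main(warn=4.0, crit=8.0, path="/proc/loadavg"):
    """Print one status line and return the exit code (thresholds are examples)."""
    try:
        load1 = read_load1(path)
    except (OSError, ValueError, IndexError) as exc:
        print(f"LOAD UNKNOWN - {exc}")
        return UNKNOWN
    state = classify(load1, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    # Text after '|' is performance data that graphing add-ons can trend.
    print(f"LOAD {label} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}")
    return state

# When installed as a real plugin, end the file with:
#     raise SystemExit(main())
```

The same skeleton should work for a transaction check: replace `read_load1` with something that times a real request and keep the exit-code mapping.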


symcbean


Maybe I did not express clearly what I meant by performance monitoring. I am not thinking of application performance. I want to be able to easily identify performance bottlenecks. In other words: if an application is perceived as slower than usual, it is often very useful to take a look at graphs for the above counters. For example, if pages input/sec has suddenly become high, it indicates that the system is hitting swap all the time, so the machine probably has too little RAM available. – ervingsb Jan 17 '12 at 09:26
I know Nagios, but it seems more like a tool for monitoring that servers/services are up/reachable. I have looked a bit at the sysstat package and it seems promising; however, I cannot find any information on how to measure things such as "pages input/sec", "current disk queue length" or "current CPU queue length". – ervingsb Jan 17 '12 at 09:26
Nagios is just a scheduling / reporting hub to which you can add all sorts of probes in all sorts of different ways. Have a look at NRPE. – symcbean Jan 17 '12 at 15:59
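For the counters asked about in the comments, a rough sketch of where Linux exposes comparable numbers may help. This is an approximation, not an exact equivalent of the perfmon counters: `pswpin` and `pgmajfault` in /proc/vmstat are cumulative counts of swapped-in pages and major page faults, the fourth field of /proc/loadavg gives currently runnable tasks (including the reader itself), and the ninth statistic per device in /proc/diskstats is the number of I/Os in progress. The function names and the result keys are made up for this example.

```python
"""Sketch: read Linux /proc counters that roughly parallel the perfmon ones."""
import time


def parse_vmstat(text):
    """Parse /proc/vmstat ('name value' per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def parse_run_queue(loadavg_text):
    """Return the runnable-task count from the 4th field of /proc/loadavg."""
    running, _total = loadavg_text.split()[3].split("/")
    return int(running)


def parse_inflight_io(diskstats_text):
    """Map device name -> I/Os currently in progress (/proc/diskstats)."""
    inflight = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 12:
            # fields: major, minor, name, then the per-device statistics
            inflight[fields[2]] = int(fields[11])
    return inflight


def per_second(before, after, seconds):
    """Turn two cumulative counter snapshots into per-second rates."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


def read(path):
    """Return the full text of a /proc file."""
    with open(path) as f:
        return f.read()


def sample(interval=1.0):
    """Take one sample on a Linux host; the key names are this sketch's own."""
    first = parse_vmstat(read("/proc/vmstat"))
    time.sleep(interval)
    second = parse_vmstat(read("/proc/vmstat"))
    rates = per_second(first, second, interval)
    return {
        "swap_pages_in_per_sec": rates.get("pswpin", 0.0),
        "major_faults_per_sec": rates.get("pgmajfault", 0.0),
        "run_queue": parse_run_queue(read("/proc/loadavg")),
        "inflight_io": parse_inflight_io(read("/proc/diskstats")),
    }
```

Calling `sample()` on a Linux machine returns one snapshot. The sysstat tools read the same files, so in practice `sar -W` (swapping), `sar -B` (paging), `sar -q` (run queue) and `iostat -x` (device queue) are likely the easier route for continuous collection.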

score 1 · Answer 2 · answered Jan 16 '12 at 09:16

There are a number of good options for this: some of them F/OSS, some F/OSS with support contracts available, and some fully commercial.

I use http://collectd.org/ with my own script (based on this) to draw pretty pictures from the resulting data in rrd files and send me the occasional email. This may not be as practical for you though (I'm only monitoring a couple of machines).

For a larger install you probably want something like Zabbix (another open source option, but considered more "enterprise grade" than collectd).

You can find a fuller list at http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems

score 1 · Answer 3 · answered Jan 16 '12 at 10:15


I like munin because it's easy to install and use. (apt-get install munin munin-node)


Jure1873


score 1 · Answer 4 · answered Feb 06 '14 at 11:54

We are using Nagios for basic monitoring and Graphite for performance monitoring. Graphite is a very scalable solution. In combination with the Diamond plugin you can measure almost anything without too much effort.

http://graphite.wikidot.com/
https://github.com/BrightcoveOS/Diamond
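
To give an idea of why adding a metric to Graphite takes little effort, here is a minimal sketch of sending one data point yourself. It assumes a Carbon daemon accepting the plaintext protocol (one `path value timestamp` line per metric, TCP port 2003 by default); the host name and the metric path are placeholders, and collectors such as Diamond do essentially this for you on a schedule.

```python
"""Sketch: push one metric to Graphite over Carbon's plaintext protocol."""
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: '<metric path> <value> <unix timestamp>'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="graphite.example.com", port=2003):
    """Send a formatted line to Carbon (host is a placeholder)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `send_metric(format_metric("servers.web1.load.1min", 0.42))` would record a single load sample under a hypothetical `servers.web1` hierarchy.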

score 0 · Answer 5 · edited May 23 '18 at 00:37

In general, there are some steps that I follow as a sysadmin to keep track of all the servers I use.

System commands like top, free -m, vmstat, iostat, iotop, sar, netstat, etc.: Nothing comes close to these Linux utilities when you are analysing or debugging a problem. They give you a clear picture of what is going on inside your server.

Nagios: It tops all monitoring/alerting tools. It is highly customizable but difficult for beginners to set up, although ready-made Nagios plugins are available.

Server Density: A cloud-based paid service that collects important Linux metrics and lets users write their own plugins.

New Relic, Zabbix and Munin are some other well-known services.

I came across a similar question earlier. You can see whether the other answers there help you.

Performance monitoring on Linux/Unix
