I am looking for a way to diagnose issues such as swap death, where a process with ballooning memory usage (such as Apache) fills up swap and takes down the whole machine.
I'm already using Cacti, and I could set up Nagios (though I would rather not) or Munin, but as far as I can tell they can't record per-process usage - just overall system status.
I know I can roll a script that appends (>>) to some file every 30s, but I'd like to know whether a mature solution already exists.
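For reference, the home-rolled fallback I have in mind would look roughly like the Python sketch below. It assumes the standard Linux /proc layout; the function names and the tab-separated log format are made up for illustration, not taken from any existing tool.

```python
import os
import time

# Page size in kB, used to convert the resident page count from statm.
PAGE_KB = os.sysconf("SC_PAGE_SIZE") // 1024


def read_process_sample(pid):
    """Return (name, rss_kb, cpu_ticks) for one PID, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
        with open(f"/proc/{pid}/statm") as f:
            statm = f.read().split()
        # The command name sits in parentheses and may itself contain
        # spaces or ')', so split around the first '(' and the last ')'.
        name = stat[stat.index("(") + 1:stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2:].split()
        # fields[0] is the state (field 3 in proc(5)), so utime (field 14)
        # and stime (field 15) are at indexes 11 and 12.
        cpu_ticks = int(fields[11]) + int(fields[12])
        rss_kb = int(statm[1]) * PAGE_KB
    except (OSError, ValueError, IndexError):
        # The process exited mid-read or the file was unreadable.
        return None
    return name, rss_kb, cpu_ticks


def snapshot():
    """Sample every numeric entry under /proc, keyed by PID."""
    samples = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            sample = read_process_sample(int(entry))
            if sample is not None:
                samples[int(entry)] = sample
    return samples


def append_samples(path):
    """Append one line per process: timestamp, pid, name, rss_kb, cpu_ticks."""
    now = int(time.time())
    with open(path, "a") as log:
        for pid, (name, rss_kb, cpu_ticks) in sorted(snapshot().items()):
            log.write(f"{now}\t{pid}\t{name}\t{rss_kb}\t{cpu_ticks}\n")
```

Calling append_samples() in a loop with time.sleep(30) would give the raw data. Note that cpu_ticks is cumulative, so CPU usage over an interval has to be derived from the difference between two samples, and charts, history and retention would all still be missing.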
Ideally it would:
- record processes' memory usage every N seconds
- record processes' CPU usage every N seconds
- support charts and history
- support averages, e.g. mysqld has used 43% CPU over the last day and averaged 400MB of memory
- be free and open source
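To be concrete about the kind of averages I mean, here is a small Python sketch. The sample tuples, the function name and the numbers are invented purely for illustration; a real tool would read its own recorded history instead.

```python
from collections import defaultdict


def top_offenders(samples):
    """Average CPU % and memory per process name over all sample times.

    samples: iterable of (timestamp, name, cpu_percent, rss_mb).
    Processes sharing a name (e.g. several httpd workers) are summed
    together, and a name missing at some timestamp counts as zero there.
    Returns (name, avg_cpu_percent, avg_rss_mb) sorted by memory, highest first.
    """
    timestamps = set()
    sums = defaultdict(lambda: [0.0, 0.0])  # name -> [cpu total, rss total]
    for ts, name, cpu_percent, rss_mb in samples:
        timestamps.add(ts)
        sums[name][0] += cpu_percent
        sums[name][1] += rss_mb
    if not timestamps:
        return []
    n = len(timestamps)
    report = [(name, cpu / n, rss / n) for name, (cpu, rss) in sums.items()]
    report.sort(key=lambda row: row[2], reverse=True)
    return report


# Invented samples taken at t=0s and t=30s.
example = [
    (0, "mysqld", 40.0, 390.0),
    (30, "mysqld", 46.0, 410.0),
    (0, "httpd", 5.0, 50.0),
    (0, "httpd", 5.0, 50.0),
    (30, "httpd", 10.0, 100.0),
]
for name, cpu, rss in top_offenders(example):
    print(f"{name}: {cpu:.0f}% CPU, {rss:.0f}MB average")
```

The point is that nothing here needs the process names up front: whatever shows up in the samples gets ranked.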
Process names are not and should not be known in advance - the idea is to just let it monitor and then have a look at the top offenders.
My system is Linux (openSUSE).