
I'm looking for a tool that monitors disk usage over time and spots folders or files that grow unexpectedly over a short period of time.

I use du, ncdu, baobab (when X is available), filelight and agedu to assess the situation in real time.

Part of the problem is that once that data is absorbed by BackupPC, it is then "hardish" to remove from there, and so we get bloated backups.

What I'm looking for would be an alert system with some sort of diff over du reports... on a daily or weekly basis.
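
Something along these lines is roughly what I have in mind, as a daily cron job (just a sketch; the path, depth, threshold and recipient are made up, and paths containing spaces would confuse it):

#!/bin/sh
# Rough sketch of a daily diff-over-du alert.
# Paths, depth, threshold and recipient below are all made up.

WATCH_DIR=/srv/data
REPORT_DIR=/var/lib/du-reports
THRESHOLD_KB=1048576      # flag anything that grew by more than ~1 GB
MAILTO=root

mkdir -p "$REPORT_DIR"
TODAY="$REPORT_DIR/du-$(date +%F)"
YESTERDAY="$REPORT_DIR/du-$(date -d yesterday +%F)"   # GNU date

# du prints "size<TAB>path"; sort on the path so join can match lines up
du -k --max-depth=3 "$WATCH_DIR" | sort -k2 > "$TODAY"

[ -f "$YESTERDAY" ] || exit 0

# join yesterday's and today's reports on the path and report the big growers
CHANGES=$(join -j 2 "$YESTERDAY" "$TODAY" | awk -v t="$THRESHOLD_KB" \
    '($3 - $2) > t { printf "%s grew by %d KB\n", $1, $3 - $2 }')

if [ -n "$CHANGES" ]; then
    echo "$CHANGES" | mail -s "du growth report for $WATCH_DIR" "$MAILTO"
fi

That way the raw du reports are kept around and mail only goes out when something actually grew.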

Extra features: do the same with databases (mainly Postgres), and notify the user on a multi-user system.

Arthur Lutz
  • I've just found gt5 (http://gt5.sourceforge.net/), which claims to be a diff-capable 'du-browser'. This looks promising for checking from time to time, but doesn't seem to be easily automated. Will take a further look. – Arthur Lutz Jan 09 '13 at 11:24
  • 1
    Not really an answer, but maybe some pointers in the right direction. If you plot your disk-usage into an rrd-file (for example cacti/smokeping/mrtg/whatever), there is a nagios-plugin (check-smokeping) that I use to check for latency spikes. This could be modified to detect disk-usage deviations as well I guess. – Sig-IO Jan 24 '13 at 16:51
  • I run a daily script on the top few levels of my backup directories that puts `du` data into graphite. The great thing about graphite is its ability to generate deltas. Although I haven't followed up with an alert-generation script, that is my intention. – EdwardTeach Mar 31 '13 at 04:40
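
A rough sketch of that kind of du-to-graphite script (the host, metric prefix and backup path are invented; graphite's plaintext protocol just takes "metric value timestamp" lines on TCP port 2003):

#!/bin/sh
# Push du numbers for the top levels of a backup directory into graphite.
# Host, prefix and path are placeholders.

GRAPHITE_HOST=graphite.example.com
GRAPHITE_PORT=2003
PREFIX="servers.$(hostname -s).du"
NOW=$(date +%s)

du -k --max-depth=2 /srv/backups | while read -r size path; do
    # turn the path into a dot-safe metric name
    metric=$(echo "$path" | sed 's/[^A-Za-z0-9]/_/g')
    echo "$PREFIX.$metric $size $NOW"
done | nc -q 1 "$GRAPHITE_HOST" "$GRAPHITE_PORT"   # -q 1: Debian netcat exits shortly after EOF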

2 Answers


The simplest solution I have found over the years for charting resource usage and for monitoring is munin plus mon. The former excels at graphing usage over days, weeks, and years; the latter excels at very flexible monitoring, tracking failures (and recoveries), and sending out notifications. Both can easily be extended with shell scripts or other programs to chart and monitor virtually any aspect of your systems.
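
For example, a plugin that charts directory sizes is just a small shell script speaking munin's plugin protocol. A sketch, with an example directory list (drop it into /etc/munin/plugins/ and make it executable):

#!/bin/sh
# Sketch of a munin plugin that charts the size of a few directories with du.
# The directory list is an example; field names are derived from the paths.

DIRS="/var/lib/backuppc /var/lib/postgresql /home"

if [ "$1" = "config" ]; then
    echo "graph_title Directory sizes"
    echo "graph_vlabel bytes"
    echo "graph_category disk"
    for d in $DIRS; do
        name=$(echo "$d" | sed 's/[^A-Za-z0-9]/_/g')
        echo "$name.label $d"
    done
    exit 0
fi

for d in $DIRS; do
    name=$(echo "$d" | sed 's/[^A-Za-z0-9]/_/g')
    echo "$name.value $(du -sb "$d" | cut -f1)"   # du -sb: total size in bytes (GNU du)
done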

They are somewhat old-school tools that do not provide any point-and-click interfaces; in other words, you must be comfortable setting things up by editing text files with vi(1) (or your favourite editor). On the other hand, they are very lightweight and consume far fewer resources than bloated, full-fledged tools like Nagios. Being simpler tools, they are also much quicker to install and configure.

One important thing to note is that munin can also be used to monitor the resources it charts and send out notifications if values go outside configured ranges. Notification is done by running an external command, so you can plug in your own notification system (by default it just sends emails). It is a lot less flexible than mon because (IIRC) you can set up only one notification channel for all your monitored resources, whereas mon lets you create unlimited (resource, time-of-day, channel) tuples. But if your notification needs are not very sophisticated, munin may be all that you need.
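
For reference, both the ranges and the notification command live in munin.conf. Roughly like this excerpt, where the hostname, contact address, plugin/field names and thresholds are all placeholders (and the exact template variables may differ between munin versions):

# /etc/munin/munin.conf (excerpt) -- everything below is a placeholder
contact.ops.command mail -s "Munin alert on ${var:host}" ops@example.com

[backup.example.com]
    address 192.168.1.10
    # limits on fields of the hypothetical directory-size plugin above,
    # expressed in the unit the plugin reports (bytes here)
    dirsizes._var_lib_backuppc.warning  500000000000
    dirsizes._var_lib_backuppc.critical 800000000000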

Last but not least, these tools are all available on Debian systems, so just apt-get install them and you're almost all set:

$ apt-get install mon
$ apt-get install munin   # On your munin server
$ ssh monitored1.example.com apt-get install munin-node
$ ssh monitored2.example.com apt-get install munin-node
flaviovs

Nagios can do this: it will send you alerts at warning and critical thresholds that you define for disk usage on any particular disk. It is, however, a full-fledged infrastructure monitoring service (though I'm sure you'd find use for other alerts over time).
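
The check behind such an alert is typically the stock check_disk plugin; the path and thresholds here are only examples:

# warn below 20% free, go critical below 10% free, on the BackupPC filesystem
$ /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /var/lib/backuppc

You then wrap that command line in a Nagios command and service definition (or run it via NRPE on the remote host).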

balleyne
  • I'd like to know more about the service and alert notifications required in the config files to notify of a large amount of change over a set period of time. We use Nagios, but I set alert thresholds like 80% usage or whatever. I've never seen a configuration that will look for an increase in usage of 10% or more over a week or something like that. – MagnaVis Jul 14 '15 at 21:35
  • I'm not sure if Nagios does that either; I just have an alert threshold. But I guess I kind of use logwatch to do what you're describing, though it's not automated. When I scan logwatch emails for a server over several days, I'm comparing a few things from day to day, and one of those is the Disk Space usage at the bottom of a default logwatch email. Logwatch collects the percentage every day, but it's up to me to notice large jumps as I scan. So, not reporting automatically or alerting me explicitly or anything... – balleyne Jul 16 '15 at 05:31
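
Automating that day-to-day comparison doesn't take much more than recording the numbers and diffing them later. A rough sketch, with a made-up filesystem, history location and threshold (needs GNU date):

#!/bin/sh
# Record today's disk usage percentage and warn if it rose more than
# THRESHOLD points compared with a week ago. All names below are examples.

FS=/
HISTORY=/var/lib/disk-usage-history
THRESHOLD=10

mkdir -p "$HISTORY"
TODAY_PCT=$(df -P "$FS" | awk 'NR==2 { sub("%", "", $5); print $5 }')
echo "$TODAY_PCT" > "$HISTORY/$(date +%F)"

WEEK_AGO_FILE="$HISTORY/$(date -d '7 days ago' +%F)"   # GNU date
[ -f "$WEEK_AGO_FILE" ] || exit 0

WEEK_AGO_PCT=$(cat "$WEEK_AGO_FILE")
if [ $((TODAY_PCT - WEEK_AGO_PCT)) -gt "$THRESHOLD" ]; then
    echo "$FS usage went from ${WEEK_AGO_PCT}% to ${TODAY_PCT}% in a week" \
        | mail -s "Disk usage jump on $(hostname)" root
fi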