6

We have a number of filesystems for our computational cluster, with many users storing a lot of really large files. We'd like to monitor the filesystems to help optimize their usage, as well as to plan for expansion.

In order to do this, we need some way to monitor how these filesystems are used. Essentially I'd like to know all sorts of statistics about the files:

  • Age
  • Frequency of access
  • Last accessed times
  • Types
  • Sizes

Ideally this information would be available in aggregate form for any directory so that we could monitor it based on project or user.

Short of writing something up myself in Python, I haven't been able to find any tools capable of performing these duties. Any recommendations?

ewwhite
Kamil Kisiel
  • What's the server operating system? "ls --time=atime -lR" anyone? – Evan Anderson Jun 09 '09 at 17:53
  • The operating system is Linux. Sure, I can do ls, but I don't see how running that on a 12TB filesystem is going to give me a concise view of its usage characteristics without me writing a lot of code to analyze the output from multiple runs. – Kamil Kisiel Jun 09 '09 at 18:05
  • I said "ls..." as a bit of a joke. Having said that, though, I don't know of a tool that does what you want "out of the box". I'd probably write something to grovel through an arbitrary path and child directories, outputting data suitable to insert into an RDBMS. I'd report on the data in the RDBMS. I'd be interested to see others' thoughts on this one. – Evan Anderson Jun 09 '09 at 18:08
  • To add: The tool you're looking for is either going to have to integrate fairly deeply into the filesystem (in order to get "frequency of access") or it's going to have to do something like an ls -lR repeatedly to gather statistics. There's no magic way to get that info without gathering it. – Evan Anderson Jun 09 '09 at 18:10
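For what it's worth, here's a rough sketch of the "grovel the tree, load an RDBMS, report with SQL" approach Evan describes in the comments above, in Python with SQLite. It's illustrative only: the table layout, database filename, and mount point are assumptions, not anything from this thread.

    import os
    import sqlite3
    import stat
    import time

    def scan(root, db_path="filestats.db"):
        # Walk 'root' and record per-file metadata for later SQL reporting.
        conn = sqlite3.connect(db_path)
        conn.execute("""CREATE TABLE IF NOT EXISTS files (
                            path TEXT, size INTEGER, uid INTEGER,
                            atime REAL, mtime REAL, scanned REAL)""")
        now = time.time()
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    st = os.lstat(full)        # lstat: don't follow symlinks
                except OSError:
                    continue                   # file vanished mid-scan
                if stat.S_ISREG(st.st_mode):   # regular files only
                    conn.execute("INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
                                 (full, st.st_size, st.st_uid,
                                  st.st_atime, st.st_mtime, now))
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        scan("/export/projects")               # hypothetical mount point

Aggregates then fall out as plain SQL (e.g. SELECT uid, SUM(size) FROM files GROUP BY uid), and comparing atime across successive scans gives a crude access-frequency estimate, which is about the best you can do without filesystem-level instrumentation.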

3 Answers

1

You probably want something that will log file system events with inotify. Maybe something like logsend. Also see inotify-tools.
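For a concrete starting point, a minimal sketch of an inotify-based logger in Python might look like this. It assumes the third-party pyinotify bindings (an assumption; the answer itself only names inotify-tools), and note the caveat raised in the comments below: inotify holds its watches in kernel memory, so this won't scale to an entire 12TB tree.

    import pyinotify

    class StatsHandler(pyinotify.ProcessEvent):
        # Log access/modify events; a real tool would aggregate these.
        def process_IN_ACCESS(self, event):
            print("accessed: %s" % event.pathname)

        def process_IN_MODIFY(self, event):
            print("modified: %s" % event.pathname)

    wm = pyinotify.WatchManager()
    mask = pyinotify.IN_ACCESS | pyinotify.IN_MODIFY
    notifier = pyinotify.Notifier(wm, StatsHandler())
    wm.add_watch("/export/projects", mask, rec=True)  # hypothetical path; rec=True recurses
    notifier.loop()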

Kyle Brandt
  • I don't think this is going to do what the post wants. Inotify stores pathnames of watched files in kernel memory, so the number of watches is limited. I'm guessing there are more files in the poster's 12TB of data than can be handled with inotify watches. There's a nice USENIX paper about findings from filesystem monitoring at the link below. The methodology to gather statistics was to use instrumentation on NetApp filers. I think the poster is going to need to do something similar and capture data at the network protocol "export" level, rather than by groveling the filesystem repeatedly. – Evan Anderson Jun 09 '09 at 18:54
  • http://www.usenix.org/events/usenix08/tech/full_papers/leung/leung_html/index.html Also: http://stackoverflow.com/questions/535768/what-is-a-reasonable-amount-of-inotify-watches-with-linux – Evan Anderson Jun 09 '09 at 18:55
  • +1 for teaching me that inotify keeps the information in kernel memory :-) – Kyle Brandt Jun 09 '09 at 19:22
  • I think Evan pretty much hit the nail on the head with the link to that paper. That's basically the exact kind of data I'd like to be able to collect. Just trying to see if there's a way short of purchasing an expensive NetApp box to do it :) – Kamil Kisiel Jun 09 '09 at 21:56
  • How are you exporting this data to clients? NFS? CIFS/SMB? (I could imagine, with Samba for example, running in a higher-than-normal log level and writing a script to parse the log files. Might be a fun little thing to write, actually...) – Evan Anderson Jun 09 '09 at 23:26
  • I think inotify-tools is the appropriate solution. Just search for "inotifywatch example 1" on http://inotify-tools.sourceforge.net/ – Martin M. Jun 10 '09 at 02:29
1

Wow. Novell has something a lot like this for their Open Enterprise Server on NSS volumes that gives most of that. It doesn't track frequency of access (that's proxied by last-access date), but it definitely covers the rest. It's a Java process that crawls the volume tree to build what they call an Inventory. I'm pretty sure it isn't open sourced, but that report is rather nice to have.

sysadmin1138
-1

I'd go the Python way. os.walk() is easy to use, and all the information you need for each file is in the result of os.stat().
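A minimal sketch of that approach, including the per-directory aggregation the question asks for, might look like this. The root path is a placeholder, and grouping by top-level directory is just one choice; grouping by st_uid instead would roll things up per user.

    import os
    import time
    from collections import defaultdict

    root = "/export/projects"    # hypothetical path
    totals = defaultdict(lambda: {"bytes": 0, "files": 0, "newest_atime": 0.0})

    for dirpath, dirnames, filenames in os.walk(root):
        # Attribute each file to its first path component under root.
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue                       # file vanished mid-walk
            agg = totals[top]
            agg["bytes"] += st.st_size
            agg["files"] += 1
            agg["newest_atime"] = max(agg["newest_atime"], st.st_atime)

    for top, agg in sorted(totals.items()):
        idle_days = (time.time() - agg["newest_atime"]) / 86400.0
        print("%-20s %14d bytes %8d files, idle %.1f days"
              % (top, agg["bytes"], agg["files"], idle_days))

As Martin's benchmark in the comments below shows, a walk like this is I/O-bound and slow on a large tree, so you'd run it from cron during off-hours and diff successive runs rather than poll continuously.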

Javier
  • Python os.walk() will be way too expensive. Just create a directory with 15,000 subdirectories, each of those with 5,000 subdirectories, and each of those with 100 files. "time create_entries.py" == "real 2m48.919s" -- "time my_oswalk.py" == "real 1m6.225s", and that is just >>for file in os.walk("."): pass<< -- now imagine that on a really busy mailserver that doesn't have everything in the cache right now... – Martin M. Jun 10 '09 at 03:25
  • Yeah, obviously the time needed is roughly linear with the number of files plus the number of directories. But there's no alternative short of embedding some accounting code into the filesystem itself. Also, in my limited experience, most busy filesystems tend to have a good portion of the tree in cache already. – Javier Jun 10 '09 at 21:11