From time to time the hard disks on my Linux servers fill up quickly with lots of small files. When this happens I have to work out how much space is being used and where the files using it are. This can be a surprisingly frustrating task because:
- Just doing simple things like running ls in a directory with a lot of files can take a long time.
- df is fast, but it only reports usage for each filesystem as a whole, which is too coarse to show where the space is going.
- du is accurate and can tell you where all your space is going, but it takes forever to run.
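To make the cost in the last point concrete, here is a minimal Python sketch of the kind of walk a du-style tool has to do. It is not how du itself is implemented; the function names `apparent_size` and `usage_by_child` are made up for illustration, and it sums apparent file sizes (st_size) rather than allocated blocks. The point is only that every file needs its own stat call, so the work grows with the number of files and not with the amount of data.

```python
import os

def apparent_size(root):
    """Sum the byte sizes of all regular files under root (symlinks not followed)."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            # One stat per file: with millions of files,
                            # this is where the time goes.
                            total += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        # The entry vanished or is unreadable; skip it.
                        continue
        except OSError:
            # The directory vanished or is unreadable; skip it.
            continue
    return total

def usage_by_child(root):
    """Map each immediate subdirectory of root to its total apparent size."""
    with os.scandir(root) as entries:
        children = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    return {path: apparent_size(path) for path in children}
```

Calling `usage_by_child("/var")` would give a per-subdirectory breakdown, but only after visiting every file underneath, which is exactly the slow part.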
I want to know, quickly and accurately, where all my space is going on a hard disk where terabytes of space may be occupied by millions of small files.
It seems that this is impossible with conventional filesystems (if I'm wrong, I'd like to hear about it).
My question is whether any of the newer filesystems available on Linux (Btrfs, ZFS, ReiserFS, etc.) have any super-clever features that might help with this problem. For example, I can imagine some kind of log, updated every time there is a write, that records the amount of space occupied at each branch in the filesystem. Then answering my question would just be a matter of reading the log.
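To show what I mean, here is a toy in-memory sketch in Python of that bookkeeping. It is purely hypothetical: the class `UsageIndex` and its methods are invented for illustration, and I am not claiming any real filesystem works this way. Each write adds the change in file size to every ancestor directory, so reading a directory's total is a single lookup.

```python
from collections import defaultdict
from pathlib import PurePosixPath

class UsageIndex:
    """Hypothetical running total of bytes used under every directory."""

    def __init__(self):
        self.file_sizes = {}                # file path -> last known size
        self.dir_totals = defaultdict(int)  # directory path -> bytes beneath it

    def record_write(self, path, new_size):
        """Note that the file at path now occupies new_size bytes."""
        p = PurePosixPath(path)
        delta = new_size - self.file_sizes.get(p, 0)
        self.file_sizes[p] = new_size
        # Propagate the change to every ancestor, up to the root.
        for ancestor in p.parents:
            self.dir_totals[ancestor] += delta

    def record_delete(self, path):
        """Remove a file and give its space back to all ancestors."""
        self.record_write(path, 0)
        self.file_sizes.pop(PurePosixPath(path), None)

    def usage(self, directory):
        """Return the bytes recorded under directory, without walking anything."""
        return self.dir_totals.get(PurePosixPath(directory), 0)

index = UsageIndex()
index.record_write("/var/log/a.log", 100)
index.record_write("/var/log/b.log", 50)
index.record_write("/home/tom/x", 7)
print(index.usage("/var/log"))  # 150
print(index.usage("/"))         # 157
```

The trade-off in this sketch is that every write pays for one update per directory level, in exchange for queries that no longer depend on the number of files.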
That's just an example of the kind of feature that might help, but I am asking for examples of any sort of feature that might help answer the question: tell me, quickly and accurately, exactly where the space is being used on my hard disk.
Thanks, Tom