
From time to time I run into problems when server hard disks (Linux) fill up quickly with lots of small files. When this happens I have to figure out how much space is being taken up and where the files taking up that space are. This can be a surprisingly frustrating task because:

  1. Even simple operations, such as running ls in a directory with a lot of files, can take a long time.
  2. df is fast, but coarse: it reports totals for a whole filesystem, not for individual directories.
  3. du is accurate and can tell you where all your space is going, but it takes forever to run.
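
To make the cost in point 3 concrete, here is a minimal Python sketch of the kind of walk a du-style tool has to do. It is not how du itself is implemented, and the function names usage_by_child and _tree_bytes are hypothetical. The point is that every file has to be stat'ed once, so the run time grows with the number of files rather than with the amount of data.

```python
import os


def _tree_bytes(entry):
    """Allocated bytes for one directory entry, including everything below it."""
    try:
        st = entry.stat(follow_symlinks=False)  # do not follow symlinks
    except OSError:
        return 0  # entry vanished or is unreadable; skip it
    total = st.st_blocks * 512  # st_blocks is counted in 512-byte units on Linux
    if entry.is_dir(follow_symlinks=False):
        try:
            with os.scandir(entry.path) as children:
                for child in children:
                    total += _tree_bytes(child)
        except OSError:
            pass  # unreadable directory: count only the directory itself
    return total


def usage_by_child(root):
    """Return {name: allocated_bytes} for each entry directly under root.

    Unlike du, this simple version counts hard-linked files once per link.
    """
    totals = {}
    with os.scandir(root) as entries:
        for entry in entries:
            totals[entry.name] = _tree_bytes(entry)
    return totals
```

Calling usage_by_child on a directory and sorting the result by value gives a rough "biggest subdirectory first" listing, but only after the whole tree has been visited.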

I want to know, quickly and accurately, where all my space is going on a hard disk where terabytes of space may be occupied by millions of small files.

It seems that this is impossible with conventional filesystems (if not, I'd like to hear about it).

My question is whether any of the newer filesystems available on Linux (btrfs, zfs, reiserfs, etc.) have any super-clever features that might help with this problem. For example, I can imagine some kind of log, constantly updated every time there is a write, that contains a record of the amount of space occupied at each branch in the filesystem. Then asking my question would just be a matter of reading the log.
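
As a rough illustration of the log idea, here is a minimal in-memory sketch in Python. It does not describe a feature of any real filesystem; UsageIndex, record and usage are hypothetical names, and the caller is assumed to report every size change.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class UsageIndex:
    """Running per-directory byte totals, updated on every reported write."""

    def __init__(self):
        self._bytes = defaultdict(int)

    def record(self, path, delta):
        """Apply a size change of `delta` bytes for the file at `path`."""
        # Every ancestor directory, up to the root, gets the same adjustment.
        for ancestor in PurePosixPath(path).parents:
            self._bytes[str(ancestor)] += delta

    def usage(self, directory):
        """Total bytes recorded under `directory`; a single dictionary lookup."""
        return self._bytes.get(str(PurePosixPath(directory)), 0)
```

For example, after record("/srv/a/x.log", 4096), usage("/srv") is answered from one lookup without walking the tree. The cost moves to the write path instead: in this sketch each write updates one counter per ancestor directory.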

That's just an example of the kind of feature that might help, but I am asking for any examples of any sort of feature that might help with answering the question: tell me, quickly and accurately, exactly where the space is being used on my hard disk.

Thanks, Tom

ewwhite
Tom Scrace

2 Answers


Of the filesystems you mentioned, I only have experience with ZFS. With ZFS you can create hierarchical datasets, so for example you could create:

  • tank/category
  • tank/category/product
  • tank/category/product/a
  • tank/category/product/b

etc

With the command "zfs list" you can then get the used, available and referenced space for each one within seconds. Of course, this only works if you are able to make your application split its data up the right way.
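
If you want those numbers in a script, something like the following Python sketch could work. It assumes the zfs command is on the PATH and relies on the -H (no header, tab-separated), -p (exact numbers) and -r (recursive) options of "zfs list"; parse_zfs_list and dataset_usage are hypothetical helper names, and the pool name "tank" is just the example from above.

```python
import subprocess

# name, used, avail and refer are the property columns requested from zfs list.
ZFS_LIST = ["zfs", "list", "-H", "-p", "-r", "-o", "name,used,avail,refer"]


def parse_zfs_list(text):
    """Turn tab-separated `zfs list -H -p` output into a list of dicts (bytes)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, used, avail, refer = line.split("\t")
        rows.append(
            {"name": name, "used": int(used), "avail": int(avail), "refer": int(refer)}
        )
    return rows


def dataset_usage(pool):
    """Run zfs list for `pool` and everything below it, then parse the output."""
    result = subprocess.run(
        ZFS_LIST + [pool], check=True, capture_output=True, text=True
    )
    return parse_zfs_list(result.stdout)
```

Sorting the result of dataset_usage("tank") by the "used" key would then show which dataset is consuming the most space, without any directory walk.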

Jeroen
  • Interesting. Thank you. This *would* make life somewhat easier and is a partial solution. I still wonder whether a full general solution exists and, if it does not, what the reasons are. This seems like an obvious feature, and if it doesn't exist I am sure there must be a good reason why it is technically either impossible or really hard or involves unacceptable trade-offs. – Tom Scrace Sep 02 '13 at 15:37
  • I suppose that in theory it would be possible to write an application such that whenever it would have created a new directory it instead created a new filesystem in the same place in the hierarchy. I don't know whether such an approach would be practical or sane though. – Tom Scrace Sep 02 '13 at 16:08

I still use ncdu with my ZFS filesystems. It's even more important now, as it is sparse-file aware and helps make sense of compressed ZFS filesystems.

See: How can I determine what is taking up so much space?

ewwhite