6

On Ubuntu 10.04.3 LTS x86_64, I am seeing the following in /var/log/messages:

EXT4-fs warning (device sda3): ext4_dx_add_entry: Directory index full!

Relevant info from dumpe2fs:

Filesystem features:      has_journal ext_attr resize_inode dir_index filetype
  needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg
  dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Free blocks:              165479247
Free inodes:              454382328
Block size:               2048
Inode size:               256

I've already read some other questions, such as ext3_dx_add_entry: Directory index full and rm on a directory with millions of files; those made me think that there must be a directory somewhere with a very large number of entries in it.

Since the directory layout is rather complex, I have a basic problem: how can I find the directory which is generating those messages?

S19N
  • 1,693
  • 1
  • 17
  • 28
  • Would something like this help? `find /path -maxdepth 1 -type d -printf '%p - %k\n'` That will show the disk space used up by the directory file itself, which should be proportional to the number of entries in that directory. I think anything else will require more involved scripting, e.g., perl with a find subroutine that does actual counts. – cjc Aug 07 '12 at 16:26
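Building on that comment's idea, here is a minimal sketch (assuming GNU find; /path and the -maxdepth value are placeholders to adapt): it ranks directories by the on-disk size of the directory file itself, since an unusually large directory file usually means an unusually large number of entries.

find /path -maxdepth 3 -type d -printf '%k\t%p\n' | sort -nr | head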

2 Answers

4

The following one-liner will give you a listing of how many files are in each directory, sorted to show the top ten. It runs recursively from your current working directory, so I don't suggest running it from / unless you have absolutely no clue where the large directories may be.

find . -type f | awk '{dir=gensub(/(.+\/).+/,"\\1","g"); dir_list[dir]++} END {for (d in dir_list) printf "%s %s\n",dir_list[d],d}' | sort -nr | head

Output will be similar to the following:

[user@localhost ~]# find . -type f | awk '{dir=gensub(/(.+\/).+/,"\\1","g"); dir_list[dir]++} END {for (d in dir_list) printf "%s %s\n",dir_list[d],d}' | sort -nr | head
2048 ./test19/
2048 ./test18/
2048 ./test17/
2048 ./test16/
2048 ./test15/
2048 ./test14/
2048 ./test13/
2048 ./test12/
2048 ./test11/
2048 ./test10/
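Note that gensub() is a GNU awk extension; if gawk isn't available, a roughly equivalent count (assuming GNU find for -printf) is to have find print each file's parent directory and then count duplicates:

find . -type f -printf '%h\n' | sort | uniq -c | sort -nr | head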

If you're a bit wary about running such a one-liner, just search for all directories which themselves have a size of over 50k or so. Again, find will be your friend here:

find ./ -type d -size +50k
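If you also want to see how large each oversized directory file is, a variation on the same idea (again assuming GNU find; -xdev keeps the search on a single filesystem) sorts them biggest first:

find / -xdev -type d -size +50k -printf '%k\t%p\n' | sort -nr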

If you have multiple mount points, a df -i will help you narrow down which mount is running out of, or has run out of, inodes.
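For example, a quick sketch along these lines lists mounts by inode usage, highest first (column 5 of df -i is IUse%; -P keeps each filesystem on one line):

df -iP | sort -rn -k5 | head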

Zompire
  • 41
  • 3
0

Using sh in the -exec portion of the command, you can start another shell and run your commands in there quite nicely.

find . -name "*.dat" -exec csh -c 'echo -n $1; grep ID $1 | wc -l' {} {} \;

Or, in my case, when counting files in directories, I use "ls -f" as it produces the ls output unsorted, which is significantly faster than sorting the output before counting.

With a newline between directory name and count:

find /somedir/some/dir -type d -print -exec sh -c 'ls -f "$1"/* | wc -l' {} {} \;

With a tab between directory name and count:

find /somedir/some/dir -type d -exec bash -c 'echo -en "$1\t"; ls -f "$1"/* | wc -l' {} {} \;
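A slightly more robust sketch of the same approach (assuming a POSIX shell): passing each directory as a positional parameter keeps paths with spaces intact, and -exec ... + batches directories so only a few shells are spawned. The "_" simply fills $0. Note that running ls -f on the directory itself also counts ".", ".." and dotfiles, unlike the glob form above.

find /somedir/some/dir -type d -exec sh -c 'for d in "$@"; do printf "%s\t%s\n" "$d" "$(ls -f "$d" | wc -l)"; done' _ {} +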

http://www.compuspec.net/reference/os/solaris/find/find_and_execute_with_pipe.shtml

nelaaro
  • 584
  • 4
  • 9
  • 25