
We have a Linux server process that writes a few thousand files to a directory, deletes the files, and then writes a few thousand more files to the same directory without deleting the directory. What I'm starting to see is that the process doing the writing is getting slower and slower.

My question is this: the size of the directory has grown from 4096 bytes to over 200,000 bytes, as seen in this `ls -l` output:

root@ad57rs0b# ls -l 15000PN5AIA3I6_B
total 232
drwxr-xr-x 2 chef chef 233472 May 30 21:35 barcodes

On ext3, can these large directory sizes slow down performance?

Thanks.

Aaron

1 Answer

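Yes, large directories can be a problem. Note that ext3 never shrinks a directory when its entries are deleted, so a directory that once held thousands of files keeps all of its blocks, and without an index, lookups may have to scan through all of them. It's generally best to avoid very large directories by spreading files across subdirectories, for example by hashing the file names. If that's not an option, there is an ext3 feature that can dramatically improve lookup performance in large directories: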

tune2fs -O dir_index /dev/$DEVICE
e2fsck -D /dev/$DEVICE

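This enables hashed b-tree (HTree) indexes for directories, which dramatically improves lookup time in large directories. The `e2fsck -D` step rebuilds the indexes for existing directories and should only be run on an unmounted filesystem. Of course, your install may already have this enabled. You can check by running this command and looking for dir_index in the output: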

tune2fs -l /dev/$DEVICE | grep 'features:'
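If you want to script that check, for example across several servers, a minimal sketch is a shell function that reads `tune2fs -l` output on standard input and succeeds only if `dir_index` appears on the "Filesystem features:" line. The function name `has_dir_index` and the device path in the usage comment are hypothetical.

```shell
#!/bin/sh
# has_dir_index (hypothetical helper): reads `tune2fs -l` output on stdin
# and returns 0 if the "Filesystem features:" line lists dir_index.
has_dir_index() {
    # Keep only the features line, then match dir_index as a whole word.
    grep '^Filesystem features:' | grep -qw 'dir_index'
}

# Example usage (as root; /dev/sdb1 is a placeholder device):
#   tune2fs -l /dev/sdb1 | has_dir_index && echo "dir_index enabled"
```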

EDIT: Also, you may want to consider setting noatime as a mount option. It's not a tweak specific to large directories, but because it stops the filesystem from writing an access-time update every time a file is read, it can noticeably improve performance under heavy filesystem activity.
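As a rough illustration of making noatime persistent, here is a sketch of a helper that appends `noatime` to the options field of one `/etc/fstab` entry, plus the remount command that applies it without a reboot. The helper name `add_noatime`, the mount point `/data`, and the temporary file path are placeholders, not values from the question.

```shell
#!/bin/sh
# add_noatime MOUNTPOINT (hypothetical helper): read fstab-format text on
# stdin and print it back with ",noatime" appended to the options field
# (4th field) of the entry mounted at MOUNTPOINT, unless noatime is
# already there. Comment lines and other entries pass through unchanged.
# The edited line is re-joined with single spaces, which fstab accepts.
add_noatime() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp && ("," $4 ",") !~ /,noatime,/ {
            $4 = $4 ",noatime"
        }
        { print }
    '
}

# Example (as root; /data is a placeholder mount point):
#   add_noatime /data < /etc/fstab > /tmp/fstab.new   # review, then install
#   mount -o remount,noatime /data                     # apply without reboot
```

Writing to a separate file first leaves room to review the change before replacing the real `/etc/fstab`.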

Insyte
  • Thanks for the great answer. It seems dir_index is turned on by default in CentOS 5.5, so it's back to the drawing board. Is there a way to shrink the index other than recreating the directory and copying the files into it? – Aaron Jun 03 '10 at 18:17
  • Running the `e2fsck -D` command I mentioned above (on the unmounted filesystem) should rebuild the directory indexes. – Insyte Jun 03 '10 at 19:19
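Since dir_index was already enabled in this case, the answer's first suggestion, spreading files across subdirectories, is the main structural fix left. A minimal sketch, assuming GNU coreutils `md5sum` is available, that file names contain no `/`, and that two levels of 256 buckets each (taken from the first four hex digits of the MD5 of the name) are enough. The helper name `hashed_path` and the file name in the usage comment are illustrative only.

```shell
#!/bin/sh
# hashed_path BASE NAME (hypothetical helper): print BASE/xx/yy/NAME, where
# xx and yy are the first four hex digits of the MD5 of NAME. This spreads
# files over up to 65,536 subdirectories so no single directory grows huge.
hashed_path() {
    base=$1
    name=$2
    # md5sum prints "<32 hex digits>  -"; keep only the hash.
    h=$(printf '%s' "$name" | md5sum | cut -c1-32)
    d1=$(printf '%s' "$h" | cut -c1-2)
    d2=$(printf '%s' "$h" | cut -c3-4)
    printf '%s/%s/%s/%s\n' "$base" "$d1" "$d2" "$name"
}

# Example: create the bucket directory on demand, then write the file.
#   p=$(hashed_path barcodes item123.png)
#   mkdir -p "$(dirname "$p")" && printf 'data' > "$p"
```

Because the path is derived only from the name, a reader can recompute where any file lives without keeping a separate lookup table.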