I have a pretty big directory with many cache files which I want to reorganise for maximum performance (faster access times).
- 2x 2TB SATA III drives, software RAID 1 (mirroring)
- OS: Ubuntu 12.04 LTS
- filesystem: ext4
- 500 GB of data
- about 16-17 million files
- average file size: 30KB
- filenames are MD5 hashes
Files are accessed (randomly) by PHP/Perl scripts. These scripts generate the absolute path and read the file. There is no directory listing: pretty much just an fopen with the absolute path to the file.
Current directory hierarchy is: cacheDir/d4/1d/d41d8cd98f00b204e9800998ecf8427e.dat
So there are 256 first-level subdirectories (d4 in the example) and 256 second-level subdirectories (1d in the example). On average, there are about 200-300 files in each second-level directory.
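The path construction described above can be sketched like this (a minimal Python sketch, since the actual PHP/Perl code isn't shown; the function name and `.dat` suffix follow the example path):

```python
import os

def current_cache_path(cache_dir, md5_hex):
    # Two 2-hex-digit levels taken from the start of the MD5 hash:
    # 256 first-level dirs x 256 second-level dirs = 65,536 leaf dirs.
    return os.path.join(cache_dir, md5_hex[0:2], md5_hex[2:4], md5_hex + ".dat")

print(current_cache_path("cacheDir", "d41d8cd98f00b204e9800998ecf8427e"))
# on a POSIX system: cacheDir/d4/1d/d41d8cd98f00b204e9800998ecf8427e.dat
```

With ~16-17 million files over 65,536 leaf directories, that works out to the 200-300 files per directory mentioned above.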
Problem: when there is a web traffic peak and a lot of fopen() calls in cacheDir, iowait grows, slowing down the system, causing very high load and noticeable delays. This high load appears only when files in cacheDir are accessed. If I access other dirs/files with the same frequency, the disk and system are doing just fine.
I was wondering if changing cache directory structure would improve performance?
Changing to (for example): cacheDir/d/4/1/d/8/d41d8cd98f00b204e9800998ecf8427e.dat (16 subdirectories at each of the five single-character levels, i.e. 16^5 = 1,048,576 leaf directories and, on average, about 15-16 files per fifth-level subdirectory).
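The proposed layout would change the path mapping roughly as follows (again a hedged sketch, not the real script; the function name and `levels` parameter are assumptions):

```python
import os

def proposed_cache_path(cache_dir, md5_hex, levels=5):
    # Five 1-hex-digit levels taken from the start of the MD5 hash:
    # 16 subdirectories per level, 16**5 = 1,048,576 leaf directories,
    # so ~16.5M files / 16**5 is roughly 15-16 files per leaf directory.
    parts = list(md5_hex[:levels])
    return os.path.join(cache_dir, *parts, md5_hex + ".dat")

print(proposed_cache_path("cacheDir", "d41d8cd98f00b204e9800998ecf8427e"))
# on a POSIX system: cacheDir/d/4/1/d/8/d41d8cd98f00b204e9800998ecf8427e.dat
```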
I know that software RAID 1 on plain desktop SATA III drives is not a speed monster, but maybe there are some good methods for optimising the filesystem?
Please note:
- the filesystem has dir_index enabled
- the filesystem is mounted with noatime
- the filesystem was optimised with e2fsck -Df