I have a pretty big directory with many cache files which I want to reorganise for max performance (access times).
- 2x 2TB SATA III drives, software RAID 1 (mirroring)
- OS: Ubuntu 12.04 LTS
- filesystem: ext4
- 500 GB of data
- about 16-17 million files
- average file size: 30KB
- filenames are MD5 hashes
Files are accessed (randomly) by PHP/Perl scripts. These scripts generate an absolute path and read the file. There is no directory listing: pretty much just fopen with an absolute path to the file.
Current directory hierarchy is: cacheDir/d4/1d/d41d8cd98f00b204e9800998ecf8427e.dat
So there are 256 first-level subdirectories (d4 in the example) and, under each, 256 second-level subdirectories (1d in the example). On average there are about 200-300 files in each second-level directory.
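For clarity, the current path layout can be sketched like this (a minimal bash sketch, assuming the path is built from the first characters of the MD5 hash, as in the example above):

```shell
#!/usr/bin/env bash
# Build the current 2-level cache path from an MD5 hash:
# first two hex chars -> level 1, next two -> level 2.
hash="d41d8cd98f00b204e9800998ecf8427e"
path="cacheDir/${hash:0:2}/${hash:2:2}/${hash}.dat"
echo "$path"
# cacheDir/d4/1d/d41d8cd98f00b204e9800998ecf8427e.dat
```
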
Problem: when there is a web traffic peak and a lot of fopen calls in cacheDir, iowait grows, slowing down the system, causing very high load and noticeable delays. This high load appears only when files in cacheDir are accessed. If I access other directories/files with the same frequency, the disk and system do just fine.
I was wondering if changing the cache directory structure would improve performance.
Changing to (for example): cacheDir/d/4/1/d/8/d41d8cd98f00b204e9800998ecf8427e.dat
(16 subdirectories at each of the five levels, i.e. 16^5 ≈ 1M leaf directories, and on average about 15-16 files per 5th-level subdir).
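The proposed layout and its fanout can be sketched the same way (a bash sketch; the 17M file count is the rough figure from above):

```shell
#!/usr/bin/env bash
# Build the proposed 5-level cache path: one hex char per level.
hash="d41d8cd98f00b204e9800998ecf8427e"
path="cacheDir/${hash:0:1}/${hash:1:1}/${hash:2:1}/${hash:3:1}/${hash:4:1}/${hash}.dat"
echo "$path"
# cacheDir/d/4/1/d/8/d41d8cd98f00b204e9800998ecf8427e.dat

# Fanout: 16^5 = 1,048,576 leaf directories for ~17M files.
leaves=$(( 16 ** 5 ))
echo $(( 17000000 / leaves ))   # ~16 files per leaf directory
```
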
I know that software RAID 1 on simple desktop SATA III drives is not a speed monster, but maybe there are some good methods for optimising the filesystem?
Please note:
- filesystem has enabled
dir-index
- filesystem is mounted with
noatime
- filesystem was optimised with
e2fsck -Df
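For reference, the first two settings can be verified like this (an administrative sketch; /dev/md0 is an assumed device name for the software RAID 1 array, and the mount point is assumed to contain cacheDir):

```shell
# Confirm dir_index (hashed b-tree directories) is among the filesystem features
sudo tune2fs -l /dev/md0 | grep 'Filesystem features'

# Confirm the mount options include noatime
grep /dev/md0 /proc/mounts
```
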