Although I've browsed some of the existing questions here, I think every situation is different and may require a totally different solution.
What I have now:
- Linux software RAID5 on 4x 4 TB enterprise HDDs
- LVM on top with a few volumes
- The most important one, the storage volume: a 10 TB XFS filesystem
- Everything set up with default parameters on Debian Wheezy
- The volume is mounted with options 'noatime,nodiratime,allocsize=2m'
- About 8 GB of RAM free (and used for caching, I assume); quad-core Intel CPU with HT, not very used
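Since stripe alignment can matter a lot for RAID5 + XFS, for reference this is how I would dump my geometry to compare the two (assuming the array is /dev/md0 and the filesystem is mounted at /volume; adjust to the actual setup):

mdadm --detail /dev/md0 | grep -i chunk    # RAID5 chunk size
xfs_info /volume                           # compare sunit/swidth against the chunk size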
This volume stores about 10 million files (at most 20 million in the future), mostly between 100 KB and 2 MB in size. Here is a more precise distribution of file size buckets (in KB) and the number of files in each:
 Size (KB)     Files
         4      6162
         8        32
        32        55
        64     11577
       128      7700
       256      7610
       512       555
      1024      5876
      2048      1841
      4096     12251
      8192      4981
     16384      8255
     32768     20068
     65536     35464
    131072    591115
    262144   3411530
    524288   4818746
   1048576    413779
   2097152     20333
   4194304        72
   8388608        43
  16777216        21
The files are mostly stored seven directory levels deep on the volume, something like:
/volume/data/customer/year/month/day/variant/file
Those folders usually contain ~1-2K files each, sometimes fewer, other times up to 5-10K (rare cases).
I/O isn't that heavy, but I experience hangs when I push it a little harder. For example:
- The application that performs most of the I/O is NGINX, for both reading and writing
- There are some random reads of 1-2 MB/s TOTAL
- There are some folders where data is continuously written at a rate of 1-2 MB/s TOTAL, and all files older than 1 hour have to be periodically removed from those folders
Running the following cron job once per hour repeatedly hangs the entire server for a few good seconds, and may even disrupt the service (the writing of new data), as I/O timeouts are generated (a throttled variant I'm considering is sketched below the commands):
find /volume/data/customer/ -type f -iname "*.ext" -mmin +60 -delete
find /volume/data/customer -type d -empty -delete
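That throttled variant (an untested sketch; it assumes GNU find/xargs, and ionice's idle class only takes effect with the CFQ scheduler) would run the cleanup at idle I/O priority and delete in small batches instead of one big sweep:

# Traverse and delete at idle I/O priority, 1000 files per batch,
# with a short pause between batches so the writers can make progress
ionice -c3 find /volume/data/customer/ -type f -iname "*.ext" -mmin +60 -print0 |
  ionice -c3 xargs -0 -n 1000 sh -c 'rm -f -- "$@"; sleep 1' --
ionice -c3 find /volume/data/customer -type d -empty -delete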
I also observe slow write speeds (a few MB/s) when writing files in the size ranges above. Writing larger files goes OK until the write cache fills up (obviously); then the speed drops and the server starts hanging in waves.
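Those waves look like writeback bursts to me, so I've been wondering whether lowering the kernel's dirty-page thresholds would smooth them out. This is what I'd try (the values are guesses on my part, not tested):

# Start background writeback earlier and cap dirty memory lower,
# so flushes come in smaller, more frequent chunks instead of big bursts
sysctl -w vm.dirty_background_bytes=67108864    # 64 MB
sysctl -w vm.dirty_bytes=268435456              # 256 MB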
Now, I am searching for a way to optimize my storage performance, as I am sure the defaults are not optimal and many things could be improved. Although LVM isn't that useful to me, I wouldn't drop it unless doing so provides a significant gain, because that would mean reinstalling the whole server.
I've read a lot about XFS vs. ReiserFS vs. ext4, but I am quite puzzled. Other servers of mine with a much smaller 2 TB RAID1 volume, the exact same setup, and a significantly heavier workload perform quite flawlessly.
Any ideas?
How should I debug/experiment?
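As a starting point, this is what I was planning to run during one of the hangs to capture some numbers (iostat needs the sysstat package; nothing here is from measurements yet):

iostat -x 1                                # per-device utilization and await times
vmstat 1                                   # blocked processes and writeback activity
cat /proc/mdstat                           # a running RAID check/resync would explain a lot
dmesg | grep -i "blocked for more than"    # kernel hung-task warnings during the stalls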
Thanks.