Fragmented directory on ext4?

Question

My ext4 filesystem loses performance as it grows.

I have a system storing a lot of image files. This Debian-based image server stores the image files divided into year folders on 1-2 TB disk sets with hardware RAID-1. The files are stored in a structure of year folders with two levels of 256 folders below each.

For example:

images/2021/2b/0f/193528211006081503835.tif

The files are written continuously during the year and are distributed evenly with the help of a hash, so each leaf (image) folder contains around 400 files by the end of the year.

This gives a total of around 256 x 256 x 400 = 26 214 400 files per year folder.
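The layout and the arithmetic above can be sketched in a few lines. The question does not say which hash the server uses, so `image_path` below is a hypothetical helper that assumes the first two bytes of the MD5 of the file name pick the two folder levels. The same arithmetic also shows that the roughly 20 million files at which the slowdown starts correspond to about 305 files per leaf folder.

```shell
# Hypothetical: the real hash is not stated in the question.
# This sketch uses the first two bytes of the MD5 of the file name,
# giving two levels of 256 folders (00-ff) each.
image_path() {
    local year=$1 name=$2 digest
    digest=$(printf '%s' "$name" | md5sum | cut -c1-32)
    printf 'images/%s/%s/%s/%s\n' "$year" \
        "$(printf '%s' "$digest" | cut -c1-2)" \
        "$(printf '%s' "$digest" | cut -c3-4)" "$name"
}

# Files per year folder = 256 x 256 leaf folders x files per leaf.
files_per_year() { echo $(( 256 * 256 * $1 )); }

# Average files per leaf folder for a given total in one year folder.
files_per_leaf() { echo $(( $1 / (256 * 256) )); }

image_path 2021 193528211006081503835.tif
files_per_year 400        # 26214400
files_per_leaf 20000000   # 305
```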

Iterating over this folder structure works well up to approximately 20 million files; a full pass takes maybe a few hours. Beyond that, even listing a leaf folder with 300-400 files can take 1-4 seconds when it is not in cache. I suspect it has something to do with fragmentation of the directory entries.
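One way to locate the slow leaf folders is to time an unsorted listing of each one. This is a sketch assuming GNU `date` (for nanoseconds); `list_ms` and `slow_leaves` are hypothetical names. Plain `ls -f` reads only the directory, whereas `ls -l` additionally reads every file's inode, so comparing the two can help separate directory reads from inode reads.

```shell
# Milliseconds needed to list one directory (GNU date, for %N).
# For the uncached case, drop the page, dentry and inode caches first, as root:
#   sync; echo 3 > /proc/sys/vm/drop_caches
list_ms() {
    local start end
    start=$(date +%s%N)
    ls -f "$1" > /dev/null   # -f: directory order, no sorting
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Print each leaf folder (<year>/<xx>/<yy>/) whose listing takes at
# least threshold_ms milliseconds.
slow_leaves() {
    local year_dir=$1 threshold_ms=$2 leaf ms
    for leaf in "$year_dir"/*/*/; do
        [ -d "$leaf" ] || continue
        ms=$(list_ms "$leaf")
        if [ "$ms" -ge "$threshold_ms" ]; then
            echo "$ms ms  $leaf"
        fi
    done
}

# Example: slow_leaves images/2021 1000 | sort -rn | head
```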

Accessing an individual file is always fast when you know its path. And it is not a hardware/disk issue; the raw I/O performance is good. By the way, files are never deleted from this structure.

Defragmenting with e4defrag makes no difference; I suppose it only defragments files, not directories. fsck.ext4 -D might be a solution, but as this is a production system, I'm not keen on unmounting the filesystem to try it.
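The directory-fragmentation theory can be checked without unmounting: `filefrag` (from e2fsprogs) can be pointed at a leaf folder itself, and on ext4 it should then report the extents of the directory's own blocks. e4defrag, given a directory, only processes the regular files inside it, which is consistent with it making no difference here. A sketch only; `extent_count` and `fragmented_leaves` are hypothetical helper names, and `filefrag` may live in /usr/sbin.

```shell
# filefrag prints "<path>: N extents found" or "<path>: 1 extent found";
# extract N from that line on stdin.
extent_count() {
    sed -n 's/.*: \([0-9][0-9]*\) extents* found.*/\1/p'
}

# Print leaf folders whose own directory data has more than max extents.
fragmented_leaves() {
    local year_dir=$1 max=$2 leaf n
    for leaf in "$year_dir"/*/*/; do
        [ -d "$leaf" ] || continue
        n=$(filefrag "$leaf" 2>/dev/null | extent_count)
        if [ -n "$n" ] && [ "$n" -gt "$max" ]; then
            echo "$n extents  $leaf"
        fi
    done
}

# Example: fragmented_leaves images/2021 1
```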

What does help is copying the files to a temporary folder and then moving them back, overwriting the originals. For example:

mkdir -p images/2021/2b/tmp
cp -a images/2021/2b/0f/* images/2021/2b/tmp
mv -f images/2021/2b/tmp/* images/2021/2b/0f

After this operation, performance is restored (even when the folder is not in cache). If the files themselves were fragmented I would understand why this helps, but according to e4defrag they aren't. Moving the files to the temporary folder and back (mv only, without the cp step) does not help.
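One concrete difference between the two procedures is visible in inode numbers: `cp -a` writes new files and therefore allocates new inodes, while `mv` within one filesystem is a rename that keeps the existing inode. The sketch below wraps both procedures so the inode numbers can be compared; `copy_back`, `move_back` and `inode_of` are hypothetical names, and GNU `stat` is assumed. Whether the freshly allocated inodes are what restores performance is a guess, not something the thread establishes.

```shell
# Inode number of a file (GNU stat).
inode_of() { stat -c %i "$1"; }

# The workaround from the question for one leaf folder: copy out, move back.
copy_back() {
    local leaf=$1 tmp=$2
    mkdir -p "$tmp"
    cp -a "$leaf"/* "$tmp"
    mv -f "$tmp"/* "$leaf"
    rmdir "$tmp"
}

# The variant that did not help: move out, move back (inodes unchanged).
move_back() {
    local leaf=$1 tmp=$2
    mkdir -p "$tmp"
    mv -f "$leaf"/* "$tmp"
    mv -f "$tmp"/* "$leaf"
    rmdir "$tmp"
}
```

For example, compare `inode_of` on one file before and after `copy_back images/2021/2b/0f images/2021/2b/tmp`; the number changes, whereas after `move_back` it stays the same.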

Can someone help me understand what is happening here?

I think you need to add more details about the server; moreover, I think your journal will explode in size ;) Also take a look at https://serverfault.com/questions/796665/what-are-the-performance-implications-for-millions-of-files-in-a-modern-file-sys, which might be a duplicate of your question, and also https://serverfault.com/questions/506465/is-there-a-hard-limit-to-the-number-of-files-a-directory-can-have – djdomi, Oct 19 '21 at 17:24
@djdomi I've described the system in more detail, thanks. Could you explain what you mean by an exploding journal? Regarding the linked questions, no folder will ever contain more than around 500 files, so that should not be an issue. I have thought about XFS (I've never used it) and have read both pros and cons; it might be worth considering. – Stenborg, Oct 20 '21 at 07:04
What I mean is that, additionally, you will run into ext4's maximum file count, if I count correctly; see https://serverfault.com/questions/104986/what-is-the-maximum-number-of-files-a-file-system-can-contain – djdomi, Oct 20 '21 at 07:17
@djdomi, thanks for the clarification. Each disk set is its own filesystem, so a disk set fills up before the maximum file count or inode shortage becomes an issue. As long as one year of files fits on one disk set, it's fine. – Stenborg, Oct 20 '21 at 07:30
I'm not a filesystem expert, but I believe it is an issue when using ext4 for such large numbers of files. – djdomi, Oct 20 '21 at 10:23
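Whether a disk set runs out of space or inodes first can be checked directly: ext4 fixes the number of inodes when the filesystem is created, and `df -i` reports how many are in use. A minimal sketch; `inode_usage` is a hypothetical helper that only reformats the POSIX output of `df -iP`.

```shell
# Summarise inode usage from `df -iP` output read on stdin. POSIX
# format columns: Filesystem Inodes IUsed IFree IUse% Mounted-on.
inode_usage() {
    awk 'NR == 2 { printf "%s inodes used of %s (%s)\n", $3, $2, $5 }'
}

# Example: df -iP images/2021 | inode_usage
```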

A. Genchev · Answer 1 · 2022-01-20T23:33:43.073

0

I'll try to answer for ext4. How did you create the filesystem? What is the output of, e.g.:

sudo tune2fs -l /dev/sda1

Here I assume your ext4 volume is sda1. You should have "dir_index" and "filetype" among the filesystem features. If not, you must format with these enabled. Once these are OK, you probably want to trade file cache for metadata cache. If the output of:

cat /proc/sys/vm/vfs_cache_pressure

shows 100, try lowering it to 50. This can be made persistent in the file /etc/sysctl.conf, where you can write:

vm.vfs_cache_pressure=50

and apply it with sudo sysctl -p. This increases the likelihood that metadata stays cached. Ext4 directory indices can become fragmented, as you suspect. There is no direct cure, but you can take a look at this Stack Exchange post: how-to-atomically-defragment-ext4-directories
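Both checks in this answer can be scripted roughly as follows. This is a sketch: `has_features` and `write_cache_pressure` are hypothetical helper names, and writing a drop-in file under /etc/sysctl.d instead of editing /etc/sysctl.conf is a choice made here, not the answer's; on a systemd-based Debian both locations are read at boot.

```shell
# Reads `tune2fs -l` output on stdin and reports whether each named
# feature appears on its "Filesystem features:" line.
has_features() {
    local features f missing=0
    features=$(sed -n 's/^Filesystem features:[[:space:]]*//p')
    for f in "$@"; do
        case " $features " in
            *" $f "*) echo "present: $f" ;;
            *)        echo "missing: $f"; missing=1 ;;
        esac
    done
    return "$missing"
}

# Writes the cache pressure setting to a sysctl drop-in file. The file
# name is only an example; the directory defaults to /etc/sysctl.d.
write_cache_pressure() {
    local value=$1 dir=${2:-/etc/sysctl.d}
    printf 'vm.vfs_cache_pressure=%s\n' "$value" > "$dir/90-vfs-cache-pressure.conf"
}

# Usage, as root:
#   tune2fs -l /dev/sda1 | has_features dir_index filetype
#   cat /proc/sys/vm/vfs_cache_pressure      # kernel default is 100
#   write_cache_pressure 50
#   sysctl -p /etc/sysctl.d/90-vfs-cache-pressure.conf
```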

edited Jan 20 '22 at 23:33

answered Jan 20 '22 at 22:56


Thanks for the suggestion. The cache pressure parameter looks interesting for keeping more metadata in cache. But where I have the most problems is when scanning the whole tree, and I doubt it gives a performance boost there. I will keep it in mind when the new disk set becomes more populated at the end of the year. I am also evaluating XFS instead of ext4 to see how it performs. – Stenborg Jan 24 '22 at 09:13
Usually an ext filesystem can hold 64k entries in one folder without performance problems. I also wonder why you need to re-scan the whole tree when you're sure no files are deleted. Have you considered changing your L2/L3 structure to Month/Day instead of a hash? The rationale is that old folders would then get neither deleted nor newly added files, so the old scan results stay valid and you only need to scan the current month, since time only moves in one direction. – A. Genchev Feb 01 '22 at 13:58
The reason for using a hash is that the only things known when retrieving an image file are the year and the filename. Even if that could be solved in the future, it would result in storing more files per leaf folder, around 100,000 per day. – Stenborg Feb 02 '22 at 14:51
So you're adding not only images from the current day/month but also older files from the beginning of the year? Then you don't receive these files day by day or month by month; otherwise you'd know which file appeared at which time (from your viewpoint). – A. Genchev Feb 07 '22 at 16:04
Yes, you are right: we also write older images, even images belonging to previous years. And we randomly fetch images up to 5-10 years old. – Stenborg Feb 09 '22 at 10:25
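For reference, the Month/Day layout proposed in these comments would derive the path from the storage date rather than from a hash, so only the current month's folder would need rescanning. A minimal sketch assuming GNU `date`; `dated_path` is a hypothetical name. As Stenborg's replies note, it does not fit a workload that also writes older images and looks files up by year and name only.

```shell
# Hypothetical Month/Day layout: images/<YYYY>/<MM>/<DD>/<name>,
# using the date the file is stored (GNU date).
dated_path() {
    local name=$1 when=${2:-now}
    printf 'images/%s/%s\n' "$(date -d "$when" +%Y/%m/%d)" "$name"
}

# Only the current month's folder would need rescanning, e.g.:
#   find "images/$(date +%Y/%m)" -type f | wc -l
```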
