linux: accessing thousands of files in hash of directories

3

1

I would like to know the most efficient way of concurrently accessing thousands of similarly sized files on a modern Linux cluster of computers.

I am carrying out an indexing operation on each of these files, which produces four index files (each about 5-10x smaller than the data file) next to the file being indexed.

Right now I am using a hierarchy of directories from ./00/00/00 to ./99/99/99 and placing one file at the bottom of each directory,
like ./00/00/00/file000000.ext through ./99/99/99/file999999.ext.
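
A minimal Python sketch of the mapping I am using (the helper name path_for is just illustrative):

    import os

    def path_for(n, ext=".ext", root="."):
        """Map file number n (0-999999) to ./NN/NN/NN/fileNNNNNN.ext."""
        s = "%06d" % n
        return os.path.join(root, s[0:2], s[2:4], s[4:6], "file" + s + ext)

    # path_for(123456) -> './12/34/56/file123456.ext'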

It seems to work better than having thousands of files in the same directory, but I would like to know whether there is a better way of laying out the files to improve access.

719016

Posted 2012-07-03T08:20:49.857


Answers

1

A common performance problem with large directories on ext[34] is that it hashes the directory entries, and stores them in hash order. This allows resolving a specific name quickly, but effectively randomizes the order the names are listed in. If you are trying to operate on all of the files in the directory and just iterate over each entry in the order they are listed in, you cause a lot of random IO, which is very slow. The workaround to this is to sort the directory listing by inode number, then loop over the files in order from lowest to highest inode number. This keeps your IO mostly sequential.
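
A minimal Python sketch of that inode-ordered traversal, using os.scandir, which exposes each entry's inode number (the function name is just illustrative):

    import os

    def files_by_inode(path):
        """List regular files in path, sorted by inode number, so the subsequent
        stat()/read I/O stays mostly sequential instead of following the
        effectively random order of the hashed directory index."""
        entries = [e for e in os.scandir(path) if e.is_file(follow_symlinks=False)]
        entries.sort(key=lambda e: e.inode())
        return [e.path for e in entries]

    for name in files_by_inode("."):
        # process the file here, e.g. open it and build its index files
        pass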

psusi


Is there any reference for this kind of thing? – kawing-chiu – 2017-11-22T08:51:08.993

@kawing-chiu, if you search the e2fsprogs mailing list archive, you should be able to find a thread I was in discussing it. I even wrote a little python script to scan a directory and report the amount of correlation from directory entry -> inode number, and inode number -> starting block. I found my maildir that contained many thousands of files had very poor name -> inode number correlation. – psusi – 2017-11-22T22:07:34.207

@psusi Thanks for the information. From my test, iterating by inode number does indeed give about a 30% speed increase. – kawing-chiu – 2017-11-23T02:19:23.443

@kawing-chiu, you might also be interested in http://launchpad.net/e2defrag. It's a very old ext2 offline defragmenter that can pack all files in perfect order, or even pack all files used during boot at the start of the drive to speed up boot times, when used in conjunction with ureadahead or e4rat. – psusi – 2017-11-24T01:12:39.350

How do I sort the directory listing by inode number? – 719016 – 2012-07-04T07:49:15.213

@130490868091234, readdir() returns the name and inode number for each entry. Sort the array of entries from lowest to highest inode number. – psusi – 2012-07-05T01:25:07.047

1

A commonly used scheme is to rename each file to its hash value, keeping the extension, and to use the first few characters of the hash to spread the files across different directories.

For example:
md5(test.jpg) gives you "13edbb5ae35af8cbbe3842d6a230d279".
Your file will be named "13edbb5ae35af8cbbe3842d6a230d279.jpg" and you store it at ./13/ed/bb/5ae35af8cbbe3842d6a230d279.jpg. That way, given a large number of files, you get a good distribution of files per directory.

You end up with a tree similar to yours but lighter (metadata-wise), as you only have to store the original filename and its hash (the path is constructed from the hash).

As a side effect (which must be taken into account during development) you automatically gain file-based deduplication.
In addition, if you generate the hash before storing the file, you get free error checking too. You could imagine, for example, coding a little cronjob that checks the integrity of your backups this way.
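
A minimal Python sketch of such a hashed layout (the helper names are just illustrative; the 2+2+2 character split mirrors the example above):

    import hashlib
    import os
    import shutil

    def hashed_path(src, root="."):
        """Return the sharded destination path for src, based on its MD5."""
        h = hashlib.md5()
        with open(src, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()          # e.g. "13edbb5ae35af8cbbe3842d6a230d279"
        ext = os.path.splitext(src)[1]  # keep the original extension
        # e.g. ./13/ed/bb/5ae35af8cbbe3842d6a230d279.jpg
        return os.path.join(root, digest[0:2], digest[2:4], digest[4:6], digest[6:] + ext)

    def store(src, root="."):
        dst = hashed_path(src, root)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)           # or copy, depending on the workflow
        return dst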

Shadok


0

An accepted answer on Server Fault by Ignacio Vazquez-Abrams says

Provided you have a distro that supports the dir_index capability then you can easily have 200,000 files in a single directory. I'd keep it at about 25,000 though, just to be safe. Without dir_index, try to keep it at 5,000.

Which I'd take as suggesting a layout like this (see the sketch after the listing):

 ./000/file000000 to ./000/file000999
 ./001/file001000 to ./001/file001999
 ...
 ./999/file999000 to ./999/file999999
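
A minimal Python sketch of that numbering scheme (the helper name shard_path is just illustrative):

    import os

    def shard_path(n, root="."):
        """Map file number n (0-999999) to ./NNN/fileNNNNNN, 1000 files per directory."""
        return os.path.join(root, "%03d" % (n // 1000), "file%06d" % n)

    # shard_path(1234) -> './001/file001234'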

The on-disk size of a directory never shrinks, so if a directory has ever contained so many files that it grew to an inefficient size, deleting or moving files out of it will not improve performance for that directory. So always start with new directories (if necessary, rename the over-large directory, create a new one, move the files across, then delete the old directory).
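
A minimal Python sketch of that rebuild step (assuming everything stays on the same filesystem; the function name is just illustrative):

    import os

    def rebuild_directory(path):
        """Recreate path so its on-disk directory size shrinks again: rename it
        aside, make a fresh directory, move the entries back, then remove the
        now-empty oversized directory."""
        old = path + ".old"
        os.rename(path, old)
        os.mkdir(path)
        for name in os.listdir(old):
            os.rename(os.path.join(old, name), os.path.join(path, name))
        os.rmdir(old)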


Answers to another Stack Overflow question say

Nowadays the default is ext3 with dir_index, which makes searching large directories very fast.

A commenter says

There is a limit of around 32K subdirectories in one directory in ext3, but the OP is talking about image files. There is no (practical?) limit on files in an ext3 file system with Dir Index enabled.

I think I'd run a few tests to see whether organising files into subdirectories is worthwhile for anything other than ls performance. General rules of optimisation: 1. don't, 2. really don't, 3. measure.

RedGrittyBrick
