This is in relation to a question I posted on StackOverflow:


If you read the comments from Paul Alan Taylor's answer you will see what I'm talking about.

Here is my example folder structure:

The main folder contains 100,000 subfolders, each of which contains about 20 files. My question is: will I have performance issues when a file in one of those subdirectories is requested (through the browser) from my web server?


7 Answers


You're running into a well-known issue. While there are filesystems that will accommodate millions of files (XFS and ReiserFS on Linux, and NTFS on Windows), they still have to sift through the stack of filenames searching for that one file. Just because a filesystem accommodates that many files doesn't mean that it will be quick. I have requested file properties on a Windows server with just tens of thousands of files on it, and that was pretty much a "go to lunch and come back" deal. I've also tried to list a directory via ls and found that the 20,000-odd files in it required about 2 minutes of processing on a busy server (the filesystem is ext3).

Fortunately, there is a solution, although it might be a bit different from what you're expecting.

Use additional subdirectories.

This is a well-known strategy and has been successfully used in a variety of programs. For instance, Squid uses layers of subdirectories to deal with the exact same issue for the same reason - hundreds of thousands of files that need to be quickly accessed. By using just one additional layer of directories, they can manage millions easily.

It's also a lot more common in webpages than you would expect. Every time you see a URL similar to this (bold added for emphasis):


...it's accomplishing the same effect. It's not about tracking articles by year and month, it's about improving the page load performance on the client by reducing the time the webserver spends looking for the page.

If at all possible, avoid 100,000 files per directory. Try to aim for 1,000 - 10,000 instead. If you are unsure how you'll accomplish this, just take the first letter of the file and make that an additional directory, i.e.




If that doesn't reduce your file count, you can take the 2nd letter, or 3rd, etc. until you have file counts down to a manageable size.


This process requires minimal coding on your part, is easily accommodated by the filenames alone, and will improve your access times regardless of the filesystem you use.
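A minimal sketch of the prefix scheme described above, in Python. The function name `sharded_path`, the `depth` parameter, and the `/var/cache/site` root are illustrative assumptions and not part of the original answer; the lower-casing step is also an assumption.

```python
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, depth: int = 1) -> PurePosixPath:
    """Place `filename` under one directory per leading character.

    depth=1 gives root/e/example.htm, depth=2 gives root/e/x/example.htm.
    Names shorter than `depth` simply get fewer directory levels.
    """
    if not filename:
        raise ValueError("filename must not be empty")
    # Lower-case the prefix so "Example" and "example" share a bucket
    # (an assumption; drop this if names must stay case-sensitive).
    prefix = filename.lower()[:depth]
    # Unpacking the prefix string yields one path component per character.
    return PurePosixPath(root, *prefix, filename)


print(sharded_path("/var/cache/site", "example.htm"))
print(sharded_path("/var/cache/site", "example.htm", depth=2))
```

Raising `depth` corresponds to "taking the 2nd letter, or 3rd" in the answer. The sketch assumes the leading characters are safe to use as directory names; names beginning with `.` or containing `/` would need extra handling.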

Avery Payne
  • Very good answer. I'll expand on my question if you don't mind helping me. Basically I'm going to be hosting a lot of subdomains on a lot of different domain names on this server, and I'd rather not have to use the database at all for these. I originally wanted it so that example.domain.com/lol.htm requested root/cache/example.domain.com/lol.htm in my server until I discovered this problem. – zuk1 Aug 04 '09 at 12:55
  • I'm pretty sure I could implement it so that example.domain.com/lol.htm requests /root/cache/domain.com/e/x/example/lol.htm, but I'm not certain that would be a good solution. Any comments in regards to that specific situation? – zuk1 Aug 04 '09 at 12:55
  • Are the pages you're serving static? If they are, it could be as simple as example.domain.com/q.php?lol.html, where q.php simply rejiggers a directory name out of the URL (by processing it as a string), then finds the file using the pathname in the resulting string and pipes it directly out. I believe there is also an address-rewrite feature in Apache, but I'm not a webmaster, so I can't elaborate. – Avery Payne Aug 04 '09 at 13:11
  • Ok thanks for your help. I should be able to figure out a good solution using the information in this thread! :) – zuk1 Aug 04 '09 at 13:13
  • The reason ls takes a long time is that it gets the full list of the files and sorts it. An open() would not take as long (on good filesystems). Also, if you *do* want to list them all, use "find . -maxdepth 1". It's much faster since it won't sort the files by name. – Thomas Aug 04 '09 at 14:22
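The point in Thomas's comment about unsorted listing can be sketched in Python. `os.scandir` yields directory entries lazily in whatever order the filesystem returns them, so nothing is collected into a list or sorted. The helper name `count_entries` and the throwaway directory are illustrative assumptions.

```python
import os
import tempfile


def count_entries(directory: str) -> int:
    """Count directory entries without building or sorting a full list."""
    count = 0
    # os.scandir returns an iterator; entries arrive in filesystem order.
    with os.scandir(directory) as entries:
        for _ in entries:
            count += 1
    return count


# Demonstration on a temporary directory holding a few empty files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, f"file{i}.htm"), "w").close()
    print(count_entries(tmp))
```

Opening a file by its exact path does not enumerate the directory at all, which is why the comment expects open() to be quicker than a sorted listing.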

From Novell's web site:

Another way to overcome the limitation of 32000 subdirectories for the EXT3 file system is to increase the directories i-nodes maximum count to 65500 for the EXT3 kernel module, then recompile and build the new kernel from existing kernel sources. REF

That being said, use a database.

Joseph Kern
  • It's for a caching system, to avoid the use of databases and increase performance on my site. – zuk1 Aug 04 '09 at 11:35

You need to use a file system that uses something like a B+ tree; examples of these are XFS and JFS. Note that no file system is good at storing files like that. You would be much better off using a hashing scheme if you control the code that is writing into the directory.


It depends on the filesystem. The normal Linux filesystem, ext3, will have problems with that many files. If you have that many files, you should probably split them up somehow. A good way is to take the MD5 sum of the file and use the first 2 characters as a directory name, then the next 2, and so on, depending on how many files you have.
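A rough sketch of that MD5 layout in Python. The answer does not say whether the hash is taken over the file's name or its contents; this sketch hashes a cache key such as the requested host and path, which is an assumption, as are the names `hashed_path` and `levels` and the choice to store the file under its full digest.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(root: str, key: str, levels: int = 2) -> PurePosixPath:
    """Map a cache key to root/ab/cd/<digest> using its MD5 hex digest.

    Each level uses two hex characters, so every directory holds at most
    256 subdirectories.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Take consecutive two-character slices: digest[0:2], digest[2:4], ...
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *parts, digest)


print(hashed_path("/var/cache/site", "example.domain.com/lol.htm"))
```

Because hash output is close to uniform, two levels spread the files over 65,536 directories fairly evenly, whereas first-letter prefixes can be lopsided when many names begin with the same characters.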


You need to say which file system you are using. I have read that ext3 has a maximum of 32,000 subdirectories, so it won't even work.

Why do you have so many subdirectories? Perhaps you should be using a database. That is especially likely if these are lots of small files.

I would think the right file system might be the secondary concern. You might want to pop back over to Stack Overflow and look at what the optimal tree structure might be (if a tree is even best) for what you are doing. Then try to find a file system or database that fits. Although it does make sense to think about these at the same time, you might want to figure out the computer science aspect of such large data sets first.

Kyle Brandt
  • It's for a caching system... There could potentially be 50,000,000 files to cache, so I need to figure out the best way of doing this. I don't have any clue what filesystem I'll be using (server noob). I'm just going to buy a VPS from slicehost.com and pay someone to set up everything I need. – zuk1 Aug 04 '09 at 11:31
  • The other option I have thought about is having a main folder with up to 1000 subfolders, each with up to about 10000 subfolders, and then request from those... – zuk1 Aug 04 '09 at 11:34
  • Right, so you are trying to figure out how to structure your data. So maybe on Stack Overflow: 'I have 50 million files of x size, what is the best data structure to access these the fastest?'. Then, once you have some ideas about that, look into the implementation. It's possible that optimizing the data structure and moving the search into another language like C might make larger improvements than the choice of file system vs. DB. A very interesting question :-) – Kyle Brandt Aug 04 '09 at 11:44

If it is a caching system, then lots of RAM is the way to go. Standard Linux will cache file access in RAM and sidestep nearly all of the file system problems.

If you are going to be opening the folder for anything, then you need to put things into subfolders, as any single folder with a few thousand files will take time to load. Directory reads are generally not cached by the system.


If you access the files by exact pathname, the performance loss will be smaller, but do not forget that directories are themselves special files. Every time you list a directory or search within it, the system has to parse that file, so you need to distribute the load across different inodes. In your case, 120k directories holding 20 files each amounts to 2.4 million files being stored.

Doing the simple math, sqrt(120000*20) ≈ 1549, so if you distribute the files across ~1600 directories with ~1600 files in each, you reduce the number of entries in the largest directory by more than 98% (1600 entries instead of 120k). Introducing further directory levels can improve on this.

Without further information on your system, this is all that can be said.
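The arithmetic in this answer can be checked with a few lines of Python. The figures (120,000 directories of 20 files, rounded up to about 1600 entries per directory) come from the answer; the helper name `balanced_fanout` is an illustrative assumption.

```python
import math


def balanced_fanout(total_files: int) -> int:
    """Smallest n such that n directories of n files hold total_files."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    # Integer ceiling of the square root, avoiding floating-point rounding.
    return math.isqrt(total_files - 1) + 1


dirs, files_per_dir = 120_000, 20
total = dirs * files_per_dir           # 2,400,000 files overall
n = balanced_fanout(total)             # entries per directory in a square layout
reduction = 1 - 1600 / dirs            # top-level entries: 1600 instead of 120k

print(total, n, f"{reduction:.1%}")
```

With a third level the same idea uses a cube root instead, which brings the per-directory count down to roughly 134 entries for the same 2.4 million files.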
