10

I have a website which will store user profile images. Each image is stored in a directory (Linux) specific to the user. Currently I have a customer base of 30+, which means I will have 30+ folders. But my current Linux box (ext2/ext3) doesn't support creating more than 32000 directories. How do I get past this? Even the YouTube guys had the same problem with video thumbnails, and they solved it by moving to ReiserFS. Can't we have a better solution?

Update: When I asked on IRC, people suggested upgrading to ext4, which has a 64k limit (and of course you can get past that too), or hacking the kernel to change the limit.

Update: How about splitting the user base into folders based on the user-ID range? Meaning 1-1000 in one folder, 1001-2000 in another, and so on. This seems simple. What do you say, guys?

Frankly, isn't there any other way?

Royce Williams
None-da
Why don't you want to change the filesystem? If this is a limitation of ext2/3, you won't have any option other than changing the filesystem or splitting the current FS into several smaller FSs (more separate mount points). – Manuel Faux Jul 27 '09 at 09:08
Manuel: If he changes the file system he is tying a specific FS to his application. Although that might end up being the answer, I would say this is probably a problem that needs to be solved at the application level. If you need to hack the kernel or file system, you are probably going down the wrong path unless you have some very special requirements. – Kyle Brandt Jul 27 '09 at 13:48

10 Answers

16

That limit is per-directory, not for the whole filesystem, so you could work around it by further sub-dividing things. For instance, instead of having all the user subdirectories in the same directory, split them by the first two characters of the name so you have something like:

top_level_dir
|---aa
|   |---aardvark1
|   |---aardvark2
|---da
|   |---dan
|   |---david
|---do
    |---don

Even better would be to create some form of hash of the names and use that for the division. This way you'll get a better spread amongst the directories instead of, with the initial-letters example, "da" being very full and "zz" completely empty. For instance, if you take the CRC or MD5 of the name and use the first 8 bits, you'll get something like:

top_level_dir
|---00
|   |---some_username
|   |---some_username
|---01
|   |---some_username
...
|---FF
|   |---some_username
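
As a rough sketch of that hashed-prefix idea in Python (the helper name and paths here are just for illustration, not something from the question):

import os
from hashlib import md5

def user_dir(top_level_dir, username):
    # first 8 bits of the MD5 digest = two hex characters, giving 256 possible buckets
    bucket = md5(username.encode()).hexdigest()[:2]
    return os.path.join(top_level_dir, bucket, username)

# user_dir("profiles", "aardvark1") returns something like "profiles/xx/aardvark1",
# where "xx" is the first two hex digits of md5("aardvark1")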

This can be extended to further depths as needed, for instance like so if using the username rather than a hash value:

top_level_dir
|---a
|   |---a
|       |---aardvark1
|       |---aardvark2
|---d
    |---a
    |   |---dan
    |   |---david
    |---o
        |---don

This method is used in many places like squid's cache, to copy Ludwig's example, and the local caches of web browsers.

One important thing to note is that with ext2/3 you will start to hit performance issues before you get close to the 32,000 limit anyway, as directories are searched linearly. Moving to another filesystem (ext4 or reiser, for instance) will remove this inefficiency (reiser searches directories with a binary-split algorithm, so long directories are handled much more efficiently; ext4 may do so too) as well as the fixed limit per directory.

David Spillett
  • Just updated the question description to include this: "Update: How about splitting the user base into folders based on the user-ID range? Meaning 1-1000 in one folder, 1001-2000 in another, and so on. This seems simple. What do you say?" – None-da Jul 27 '09 at 09:33
    That would work well, and would be more efficient than a hash, if the users are generally identified by user ID instead of (or as well as) username. Though if you always refer to them by name elsewhere in the system you'll have to add extra name->id lookups all over the place. – David Spillett Jul 27 '09 at 09:50
  • Thank you David! I even tried a different solution: I created just 4 folders with the ranges 1-30000, 30000-60000, etc. I think getting a file from such a big directory will take more time than from a directory which has 1000 files (the previous approach). What do you say? – None-da Jul 27 '09 at 16:53
    That depends on the filesystem. If you are using ext2 or ext3 then I would recommend much smaller than 30,000 per directory. Some tools issue warnings about 10,000. You can turn directory indexing on in ext3/4 to help: tune2fs -O dir_index /dev/ but just keeping the number of objects in a directory lower (a couple of thousand or less?) is what I'd recommend here. – David Spillett Jul 27 '09 at 17:08
  • @Maddy, you want this solution due to other limitations on how Ext2/3 handles large numbers of files. See http://serverfault.com/questions/43133/filesystem-large-number-of-files-in-a-single-directory for some detail. Breaking out names into buckets-as-subdirectories alleviates other issues that you would have run into eventually. Note that this is the same strategy that Squid uses when it sets up the object cache for the first time - for instance, 64 directories each with 64 directories inside of them, just as an example. – Avery Payne Jul 27 '09 at 20:56
7

If you are bound to ext2/ext3 the only possibility I see is to partition your data. Find a criterion that splits your data into manageable chunks of similar size.

If it's only about the profile images I'd do:

  1. Use a hash (e.g. SHA1) of the image
  2. Use the SHA1 as file and directory name

For example the SQUID cache does it this way:

f/4b/353ac7303854033

The top-level directory is the first hex digit, the second level is the next two hex digits, and the file name is the remaining hex digits.
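
A minimal sketch of that layout in Python (hashing the image bytes; the function name and store root are made up for the example):

import os
from hashlib import sha1

def image_path(store_root, image_bytes):
    digest = sha1(image_bytes).hexdigest()
    # squid-style layout: 1 hex digit / 2 hex digits / remaining digits as the file name
    return os.path.join(store_root, digest[0], digest[1:3], digest[3:])

A nice side effect of hashing the image content rather than the user name is that identical uploads end up at the same path, so duplicates are stored only once.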

Ludwig Weinzierl
2

Can't we have a better solution?

You do have a better solution - use a different filesystem, there are plenty available, many of which are optimised for different tasks. As you pointed out ReiserFS is optimised for handling lots of files in a directory.

See here for a comparison of filesystems.

Just be glad you're not stuck with NTFS which is truly abysmal for lots of files in a directory. I'd recommend JFS as a replacement if you don't fancy using the relatively new (but apparently stable) ext4 FS.

gbjbaanb
  • Do you have good links to the NTFS filesystem performance? – Thorbjørn Ravn Andersen Jul 27 '09 at 16:56
  • Yes, apart from personal experience with an app that was left too long creating new files in a directory (it took hours to delete them all), and the Subversion performance boost from limiting the number of files in a directory to 1000. Or read: http://support.microsoft.com/kb/130694 I don't think they ever "fixed" this, as it is still noted as a performance tweak for NTFS. – gbjbaanb Jul 27 '09 at 18:50
1

I've hacked together a small web gallery where I ended up with a variation of this problem; I "only" had ~30,000 images in the cache directory, which turned out to be quite slow (ext2 uses linked lists for directory indices, as I remember it).

I ended up doing something along these lines:

import os
from hashlib import md5

def key2path(key):
    digest = md5(key.encode()).hexdigest()
    return os.path.join(digest[0], digest[1], key)

This partitions the data into 256 directories, which gives a fast directory lookup at each of the three levels.

  • I've chosen to use MD5 over SHA-1, as MD5 guarantees a different output if you change any 12 bits of 32, so I find it a nice fit to hash user names, directories and other short stuff. And it's fast, too...
  • I do not include the entire hash, as it will produce way too many directories and effectively trash the disk-cache over and over.
Royce Williams
Morten Siebuhr
    You could probably use a simpler hash like CRC, as the hash does not need to be cryptographically strong like MD5 or SHA... but the performance difference is probably negligible anyway... – sleske Jul 27 '09 at 10:54
1

Is the profile image small? What about putting it in the database with the rest of the profile data? This might not be the best option for you, but worth considering...

Here is an (older) Microsoft whitepaper on the topic: To BLOB or not to BLOB.
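
If you want to try it, here's a minimal sketch using SQLite (the table and column names are made up for the example; any database with BLOB support works the same way):

import sqlite3

conn = sqlite3.connect("profiles.db")
conn.execute("CREATE TABLE IF NOT EXISTS profiles (user_id INTEGER PRIMARY KEY, image BLOB)")

def save_image(user_id, image_bytes):
    # stores (or replaces) the user's profile image as a BLOB
    conn.execute("INSERT OR REPLACE INTO profiles (user_id, image) VALUES (?, ?)",
                 (user_id, image_bytes))
    conn.commit()

def load_image(user_id):
    row = conn.execute("SELECT image FROM profiles WHERE user_id = ?", (user_id,)).fetchone()
    return row[0] if row else None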

Kyle Brandt
1

Generally you want to avoid having directories with a large number of files/directories in them. The primary reason is that wildcard expansion on the command line will result in "Too many arguments" errors, which causes much pain when trying to work with these directories.

Go for a solution that makes a deeper but narrower tree, e.g. by creating subfolders like others have described.

0

We had a similar problem; the solution, as mentioned previously, is to create a hierarchy of directories.

Of course, if you have a complex application which relies on a flat directory structure, you'll probably need a lot of patching. So it's good to know that there is a workaround: use symlinks, which don't have the mentioned 32k limit. Then you have plenty of time to fix the app...
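
For instance, a sketch of that workaround in Python (the paths and the two-character bucketing are just illustrative):

import os

def add_user_dir(flat_root, bucketed_root, username):
    # the real directory lives in a bucketed tree, so no single parent gets near the 32k limit
    real_dir = os.path.join(bucketed_root, username[:2], username)
    os.makedirs(real_dir, exist_ok=True)
    # the symlink keeps the flat path the application expects; symlinks do not
    # increase the parent directory's link count, so the flat root can keep growing
    os.symlink(real_dir, os.path.join(flat_root, username))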

Karoly Horvath
0

Not an immediate answer to your problem, but something to watch for future reference is the OpenBSD-linked project called 'Epitome'.

Epitome is an engine which provides Single Instance Storage, Content Addressable Storage and Deduplication services.

All of your data is stored in a data store as hashed blocks, removing non-unique blocks to cut down on space usage, and it allows you to essentially forget about the storage mechanism, as you can simply request content from the data store by UUID.

Epitome is currently experimental, but something to watch for the future.

Moo
0

Why not use a timestamp approach, and then have an overflow option?

For example:

So let's say your timestamp is 1366587600.

Omit the last 2 digits (or else it just gets slightly ridiculous). Separate the stamp into sets of 4 digits (the directory count shouldn't reach more than 9999; if you want, you could separate it differently).

This should leave you with something like this:

/files/1366/5876/

Then also check the number of files within the dir before uploading; if it's getting a large number of uploads (i.e. 32000+ per 100 seconds), then iterate the directory by the second or by a letter, for example:

/files/1366/5876/a/file.txt

or

/files/1366/5876/00/file.txt

Then log the timestamp + letter or the full path code into a db along with the user and you should be set.

pathstamp: 1366587600 or 13665876a (if you're using letters).

This does end up with a large number of directories, but it can be really useful for handling file revisions. For example, if a user wants a new profile picture, you still have the old timestamped version in case they wish to undo the change (it's not just overwritten).
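
A rough sketch of the path-building part in Python (names illustrative; the overflow check against the directory or db count is left out):

import os
import time

def stamp_path(root, timestamp=None):
    ts = str(int(time.time() if timestamp is None else timestamp))
    ts = ts[:-2]                                         # drop the last 2 digits: 1366587600 -> 13665876
    parts = [ts[i:i + 4] for i in range(0, len(ts), 4)]  # split into groups of 4 digits
    return os.path.join(root, *parts)                    # stamp_path("/files", 1366587600) -> "/files/1366/5876"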

slm
0

I'd suggest deciding the maximum number of subdirectories you want to (or can) have in the parent folder.

Then you need to convert your user IDs so they start from 1.

Then you can do: modulo = currentId % numberOfSubdirectories

modulo will now contain your subdirectory number, which will never be greater than the numberOfSubdirectories you have chosen.

Do whatever you want with modulo; hash it, for example.

Also, this way the subdirectories will be filled evenly.
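
In code, that might look something like this (a Python sketch; numberOfSubdirectories is whatever maximum you decided on above):

import os

numberOfSubdirectories = 1000  # the maximum you chose

def user_dir(root, currentId):
    modulo = currentId % numberOfSubdirectories
    # sequential IDs cycle through 0..numberOfSubdirectories-1, so directories fill evenly
    return os.path.join(root, str(modulo))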

vitro