3

I am using CentOS 5 with Plesk 9 (64-bit) and running a site where users will be uploading pictures. With a 64-bit OS, are there any limits to how many files I can store? All I care about is performance and serving up the files. I'd prefer not to have files scattered four directories deep, but I am hoping that at some point I could have 200-300 thousand images.

Kladskull

9 Answers

6

If you are using ext3, I found this quote (warning: Spanish-language site):

"There is a limit of 32k (32768) subdirectories in a single directory, a limitation likely of only academic interest, as many people don't even have that many files (though huge mail servers may need to keep that in mind). The ext2 inode specification allows for over 100 trillion files to reside in a single directory"

Further reading showed that ext3 doesn't have a 32K limit on files per directory, which can be proven empirically with

a=0; i=1; while [ $a == 0 ]; do touch $i; a=$?; let i++; done

but it does have a 32K limit on subdirectories per directory, which can be tested with

a=0; i=1; while [ $a == 0 ]; do mkdir $i; a=$?; let i++; done

This (unfounded) claim says that

ReiserFS has no trouble at all with hundreds of thousands of files in a single directory. flabdablet - February 1, 2007

This question from the sister site stackoverflow.com could help too.

In general:

  • There is a limit to the number of subdirectories per directory,
  • You should keep the number of files per directory under 32K, though you can go a lot further,
  • The file system you are using does matter.
voyager
  • Note that even if ReiserFS's structures cope nicely with humongous directories, any program that scans the directory will take ages to finish (e.g. shell pattern matching; it really hurts). – Javier Jun 03 '09 at 20:46
  • During my search I found some references saying that accessing a single, known file had great (normal) speed, but that ls-ing the directory crawled to a halt. Anyway, I guess I wouldn't like to have to manage that amount of files in a single directory... – voyager Jun 03 '09 at 21:03
1

This depends greatly on the filesystem you use. Certain older versions of ext3 were atrocious with this, which is how the b-trees came about. Reiser is a lot more performant with large numbers of files like that. Back in the day I had a Novell NSS directory on a NetWare server with 250,000 4 KB files in it due to a GroupWise flub, and it worked just fine. Enumerating the directory sucked a lot, but accessing a specific file in that directory worked as fast as you'd hope. As this was 8 years ago, I have to presume modern Linux filesystems can handle that with aplomb.

sysadmin1138
1

It depends on the filesystem you're using, not the 64-bit'ness of the operating system. With every filesystem, there's going to be some point at which the big-O costs of the algorithm used to search the directory are going to get the better of the computer.

If you can break the file storage up into even just a two-tier hierarchy, you'll see better long-term scalability.
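To illustrate (this is only a sketch, not part of the answer; the /var/www/images path is a placeholder), pre-creating a small two-tier layout in bash is trivial, and files can then be placed by, say, the first two hex characters of a hash of their name:

# Sketch only: build a 16 x 16 two-tier hierarchy (256 leaf directories)
# under a placeholder base path.
for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
  for b in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
    mkdir -p "/var/www/images/$a/$b"
  done
done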

Evan Anderson
1

File systems in Linux store directory contents in basically two ways:

  1. As a flat list of files.

  2. As a data structure (usually a B+Tree or related data structure).

The former gets progressively slower as files are added; the latter does not. Note that ls might still take forever, since it has to look up the inodes of all those files: the directory entries only contain the filename and inode number.

Ext3 directories are flat lists, with an option for a hashed tree index to speed things up.

XFS uses B+Trees.

But for either of these file systems, if you do an ls -l, it'll need to hit as many inodes as there are files. For name lookups (when opening a file, for example), a B+Tree and the like will be much faster for large directories.

A hierarchy of directories makes it easier to manage the files, however, so you might want to consider that possibility. Even a single layer of directories with a limit of, say, 4000 files each would make things much easier to manage.
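As a rough sketch of that single extra layer (the path, the numeric-ID naming, and the 4000 figure are all just illustrative assumptions), bucketing by image ID could look like this:

# Sketch: put image number 123456 into a bucket directory so that no
# single directory holds more than ~4000 entries (paths/names are placeholders).
id=123456
bucket=$(( id / 4000 ))            # 123456 -> bucket 30
mkdir -p "/var/www/images/$bucket"
mv "$id.jpg" "/var/www/images/$bucket/$id.jpg"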

1

If you're going beyond a few hundred images, definitely consider two things:

  1. Nested hierarchies with hashed filenames;
  2. Not using ext3

I'd recommend using XFS, or, failing that, ReiserFS, with a two- or three-deep directory hierarchy divided up by two-byte pairs. e.g.

11/2f/112f667c786eac323e300632b5b2a78d.jpg
49/2f/49ef6eb6169cc57d95218c842d3dee5c.jpg
0a/26/0a26f9f363f1d05b94ceb14ff5f27284.jpg

This will give you 256 directories at each of the first two levels, splitting images up over a total of 65,536 leaf directories (which is more than enough for 100-200k images and beyond). It will make things much faster and much more scalable, and make it a lot easier to maintain later on as well.
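A minimal bash sketch of that scheme (assuming md5sum-based names, as in the example paths above; the /var/www/images base path and upload.jpg filename are placeholders):

# Hash the uploaded file and use the first two byte-pairs of the hex digest
# as directory levels, e.g. 112f66... -> 11/2f/112f66....jpg
f=upload.jpg
h=$(md5sum "$f" | awk '{print $1}')
dir="/var/www/images/${h:0:2}/${h:2:2}"
mkdir -p "$dir"
mv "$f" "$dir/$h.jpg"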

Dan Udey
0

Most default configurations of ext3 have a limit of 32K subdirectories per directory (I can't remember the actual number right now, but we ran into just that issue a couple of weeks ago; the system was Debian/Etch at the time).

It might also hit you in some applications that use a lot of caching.

Martin M.
  • I had a hard time determining whether that subdirectory limit applies to files or only to subdirectories. Since it is derived from the inode limit, I'm thinking it applies to files as well. – Kevin Kuphal Jun 03 '09 at 20:32
0

Certainly consider not using ext3. http://kernelnewbies.org/Ext4#head-97cbed179e6bcc48e47e645e06b95205ea832a68 (which shows the new features in ext4) might be a helpful kicking-off point.

I'd say have a look at how squid organises its cache too (multiple layers of directories), as many files in one directory may prove tough to maintain. Long lists (generally) suck.
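For reference, squid's layout is set by a single line in squid.conf; the values below are squid's stock defaults and the path will differ on your box:

# 100 MB cache, 16 first-level directories, 256 second-level directories each
cache_dir ufs /var/spool/squid 100 16 256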

Tom Newton
0

ext3 filesystems have htrees for big directories by default on most distros. Do a tune2fs -l /dev/sda1 (or whatever block device you're using) and check the "Filesystem features:" line. If there's a "dir_index" among them, you're golden.

Note, however, that even the best directory structures can only make it fast to find one specific file. Doing ls on a huge directory is going to be terrible, as would be any pattern matching, even if you know it matches a single file.

For these reasons, it's usually better to add one or two levels of directories, usually using some bits of an ID to name the directories.
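A quick sketch of that check, and of enabling the feature if it's missing (/dev/sda1 is just the example device from above; the e2fsck -D re-index has to be run on an unmounted filesystem):

# Check whether hashed directory indexes are already enabled
tune2fs -l /dev/sda1 | grep 'Filesystem features'

# If dir_index is absent, turn it on and rebuild existing directory indexes
tune2fs -O dir_index /dev/sda1
e2fsck -fD /dev/sda1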

Javier
0

It's going to depend somewhat on what filesystem you're using on your Linux server.

Assuming you're using ext3 with dir_index, you should be able to search large directories quite fast, so speed shouldn't be much of a problem. Listings (obviously) will take longer.

As for the max number of files you can put in the directory, I'm pretty sure you can work reliably up to 32,000 files. I'm not sure I'd want to exceed that (even though you probably can).

KPWINC