
Having numerous files will obviously slow down the OS; but how serious is this problem? Assume the files are well distributed over multi-level folders. Can the sheer number of files (perhaps because of inode usage) still slow down the system?

I am talking about a few million files! That is not too many for a desktop computer, given all the different programs installed; but it seems like too many for a web server.

I am curious to know whether storing a few million files (in appropriately organized folders) has a significant effect on server performance.

More information: Consider ext4 as the filesystem, with 100 files per folder in a two-level folder structure.
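To put a rough number on it (this is just my own back-of-the-envelope arithmetic, assuming roughly 100 entries at every level: 100 top-level folders, 100 subfolders in each, and 100 files per subfolder):

# echo $((100 * 100 * 100))
1000000

so a layout like that holds about a million files, which is the scale I have in mind.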

Googlebot
  • Depends a lot on the filesystem in use, and how distributed the files are across folders. 100k files in a folder is definitely noticeable on some file systems, for example. More information? – Kvisle Oct 21 '11 at 20:36
  • With your `more information`-block in mind - and the fact that a directory is also a file: No, this will not slow down your server. – Kvisle Oct 21 '11 at 21:34

2 Answers


Having numerous files will obviously slow down the OS
No, it really won't. I've had *NIX systems at 99% inode utilization ("near the upper limit on the number of files the filesystem can hold") and had no performance problems.
My workstation is currently at 90% inode utilization, and all my performance problems are due to insufficient RAM.
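(If you want to check a box yourself, `df -i` reports inode usage per filesystem; the numbers below are purely illustrative, not from a real machine:)

# df -i /
Filesystem      Inodes   IUsed  IFree IUse% Mounted on
/dev/sda1      6553600 5898240 655360   90% /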


but how serious is this problem? Assume the files are well distributed over multi-level folders. Can the sheer number of files (perhaps because of inode usage) still slow down the system?
This is not a serious problem. Properly architected, you should be able to hit your system's inode limit without any performance problems.
Also note that every directory ("folder") uses an inode on *NIX systems.
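You can see this for yourself: `stat` reports an inode number for a directory exactly as it does for a regular file (the inode number shown is just whatever my system happened to assign; yours will differ):

# stat -c 'inode %i: %n' /usr/bin
inode 3670017: /usr/bin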


I am talking about a few million files! That is not too many for a desktop computer, given all the different programs installed; but it seems like too many for a web server.
On what do you base this (mostly incorrect) statement? Assuming they're running the same OS, why would your desktop and a server be magically different in terms of filesystem behavior?

"different programs" has no effect on filesystem performance. The operating system is responsible for telling you what files are where (logically within the filesystem and physically on the disk), and most filesystems are very efficient at this.


I am curious to know whether storing a few million files (in appropriately organized folders) has a significant effect on server performance.
Millions of files in one directory? Not advisable (and not possible on many systems -- there is usually a limit on the maximum number of files within a directory).
Walking a very large directory tree may cause performance issues (it takes the OS time to walk the tree and list all the children, and then your software has to deal with the giant pile of data it's being handed), but if you do not have a grossly unreasonable directory structure (like "Everything in /dumping_ground") this should not be an issue.
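For example, a common way to avoid the /dumping_ground pattern is to fan files out across a couple of hash-derived directory levels. Here is a minimal bash sketch (the /var/www/uploads path and the store_file name are placeholders of my own, not anything your software requires):

store_file() {
    local src="$1"
    local hash
    # use the first four hex digits of the file's MD5 as two directory levels
    hash=$(md5sum "$src" | cut -c1-4)
    local dir="/var/www/uploads/${hash:0:2}/${hash:2:2}"
    mkdir -p "$dir"      # e.g. /var/www/uploads/a3/f1
    cp "$src" "$dir/"
}

Two hex characters per level caps each level at 256 subdirectories (65,536 leaf directories in total), so even millions of files stay spread thin instead of piling up in one folder.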



In response to the edit above:

More information: Consider ext4 as the filesystem, with 100 files per folder in a two-level folder structure.
You are joking, right? Consider the number of files in /usr/bin:

# ls -a /usr/bin | wc -l
     448

and that's small for /usr/bin.

voretaq7
  • 1. I meant it is normal to have millions of files on a desktop computer, but the number of files on a web server is usually much lower (probably because we pay more attention to keeping a server tidy). 2. I emphasized that the files are well distributed over folders. I never meant all the files in one single folder. – Googlebot Oct 21 '11 at 21:23
  • (1) You are incorrect. My primary analysis server has, at this time, over one billion files. This is the way that server operates. (2) A properly architected system should never dump so much junk in one directory as to cause a performance problem. (3) See edit above in response to your edit in the question :) – voretaq7 Oct 21 '11 at 21:30
  • To clarify further, the bulk of that 1B files consists of small (<500KB) files, and is contained on several filesystems that form a single hierarchy. The reason for splitting the filesystems was hitting the inode limit, not performance. – voretaq7 Oct 21 '11 at 21:41
  • When referring to 100 files per folder, I do not mean system files. I am talking about files uploaded to the server for web use (e.g. image files). I am asking: does uploading millions of image files slow down my server's performance? – Googlebot Oct 21 '11 at 22:10
  • @Ali A file is a file, independent of its purpose. So executable, image or email doesn't make any difference. If you really like big storage then look at Cassandra. But it can currently handle only 2^128 (=340 undecillion) files. – mailq Oct 22 '11 at 19:45

The number of files won't have any significant effect on performance. The total size of the working set will affect performance if it exceeds the amount of RAM the server has, but that would occur the same way if it was one gigantic file or a million small files.

For some filesystems, having a very large number of files in a single directory or having very deep directory structures can affect performance. But this can be avoided either by choosing a filesystem that doesn't have the issue or by arranging the directory structure so the problem never arises.
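If you suspect the layout (rather than the raw number of files) is what hurts, one quick check is to time an unsorted listing of the suspect directory against a small one; `ls -f` skips the sort, so you are mostly measuring the directory read itself. The paths here are hypothetical:

# time ls -f /srv/uploads/flat_dump | wc -l
# time ls -f /srv/uploads/a3/f1 | wc -l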

David Schwartz
  • Large filesystems can expose weaknesses in other areas of the system or applications. For example, most file-based backup software is going to take a *long* time to do an initial backup on such systems, simply because processing all that metadata involves tons of small disk I/O -- the death knell for performance on normal HDs. Software that doesn't take any FS-level enhancements into consideration (like USN Change Journals for NTFS) to keep up with incremental changes will also perform badly. – afrazier Oct 21 '11 at 20:45
  • I don't think the question was really about slowing specific applications that had to specifically interact with the files but more about the mere existence of the files slowing down the OS or other operations. – David Schwartz Oct 21 '11 at 20:48