I need to store a lot of data, contained in SQLite databases, on a file server. I have the opportunity to split the data up into many files. This means there is less risk of a large part of the data getting corrupted, the data is easier to move, there are fewer problems with locking, etc. My question is: how many files are too many? 100,000? 1,000,000? 10,000,000? In other words, what is the overhead of creating a file on a file server? When I talk about overhead, I am talking about the number of disk rotations needed to create a file. I know about blocks and block sizes, and I am not concerned about the storage wasted by storing the data in many files.

My question is not about whether it is best to store such a database on a file server at all, rather than using a proper database server running other database software.

The environment is a Microsoft environment, but I do not know anything specific about the file server.
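
To make it concrete, this is roughly the measurement I have in mind: time how long it takes to create a batch of small files on the share. It's only a sketch; the UNC path and file count are placeholders, not my actual setup.

    import os
    import time

    TARGET_DIR = r"\\fileserver\share\benchmark"   # hypothetical UNC path
    NUM_FILES = 10_000
    PAYLOAD = b"x" * 4096                          # roughly one cluster of data per file

    os.makedirs(TARGET_DIR, exist_ok=True)

    start = time.perf_counter()
    for i in range(NUM_FILES):
        # each iteration creates one small file on the share
        with open(os.path.join(TARGET_DIR, f"bench_{i}.db"), "wb") as f:
            f.write(PAYLOAD)
    elapsed = time.perf_counter() - start

    print(f"{NUM_FILES} files in {elapsed:.1f} s "
          f"({NUM_FILES / elapsed:.0f} files/s, {1000 * elapsed / NUM_FILES:.2f} ms per file)")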

David
  • Sorry, what do you mean by "number of rotations to create a file"? – MadHatter Dec 14 '10 at 10:44
  • Good question. I mean, the number of disk rotations needed. This is usually the limiting factor on disks. – David Dec 14 '10 at 10:46
  • I started to write a reply about directory entries and inodes, but then I saw it was a Windows environment, about which I can say nothing useful; any sane answer will be intensely OS- and filesystem-specific. So I'm sorry not to be able to help, and I'm retagging your post as Windows to more accurately point it at people who may have light to shed. – MadHatter Dec 14 '10 at 11:13

2 Answers


More than about 10,000 files in a single folder will give you trouble accessing it with Explorer. This can be avoided by breaking the files down into a tree of folders.
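
For example, one way to spread the files across a two-level folder tree (just a sketch; the hashing scheme and the root path are made up for illustration):

    import hashlib
    import os

    ROOT = r"\\fileserver\share\databases"   # hypothetical root of the tree

    def bucketed_path(name: str) -> str:
        """Map a file name to ROOT\\ab\\cd\\name using the first hex digits of its hash."""
        digest = hashlib.md5(name.encode("utf-8")).hexdigest()
        folder = os.path.join(ROOT, digest[:2], digest[2:4])
        os.makedirs(folder, exist_ok=True)
        return os.path.join(folder, name)

    # 256 * 256 = 65,536 buckets, so even 10,000,000 files average ~150 per folder.
    print(bucketed_path("customer_000042.db"))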

Also, if your files are not a multiple of the cluster size (usually 4 KB), each file wastes the remainder of its last cluster. Depending on the file sizes this may or may not be significant; for example, a 5 KB file occupies two 4 KB clusters and wastes 3 KB, which across a million such files adds up to roughly 3 GB.

Access to many little files is also slow due to per-file overhead. This can limit the speed of things like backups. If you can design your usage to read larger files sequentially and do random access in memory, you will be better off.
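
As a rough illustration of the "read sequentially, seek in memory" idea (the path is hypothetical):

    import io

    def load_for_random_access(path: str) -> io.BytesIO:
        with open(path, "rb") as f:
            data = f.read()          # one sequential read over the network
        return io.BytesIO(data)      # subsequent seeks hit RAM, not the file server

    buf = load_for_random_access(r"\\fileserver\share\blobs\big_blob.dat")
    buf.seek(1024 * 1024)            # no network round trip for the seek
    chunk = buf.read(4096)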

JamesRyan
  • You have some good points, but I am specifically after the speed overhead incurred by creating many files. – David Dec 14 '10 at 12:26
  • It very much depends what you are doing with them. You need to run your own test of your specific usage. In most cases I would suspect it will be significant, but in terms of the filesystem you won't have noticeable overhead as long as you split the files into sensible numbers per folder. – JamesRyan Dec 14 '10 at 12:39

SQLite is a very cool product, but if you are accessing a database over a network, it's a VERY bad idea to do so via file-based access. Even if the DBs are read-only and you don't have any concurrency to worry about, performance will be awful. You must have a very good reason for doing it this way.

In practice, assuming that performance, concurrency and locking are not issues, I would not expect any significant difference between creating 1,000 files and writing the same data to 10 files as a batch; however, this will vary massively depending on the nature of the underlying filesystem. OTOH, with lots of transactions occurring randomly across the files, I'd expect the smaller number of files to be more efficient. For reads, I'd expect a similar pattern. But there's only one way to find out for sure: try it.
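
For instance, a throwaway benchmark along these lines would compare the two layouts (a sketch only; the paths, schema and row counts are invented):

    import os
    import sqlite3
    import time

    def write_batches(root: str, num_files: int, rows_per_file: int) -> float:
        """Write the same total number of rows spread over num_files SQLite databases."""
        os.makedirs(root, exist_ok=True)
        start = time.perf_counter()
        for i in range(num_files):
            con = sqlite3.connect(os.path.join(root, f"part_{i}.db"))
            con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
            con.executemany("INSERT INTO t (payload) VALUES (?)",
                            (("x" * 100,) for _ in range(rows_per_file)))
            con.commit()
            con.close()
        return time.perf_counter() - start

    total_rows = 100_000
    print("1000 files:", write_batches(r"D:\bench\many", 1000, total_rows // 1000), "s")
    print("  10 files:", write_batches(r"D:\bench\few", 10, total_rows // 10), "s")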

symcbean