
I'm writing an application that stores lots of images (each < 5 MB) on an ext3 filesystem; this is what I have so far. After some searching here on serverfault I have decided on a directory structure like this:

000/000/000000001.jpg
...
236/519/236519107.jpg

This structure will allow me to store up to 1'000'000'000 images, since I'll keep a maximum of 1'000 images in each leaf directory.
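
In code, the mapping from an image ID to its path looks roughly like this (a minimal Python sketch; the root directory is just a placeholder):

import os

def image_path(image_id, root="/srv/images"):
    # 236519107 -> 236/519/236519107.jpg ; 1 -> 000/000/000000001.jpg
    top = image_id // 1_000_000           # first three digits of the 9-digit ID
    mid = (image_id // 1_000) % 1_000     # middle three digits
    return os.path.join(root, f"{top:03d}", f"{mid:03d}", f"{image_id:09d}.jpg")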

I've created it, and from a theoretical point of view it seems fine to me (though I have no experience with this), but I want to find out what will happen once the directories in there start filling up with files.

A question about creating this structure: is it better to create it all in one go (it takes approximately 50 minutes on my PC), or should I create the directories as they are needed? From a developer's point of view I think the first option is better (no extra waiting time for the user), but is it also OK from a sysadmin's point of view?

To simulate the filesystem already being under a running application, I thought I'd write a script that saves images as fast as it can while monitoring the following (a rough sketch of such a script follows this list):

  • how much time does it take for an image to be saved when little or no space is used?
  • how does this change when the space starts to be used up?
  • how much time does it take for an image to be read from a random leaf? Does this change a lot when there are lots of files?
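
Something along these lines is what I have in mind (a rough Python sketch; the root path, payload size and counts are placeholders):

import os
import random
import time

ROOT = "/srv/images"                        # placeholder for the mount point under test
PAYLOAD = os.urandom(2 * 1024 * 1024)       # ~2 MB of dummy image data

def leaf_dir(image_id):
    return os.path.join(ROOT, f"{image_id // 1_000_000:03d}", f"{(image_id // 1_000) % 1_000:03d}")

def write_image(image_id):
    leaf = leaf_dir(image_id)
    os.makedirs(leaf, exist_ok=True)        # create the leaf on demand
    path = os.path.join(leaf, f"{image_id:09d}.jpg")
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(PAYLOAD)
        f.flush()
        os.fsync(f.fileno())                # time the disk, not just the page cache
    return time.perf_counter() - start

def read_image(image_id):
    path = os.path.join(leaf_dir(image_id), f"{image_id:09d}.jpg")
    start = time.perf_counter()
    with open(path, "rb") as f:
        f.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 100_000                             # adjust for how full you want the tree (and the disk)
    for i in range(1, n + 1):
        elapsed = write_image(i)
        if i % 10_000 == 0:
            print(f"write #{i}: {elapsed * 1000:.2f} ms")
    reads = [read_image(random.randint(1, n)) for _ in range(1_000)]
    print(f"avg random read: {sum(reads) / len(reads) * 1000:.2f} ms")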

Does running this command

sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

make any sense at all? Is this the only thing I have to do to get a clean start whenever I want to rerun my tests?
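
In case it helps, here is the same thing wrapped as a helper to call between the write and read phases of the test (a minimal Python sketch; it assumes the script runs as root):

import subprocess

def drop_caches():
    # Flush dirty pages to disk, then drop the page cache, dentries and inodes.
    # Writing to /proc/sys/vm/drop_caches needs root (or keep the sudo tee form above).
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")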

Do you have any suggestions or corrections?

EDIT: I chose the filesystem over a database because of these two questions:

http://serverfault.com/questions/95444/storing-a-million-images-in-the-filesystem
http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay

3 Answers


First of all, be careful with the filesystem limitations. You will never store more than 2^32 files in a vanilla ext3 filesystem, as there is a limit on the maximum number of inodes (check df -i). On top of that, there are maximum filesystem size limits and the like to consider.

Secondly: do you really need to have the files in the filesystem? Depending on how the files are accessed, you might find that you get better (and much more predictable) performance by putting them in a database. On top of that, databases are much easier to handle, back up, move, etc. Any application design that involves millions of files is flawed and will come back to bite you hard in the future.
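
As a rough illustration of what the database route could look like (a minimal sketch assuming PostgreSQL with a bytea column and the psycopg2 driver; the table, column and connection string are made up):

import psycopg2

# Assumed schema:  CREATE TABLE images (id bigint PRIMARY KEY, data bytea);
conn = psycopg2.connect("dbname=imagestore")     # connection string is an assumption

def store_image(image_id, raw):
    with conn, conn.cursor() as cur:             # commits on success, rolls back on error
        cur.execute("INSERT INTO images (id, data) VALUES (%s, %s)",
                    (image_id, psycopg2.Binary(raw)))

def load_image(image_id):
    with conn, conn.cursor() as cur:
        cur.execute("SELECT data FROM images WHERE id = %s", (image_id,))
        row = cur.fetchone()
        return bytes(row[0]) if row else None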

pehrs
  • I've read more advice in favor of storing in the filesystem rather than in the database; that's why I chose the filesystem. (http://serverfault.com/questions/95444/storing-a-million-images-in-the-filesystem and http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay) – Alberto Zaccagni Jun 14 '10 at 22:22
  • It depends on your usage patterns. I have a few TB of measurement data that I frequently access, and it's all in a Postgresql database. I wouldn't ever want to go back to the time of 1'000'000 files in the filesystem. Postgresql has special datastructures for binary blobs (http://jdbc.postgresql.org/documentation/80/binary-data.html). Note that answers on Stackoverflow are likely to be more "it's easy to code" than "it's a good idea for the administrator", which is what Serverfault is for... – pehrs Jun 15 '10 at 08:57
  • Hehe, yes, as a developer here I wrongly tend to be on the "it's easy to code" side. I'll have a look at the link you posted. Thank you :) – Alberto Zaccagni Jun 15 '10 at 09:19

Pehrs raises a very good point about filesystems with that many files. When it comes time to back up that filesystem, it will take a VERY long time. File traversal is one of the biggest time sinks during a backup, right along with all those file-open/file-close requests. The question, "how much time does it take for an image to be saved when little or no space is used?", suggests these files will be pretty small, so a filesystem of this type is almost textbook for worst-case backup scenarios (only one case is worse: all of those files in a single directory).

Contrast that with a true database, where dumping the DB for backup is a very fast, efficient operation. Yes, that database may be VERY large, but it'll back up a LOT faster, and may even serve data faster as the file count grows. It depends on which DB you use and how well it is managed, but generally using a DB store instead of an FS store in this case will provide better disaster resilience.

If a DB is not an option, then yes, pre-creating the directory structure is your best bet. What will also help is load-balancing the file creates across the entire structure, rather than filling /000/000/ completely before moving on to /000/001/. This should keep the per-directory file counts low for quite some time.
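
One way to do that (a sketch only, not necessarily what this answer had in mind): derive the directory components from the low-order digits of the ID, so consecutive IDs are spread across the whole tree while the path stays computable from the ID alone.

import os

def balanced_path(image_id, root="/srv/images"):
    # IDs 1, 2, 3, ... land in 001/000/, 002/000/, 003/000/, ...
    # so per-directory file counts grow evenly across the tree.
    top = image_id % 1_000
    mid = (image_id // 1_000) % 1_000
    return os.path.join(root, f"{top:03d}", f"{mid:03d}", f"{image_id:09d}.jpg")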

sysadmin1138
  • I excluded db as an option because of these: http://serverfault.com/questions/95444/storing-a-million-images-in-the-filesystem and http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay, but I'm still thinking, so if you can convince me, I'm listening :) – Alberto Zaccagni Jun 14 '10 at 22:25
  • It really depends on your usage patterns. Using a balanced-tree file structure with on-demand expansion once you grow to certain sizes (like what pjz said) is a good way to keep performance good. You might want to look into something other than ext3 for this. For instance, XFS has a very good backup utility (xfsdump) that should handle this backup case a lot easier than, say, tar. I just don't know if it is markedly better/worse than ext3 at the 100M's of itty bitty files case. – sysadmin1138 Jun 14 '10 at 22:53
  • It will be an app to store images, so images will be frequently accessed and added. I'll have a look at other filesystems, thanks. – Alberto Zaccagni Jun 14 '10 at 23:01
  • If it is within your power to do so, a utility that can detect orphan content in that image archive will be greatly appreciated by the system administrator. We have a tens-of-millions-of-files system here (NTFS based) that doesn't have one, and it is a continual vexation. We had to write our own, and it breaks with every app upgrade. This utility can run for a LONG time, but it is very valuable for maintaining data quality (a rough sketch of such a check follows these comments). – sysadmin1138 Jun 14 '10 at 23:26
  • This will be a project in which I'm the only tech; my friend has skills in web design and graphics... so for some time I'll have to try to be a decent sysadmin. By "orphan content" you mean images that do not have a tuple in the db, right? – Alberto Zaccagni Jun 15 '10 at 07:55
  • That's exactly it. A variety of abnormal functions can cause such orphans to crop up (for example, I once had 900K identical 4K files created because the DB was locked). If it gets bad, a clean-up procedure is very useful. If this will have up to a billion files in it, liberating available file handles (for 2^32-1 file limited filesystems) is useful by itself, as is liberating consumed disk space. – sysadmin1138 Jun 15 '10 at 15:17
  • Thank you very much for all this valuable information. – Alberto Zaccagni Jun 15 '10 at 15:42
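
A rough sketch of such an orphan check, assuming the metadata lives in a PostgreSQL table images(id) and the path layout from the question (the root path, table name and connection string are illustrative):

import os
import re
import psycopg2

ROOT = "/srv/images"                      # placeholder for the image tree root
NAME = re.compile(r"^(\d{9})\.jpg$")      # 9-digit ID, per the layout in the question

def find_orphans(conn):
    # Files on disk whose ID has no row in the images table.
    with conn.cursor() as cur:
        cur.execute("SELECT id FROM images")
        known = {row[0] for row in cur}   # beware: a billion IDs will not fit comfortably in RAM
    for dirpath, _dirs, files in os.walk(ROOT):
        for name in files:
            m = NAME.match(name)
            if m and int(m.group(1)) not in known:
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=imagestore")   # connection string is an assumption
    for path in find_orphans(conn):
        print(path)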

Do not create them all at startup.

Create the top level 1k dirs if you like, but beyond that do them on-demand. Otherwise, creating them all will eat a bunch of your filesystem's inodes that will most likely never be used.

Consider: one inode is consumed per directory created (inodes hold permission and ownership info, for both files and directories). So the top-level 1000 directories cost... 1000 inodes. The next level down is 1000*1000, or 1,000,000 inodes. A million, which even on today's big disks is not an inconsiderable amount. And if you fill a 1 TB drive with 5 MB files, that's... only 200k files. You're going to spend more inodes on the directory structure than on the files themselves. Heck, you're going to have more directories than files!
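
A minimal sketch of the on-demand variant (Python; the root path is a placeholder): the leaf directory is created the first time an image lands in it, so inodes are only spent on leaves that actually get used.

import os

ROOT = "/srv/images"   # placeholder for the image tree root

def save_image(image_id, raw):
    # Create the leaf directory only when the first image that needs it arrives.
    leaf = os.path.join(ROOT, f"{image_id // 1_000_000:03d}", f"{(image_id // 1_000) % 1_000:03d}")
    os.makedirs(leaf, exist_ok=True)       # no-op if the directory already exists
    path = os.path.join(leaf, f"{image_id:09d}.jpg")
    with open(path, "wb") as f:
        f.write(raw)
    return path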

pjz