I'm writing an application for storing lots of images (each smaller than 5 MB) on an ext3 filesystem, and this is what I have so far. After some searching here on Server Fault I have decided on a directory structure like this:
000/000/000000001.jpg
...
236/519/236519107.jpg
This structure will allow me to save up to 1'000'000'000 images as I'll store a max of 1'000 images in each leaf.
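In code, the mapping from an image ID to its path would look something like this (a minimal Python sketch; the /var/images root and the 9-digit sequential ID are just my assumptions):

    import os

    def image_path(image_id, root="/var/images"):
        # Zero-pad the numeric ID to 9 digits and split it into
        # two 3-digit directory levels plus the filename, e.g.
        # 236519107 -> 236/519/236519107.jpg
        name = "%09d" % image_id
        return os.path.join(root, name[0:3], name[3:6], name + ".jpg")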
I've already created it; from a theoretical point of view it seems fine to me (though I have no experience with this), but I want to find out what will happen once the directories start to fill up with files.
A question about creating this structure: is it better to create it all in one go (which takes about 50 minutes on my PC), or should I create directories only as they are needed (see the sketch below)? From a developer's point of view I think the first option is better (no extra waiting time for the user), but from a sysadmin's point of view, is this OK?
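For the second option, creating directories on demand would be roughly this (sketch only, assuming Python 3; makedirs with exist_ok=True avoids a separate existence check):

    def save_image(image_id, data, root="/var/images"):
        # Create the leaf directory only when the first image
        # lands in it, then write the file.
        path = image_path(image_id, root)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)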
To simulate a filesystem that is already in use by the running application, I'll write a script that saves images as fast as it can while monitoring the following (a rough sketch of the script comes after this list):
- how long does it take to save an image when little or no space is used?
- how does this change as the space fills up?
- how long does it take to read an image from a random leaf? Does this change much when there are lots of files?
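Something along these lines is what I had in mind (sketch only; the ~1 MB dummy payload and the 1'000 random reads are arbitrary numbers I picked):

    import random, time

    def benchmark(n_images, root="/var/images", size=1024 * 1024):
        blob = os.urandom(size)                 # dummy ~1 MB "image"
        write_times = []
        for i in range(1, n_images + 1):
            start = time.time()
            save_image(i, blob, root)
            write_times.append(time.time() - start)

        read_times = []
        for _ in range(1000):                   # random reads from the tree
            i = random.randint(1, n_images)
            start = time.time()
            with open(image_path(i, root), "rb") as f:
                f.read()
            read_times.append(time.time() - start)

        return write_times, read_times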
Does launching this command
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
make any sense at all? Is this the only thing I need to do to get a clean start if I want to run my tests over again from scratch?
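If it does make sense, I guess I could call it from the test script between runs, something like this (assuming the script itself runs as root):

    import subprocess

    def drop_caches():
        # Flush dirty pages, then drop page/dentry/inode caches;
        # same effect as the shell one-liner above.
        subprocess.check_call(["sync"])
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")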
Do you have any suggestions or corrections?
EDIT: I've chosen the filesystem approach, as opposed to a database, because of these two questions: