
I am developing a site that lets people upload images. We resize each image to 4 sizes. We're expecting lots and lots of images and are considering ways to improve performance with regard to the file structure, as we don't really want one directory with tens of thousands of files. Anyone got any suggestions as to how we should organise the files?

The options that seem obvious are:

Each user has their own folder, and within that a folder for each size

(Each of the four folders could hold a lot of images)

/user_uploads/user01/
                 |-/size_thumb/
                 |-/size_small/
                 |-/size_medium/
                 |-/size_large/
/user_uploads/user02/
                 |-/size_thumb/
                 |-/size_small/
                 |-/size_medium/
                 |-/size_large/

   etc etc

Or each user's photos stored in one folder per user (more photos per directory, but fewer directories overall)

/user_uploads/user01/
/user_uploads/user02/
  etc etc

Or each photo stored by size

Lots and lots of photos per directory (could have further subfolders by date?)

/user_uploads/small/
/user_uploads/medium/
/user_uploads/large/
/user_uploads/thumbs/

Anyone got any ideas? I think we'll probably go with /user_uploads/userID/ unless anyone has any suggestions.

(Right now everything will be hosted on one computer, so we don't have to worry about files being on different servers)

st0rage
  • Duplicate of [Storing a million images in the filesystem](http://serverfault.com/questions/95444/storing-a-million-images-in-the-filesystem) – Mark Henderson Jul 20 '10 at 23:03
  • While I was thinking this was a duplicate as well, I'm not sure I'd agree that NTFS to UNIX filesystems would be apples to apples. Perhaps you could provide us more information about your platform, st0rage? – Warner Jul 20 '10 at 23:28
  • you say best file structure, but your examples are directory structures. The best file structure IMHO is to use a BLOB in a database – Nick Kavadias Jul 21 '10 at 03:28
  • The question does not have enough details about the kind of operations which will be performed on these images to allow picking one solution instead of another. You can also try `/user_uploads/small/user01/` to be able to spread the load between physical disks for the various sizes of image. Ultimately you probably don't want to expose `/user_uploads/user01`, you should show `/img/XXXXXXXXXXXXX/` and use a database of some sort to map this global id to the internal path of the picture, which would allow you to restructure the pictures without changing the external paths... – pascal Jul 21 '10 at 12:24
  • Funny, this "n"+590x"o" which messes up the comment display... – pascal Jul 21 '10 at 12:26
  • quick! someone call the sysadmin. The website is broken – Nick Kavadias Jul 22 '10 at 00:15
  • @nick - noooooooooooooooooooooooooooooooo (my old comment broke the display. My apologies) - see [here](http://serverfault.com/questions/95444/storing-a-million-images-in-the-filesystem) for the reason. – Mark Henderson Jul 22 '10 at 22:46
  • @Farseeker, yeh, i know. Nice going! i was being funny – Nick Kavadias Jul 23 '10 at 06:00

1 Answer


You might want to try MD5-hashing each image as it is uploaded, and then storing it in a directory structure like the one below. Assuming 3 images which hash to:

  1. 2b00042f7481c7b056c4b410d28f33cf
  2. 84bdbf7c4d48e16642af4c317df428c2
  3. 7b2a7edc6e86224d6ba0f97b717c80ed

And a folder structure that looks like this:

/images/orig/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
/images/orig/8/84/84b/84bdbf7c4d48e16642af4c317df428c2.jpg
/images/orig/7/7b/7b2/7b2a7edc6e86224d6ba0f97b717c80ed.jpg

/images/large/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
/images/large/8/84/84b/84bdbf7c4d48e16642af4c317df428c2.jpg
/images/large/7/7b/7b2/7b2a7edc6e86224d6ba0f97b717c80ed.jpg

/images/small/2/2b/2b0/2b00042f7481c7b056c4b410d28f33cf.jpg
/images/small/8/84/84b/84bdbf7c4d48e16642af4c317df428c2.jpg
/images/small/7/7b/7b2/7b2a7edc6e86224d6ba0f97b717c80ed.jpg
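A minimal Python sketch of the scheme above, assuming the upload is available as a binary file object and that every size is saved as `.jpg` as in the example paths. `md5_of_file`, `sharded_path` and the `SIZES` tuple are hypothetical names for illustration; level *i* of the tree is named after the first *i* hex characters of the digest.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO

# Size directories taken from the example layout above (hypothetical list).
SIZES = ("orig", "large", "small")


def md5_of_file(f: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a binary stream, reading it in chunks
    so a large upload is never held in memory all at once."""
    h = hashlib.md5()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def sharded_path(root: Path, size: str, digest: str,
                 levels: int = 3, ext: str = ".jpg") -> Path:
    """Build root/size/<1 char>/<2 chars>/<3 chars>/<digest><ext>.

    Level i uses the first i hex characters of the digest, so every
    directory has at most 16 subdirectories."""
    prefixes = [digest[:i] for i in range(1, levels + 1)]
    return Path(root, size, *prefixes, digest + ext)


if __name__ == "__main__":
    upload = io.BytesIO(b"...image bytes...")  # stands in for an uploaded file
    digest = md5_of_file(upload)
    for size in SIZES:
        print(sharded_path(Path("/images"), size, digest).as_posix())
```

Because the resized copies reuse the digest of the original, one stored hash is enough to locate every size of the same upload.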

You can add as many levels following the above pattern as you need to keep directory sizes manageable. Also, if you prefer, you can use some user id to identify the images and still use a similar structure, e.g. assuming a user id of 14, zero-padded to 0014: /images/orig/0/00/001/0014.jpg
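To judge how many levels are enough, a back-of-the-envelope calculation helps. Assuming MD5 digests spread roughly evenly over the hex alphabet, each extra level multiplies the number of leaf directories by 16, so n images in one size directory come out at about n / 16^levels files per leaf. The function names here are illustrative only.

```python
def leaf_directories(levels: int, alphabet_size: int = 16) -> int:
    """Leaf directories in the nested prefix layout: each level adds one more
    hex character, so every directory has at most `alphabet_size` children."""
    return alphabet_size ** levels


def avg_files_per_leaf(n_images: int, levels: int) -> float:
    """Average files per leaf directory, assuming digests are spread evenly."""
    return n_images / leaf_directories(levels)


if __name__ == "__main__":
    # One million images in a single size directory (e.g. /images/small).
    for levels in (1, 2, 3, 4):
        print(levels, leaf_directories(levels),
              avg_files_per_leaf(1_000_000, levels))
```

With three levels, a million images per size works out to roughly 244 files per leaf directory; a fourth level brings that down to about 15.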

You can store the user -> image hash mapping in your database while keeping the images themselves on the filesystem. Although it is possible to store images inside a database, there are reasons you may not want to: keeping them on the filesystem makes them much easier to move, say to a CDN or into the cloud, as you grow. It also lets you put directories on different disks to increase read performance, if that's your thing.
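A minimal sketch of that user -> image hash mapping, using Python's built-in `sqlite3` purely for illustration; the `user_images` table, its columns and the helper functions are hypothetical. Since the same digest can belong to several users, it is worth being able to ask which users reference a given file before ever deleting it.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table linking users to image digests."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_images (
               user_id  INTEGER NOT NULL,
               md5      TEXT    NOT NULL,
               uploaded TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
               PRIMARY KEY (user_id, md5)
           )"""
    )
    # Index for "who references this file?" lookups.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_user_images_md5 ON user_images (md5)"
    )


def record_upload(conn: sqlite3.Connection, user_id: int, digest: str) -> None:
    """Link a user to a digest; the same user re-uploading it is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO user_images (user_id, md5) VALUES (?, ?)",
        (user_id, digest),
    )
    conn.commit()


def images_for_user(conn: sqlite3.Connection, user_id: int) -> list:
    """Digests uploaded by one user, in insertion order."""
    rows = conn.execute(
        "SELECT md5 FROM user_images WHERE user_id = ? ORDER BY rowid", (user_id,)
    )
    return [row[0] for row in rows]


def users_for_image(conn: sqlite3.Connection, digest: str) -> list:
    """Users referencing a digest; the files are only safe to delete when empty."""
    rows = conn.execute(
        "SELECT user_id FROM user_images WHERE md5 = ? ORDER BY user_id", (digest,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    record_upload(conn, 1, "2b00042f7481c7b056c4b410d28f33cf")
    record_upload(conn, 2, "2b00042f7481c7b056c4b410d28f33cf")
    print(users_for_image(conn, "2b00042f7481c7b056c4b410d28f33cf"))
```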

Because each file is named after the MD5 hash of the original image, if 30 people upload the exact same (byte-for-byte identical) image, you will keep only one copy of it (in all sizes) on your filesystem instead of 30.
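The deduplication falls out of the naming: if the target path already exists, those bytes are already stored. A minimal sketch, assuming the original upload is available as `bytes` and saved with a `.jpg` extension; `store_original` is a hypothetical helper, and it writes through a temporary file plus `os.replace` so a concurrent reader never sees a half-written image.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from pathlib import Path


def store_original(root: Path, data: bytes, ext: str = ".jpg") -> tuple[Path, bool]:
    """Save `data` under root/orig/<d>/<di>/<dig>/<digest><ext>.

    Returns (path, newly_written). newly_written is False when identical
    bytes were uploaded before, in which case nothing is written."""
    digest = hashlib.md5(data).hexdigest()
    path = Path(root, "orig", digest[:1], digest[:2], digest[:3], digest + ext)
    if path.exists():
        return path, False  # same bytes already on disk: reuse that copy
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the target directory, then rename it into place;
    # os.replace is atomic when both paths are on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave a partial temp file behind
        raise
    return path, True
```

Only when `newly_written` is `True` do the large and small copies need to be generated; otherwise they already exist alongside the original. This only catches byte-identical uploads: the same picture re-saved or re-compressed produces a different hash.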

gabe.
  • 2b00042f7481c7b056c4b410d28f33cf is not the hash of an image file. It's the hash of "asdf\n" – vy32 Jan 12 '16 at 03:50
  • @vy32 the way md5 works, just because "asdf\n" hashes to 2b00042f7481c7b056c4b410d28f33cf -- doesn't mean there isn't *also* an image that hashes to the same value. But, you are correct -- for purposes of illustration, I didn't hash any actual images. – gabe. Jan 18 '16 at 19:36