
We have hundreds of thousands of JPG images in a Windows folder structure like the one below, but it's really hard to interact and work with them in a snappy way (listing takes time, copying takes time, etc.). Here's the structure:

images/
  1/
    10001/
      10001-a.jpg
      10001-b.jpg
      ...
      10001-j.jpg (10 images in each XXXXX folder)
    10002/
    10003/
    ...
    19999/
  2/
    20001/
    20002/
    20003/
    ...
    29999/
  3/
  4/
  5/
  6/
  7/
  8/
  9/

Now, browsing these images is a little bit slow because there are approximately 10,000 folders in each X folder, and simply listing those takes time.

Is there a better way to organize the images, with fewer subfolders/items per folder? Would changing the structure to this have any effect?

images/
  1/
    0/
      0/
        0/
          0/
          1/
          2/
          3/
          4/
          5/
          6/
          7/
          8/
          9/
          10000/ (image folder, same as path)
            10000-a.jpg
            10000-b.jpg
            ...
            10000-j.jpg (10 images in each image folder)
        1/
        2/
        3/
        4/
        5/
        6/
        7/
        8/
        9/
      1/
      2/
      3/
      4/
      5/
      6/
      7/
      8/
      9/
    1/
    2/
    3/
    4/
    5/
    6/
    7/
    8/
    9/
  2/
  3/
  4/
  5/
  6/
  7/
  8/
  9/

Thus, locating image 48617-c.jpg would correspond to the path 4/8/6/1/7/48617/48617-c.jpg.

The reason for having a separate folder named after the full number, 48617, is to simplify copying a complete 10-image batch (by copying the entire folder).
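
To make the mapping concrete, here is a minimal sketch of the lookup logic (Python, purely for illustration; the images root and the five-digit IDs with a-j suffixes are taken from the examples above):

    import os

    def image_path(root, image_id, suffix):
        """Map an image ID like 48617 and a suffix like 'c' to the
        proposed layout: root/4/8/6/1/7/48617/48617-c.jpg."""
        digits = str(image_id)              # e.g. "48617"
        # one directory level per digit, then the full-ID batch folder
        return os.path.join(root, *digits, digits, f"{digits}-{suffix}.jpg")

    print(image_path("images", 48617, "c"))
    # -> images/4/8/6/1/7/48617/48617-c.jpg (with "\" separators on Windows)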

Now... no folder will have more than 11 immediate subfolders, but there will be lots of extra single-digit folders purely for separation. Would this setup speed up browsing and interaction, with multiple users adding/copying/deleting images?

    There is no "best" way. – John Gardeniers Aug 24 '12 at 00:23
  • First, make sure you turn off automatic thumbnail caching, and second, make sure you don't access it in thumbnail view and third, that is a mighty impressive pr0n collection. It might be easier to organize if you tag the files or name them with some indication of their content, though. :o – HopelessN00b Aug 24 '12 at 01:29

3 Answers


Windows is a bit special when it comes to folder layouts with kajillions of files, especially images, since Windows Explorer treats them specially. That said, there are a few guidelines to follow to keep things from getting too out of hand:

  • If you intend to browse the directory structure from Windows Explorer for any reason, keep it under 10,000 entries in a directory (files & sub-directories).
  • If you will be interacting with it solely from CLI utilities or code, the 10K limit is far more flexible.
  • Don't create TOO many sub-directories; each directory you create is another discrete operation a copy has to perform.
    • If each file requires N directories, the number of file-system objects created for that file is 1+N, which scales your copy times linearly.
    • A short, exponential tree (i.e. three tiers of directories, each with 256 sub-directories) can scale amazingly far before you run into the 10K-per-directory limit (see the sketch after this list).
  • If you're accessing it with code, go for direct opens instead of parsing directory listings prior to the open. A failed fopen() followed by a directory scan is, in many cases, faster than a directory scan followed by a guaranteed fopen().
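
To make the last two points concrete, here is a minimal sketch (Python; the hash-bucket layout and the names in it are my own assumptions, not something from the question) of a three-tier, 256-way tree and of the "open directly instead of listing first" pattern:

    import hashlib
    import os

    def bucketed_path(root, filename):
        """Derive a three-tier bucket path from the file name, giving
        256 sub-directories (00..ff) at each tier."""
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        # the first three bytes of the digest pick the three tiers
        return os.path.join(root, digest[0:2], digest[2:4], digest[4:6], filename)

    def read_image(root, filename):
        """Open the file directly; a failed open is usually cheaper
        than listing a huge directory just to check for existence."""
        try:
            with open(bucketed_path(root, filename), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None   # only now fall back to a directory scan

    print(bucketed_path("images", "48617-c.jpg"))

With three 256-way tiers you get up to 16,777,216 leaf directories, so even hundreds of thousands of files average out to a handful per directory.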

Caveats:

  • File count is fixed by your data, but directory count is up to you. The sum of those two counts determines how long copy operations take.
  • Try, if at all possible, not to browse with Windows Explorer. It doesn't deal well with big directories, and there isn't much you can do about that.
sysadmin1138
  • Appreciate your comments. We have some 50 people, used to Windows only, accessing the images. If not Windows Explorer, are there any alternatives? Norton Commander? :) – user1603240 Aug 24 '12 at 12:37

There's plenty of good information on the math in my answer to "How does directory complexity influences on i-nodes?"

With that said, different filesystems handle large numbers of files in directories in various ways. Some are OK with 10,000 entries; others buckle. As a quickly invented rule of thumb, 1,000 is probably a good target cap if you have design control. Entries in a directory are usually stored as some kind of list, and it is up to the reading application to sort them. For example, ls in the Unix world reads entries into memory in directory order and then prints them out in alphabetical order.
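
As a small illustration of that point (Python; the path is hypothetical), os.scandir yields entries in whatever order the filesystem hands them back, and sorting is left to the application, just as ls does:

    import os

    # scandir returns entries in on-disk (directory) order, not sorted
    with os.scandir("images/4") as entries:
        names = [entry.name for entry in entries]

    print(names)          # filesystem order -- effectively arbitrary
    print(sorted(names))  # what a tool like ls shows you after sorting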

Take a look at the math in the other question. Also consider what sysadmin1138 said about Explorer behaving differently. Explorer creates thumbnails of anything it recognizes as an image and then reads those thumbnails back to display them. That's a lot of disk I/O just to look at a directory that's chock-full of files.

Jeff Ferland
  • Absolutely - all overhead processing will be shut off (already is). But how do you treat the 10,000-entry cap? Is folder X with 10 subfolders, each having 100 subfolders in them, treated as X having 10,000 entries? Or do you only count the first sublevel of items? If so, then my suggested image structure 4/8/6/1/7/48617/48617-c.jpg will not work, as folder 8 will have more than 10,000 entries in it (be it deep down, but still). – user1603240 Aug 24 '12 at 12:42

Depending on whether you have the resources to develop such a system, this sounds like a good candidate for a SQL Server database using FILESTREAM storage for the files. That way, you leave the organization of the directories to SQL Server, and all you have to worry about is how you manage the data itself. You could probably use SQL Server Express, since FILESTREAM data isn't taken into account when calculating the database size.
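
As a rough sketch of what the application side might look like (Python with pyodbc; the connection string, the Images table, and its FILESTREAM-backed Data column are all assumptions here, not an existing schema):

    import pyodbc

    # hypothetical connection string -- adjust to your environment
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=ImageStore;Trusted_Connection=yes;"
    )

    def store_image(image_id, suffix, path):
        """Insert one JPG into a varbinary(max) FILESTREAM column;
        SQL Server then manages the on-disk layout for you."""
        with open(path, "rb") as f:
            data = f.read()
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO Images (ImageId, Suffix, Data) VALUES (?, ?, ?)",
            image_id, suffix, data,
        )
        conn.commit()

    store_image(48617, "c", r"images\4\8\6\1\7\48617\48617-c.jpg")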

Chris McKeown