
I'm using a directory as a staging area for my files before shipping them to Amazon S3 buckets. This staging directory has no sub-directory structure such as /a/b/c or /year/month/day; it contains only files: /cdn/file1.png, /cdn/file2.png, etc.

I have about 64,000 files in that one directory, and its size is 2.8 GB now.

My question is: will that break? I'm aware that it is not optimal, and I'm working in parallel to fix this issue, but the migration might take time.

I'm expecting to keep it this way for another year, which means a total of approximately 400,000 files inside one directory.

Thoughts? Thanks.

Haytham Elkhoja

2 Answers


It will work, but you may want to avoid running batch operations on the directory's contents; ls and similar commands will drag. I tend to use XFS filesystems for directories that hold a large number of files that aren't stored in a tree...

For instance...

# mount
/dev/sdb1 on /app type xfs (rw,noatime,logbufs=8,logbsize=256k,nobarrier)

[root@Rizzo /app/prt]# ls -1 | wc -l
191487

[root@Rizzo /app/prt]# time du -skh .
27G     .

real    0m0.834s
user    0m0.236s
sys     0m0.566s

[root@Rizzo /app/prt]# time ls -lrta | tail -8
-rw-rw-rw-  1 PAB      PAB             733 Dec 15 11:48 09228885.TGZ
-rw-rw-rw-  1 PJD      PJD            8250 Dec 15 11:48 09228881.TGZ
-rw-rw-rw-  1 PJD      PJD            9803 Dec 15 11:48 09228881.LAY.TGZ
-rw-rw-rw-  1 PJD      PJD          127973 Dec 15 11:49 09228886.LAY
-rw-rw-rw-  1 PJD      PJD           31720 Dec 15 11:49 09228886.PRT
-rw-rw-rw-  1 PJD      PJD            5368 Dec 15 11:49 09228886.POF
drwxrwxrwx  3 PEB      SJS         5066752 Dec 15 11:49 .
-rw-rw-rw-  1 PJD      PJD           31726 Dec 15 11:49 09228886.TMP

real    0m2.673s
user    0m1.055s
sys     0m1.622s
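
If the listing commands above get too slow for scripted use, one option is to iterate the directory lazily instead of sorting and stat'ing every entry the way ls -l does. The following is only a minimal sketch in Python, not something measured on the system above; the function name count_regular_files is made up for illustration, and /cdn is simply the path from the question.

```python
import os


def count_regular_files(directory):
    """Count regular files in `directory` without sorting or building a full list.

    Hypothetical helper for illustration only.
    """
    count = 0
    # os.scandir yields entries lazily, in whatever order the filesystem returns them.
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() can often use the type reported by the directory read
            # itself, avoiding a separate stat call per file.
            if entry.is_file(follow_symlinks=False):
                count += 1
    return count
```

Calling count_regular_files("/cdn") would return the number of regular files in the staging directory. How much faster this is than ls depends on the filesystem and on whether the directory is already cached, so treat it as something to try rather than a guarantee.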
ewwhite

I have learned the advantages of ZFS when dealing with a high number of files on a filesystem. To mimic ewwhite's benchmarks:

# ls -1 | wc -l
[...]
500982


# time du -skh .
303G    .

real    0m42.422s
user    0m3.889s
sys     0m25.546s

# time ls -lrta | tail -0
real    0m21.053s
user    0m5.503s
sys     0m15.496s

This is on a Solaris machine with a 6-disk RAID10 SATA array and 4 GB of RAM, so nothing fancy. The directory is exported via NFS to the Linux machines that use it. I don't know whether the FUSE ZFS implementation would show similar performance.

The main reason we don't use XFS is that we have no experience with it, but as ewwhite's numbers show, it might be quite a decent choice.

the-wabbit