
Possible Duplicate:
Linux's best filesystem to work with 10000's of files without overloading the system I/O

I have a 240GB image store of approximately 1.5 million entries. About half of these entries are image files (4 to 100KB) and the other half are deeply nested directories. Approximately half of these images are duplicates and have since been made into hard links to each other.
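
For context, I've been sanity-checking that hard-link ratio with something like the following (the mount point is just an example, not the real path):

    # files that share an inode with at least one other entry (i.e. the hard-linked duplicates)
    find /srv/images -type f -links +1 | wc -l
    # total regular files, for comparison
    find /srv/images -type f | wc -l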

I'm in the process of pulling down a backup of this filesystem and putting it on a local test server with the intention to drastically modify the directory layout and test the changes.
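
For what it's worth, the copy I have in mind is along these lines (hostname and paths are placeholders); -H is the important part so the hard links survive the transfer instead of being expanded back into duplicates:

    # pull the image store onto the test server, preserving hard links (-H),
    # ownership/permissions/times (-a), and handling sparse files (-S)
    rsync -aHS --numeric-ids prod:/srv/images/ /srv/images-test/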

Normally, I would just set up the filesystem that these images live on as ext4 (don't screw with the default unless you have to), but I am wondering if there is a better option for this particular use case.

I have already researched XFS, ext3, ext4, and btrfs, but haven't found any solid benchmarks demonstrating that I should pick one over the others for this particular task.

I am also limited to the kernel available by default on Ubuntu 10.04, but will recompile if the reason is good enough.

Michael Pearson

1 Answer


In general, performance should be the last selection criterion when choosing a filesystem. A non-exhaustive list of things to consider:

  • how stable is the filesystem
  • does it support journalling / crash consistency
  • how well does it handle lots of files in a directory
  • how long does it take to fsck (or does it even need fscking)
  • does it support xattrs
  • does it play well with NFS, etc.
  • can the filesystem be resized, and if so, can it be done online (see the sketch after this list)
  • does it support tail packing
  • can it be adjusted to play nicely with an underlying raid controller
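
To illustrate the resize point above, both ext4 and xfs can be grown while mounted once the underlying device has been enlarged (the device and mount point below are just examples):

    # grow a mounted ext4 filesystem to fill the enlarged block device
    resize2fs /dev/vg0/images
    # grow a mounted xfs filesystem (xfs can grow online but never shrink)
    xfs_growfs /mnt/images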

I use ext4 and xfs heavily in my environment, which is fairly data-heavy. I use ext4 for just about all filesystems that I know will never need to grow beyond 16TB (at present, ext4 in the mainline kernel supports filesystems > 16TB, but the utilities can't actually create them). ext3's fsck times get excessive for very large filesystems. xfs is very stable these days but has had memory leaks and silent corruption issues in the past. I used to use it in place of ext3 for its fsck times and for > 8TB filesystem creation (since fixed when the -F flag was introduced to mkfs.ext3) but was still having issues with it as recently as ~2.6.28. I will note, however, that I haven't had any confirmed issues with it on the RHEL/CentOS patched 2.6.18. I still feel that ext4 is a little more solid and has an upgrade path to btrfs, but xfs is definitely stable now. I have more faith in ext4's fsck.

As for performance, you must configure both to match the stripe size/stride of the underlying device. ext4 lets you set the I/O priority of the journal (journal_ioprio), which may need to be played with along with the journaling mode. xfs performs better than ext4 with less tuning, but they are pretty close after proper setup. You really need to set up the stripe/stride with ext4. xfs does better with lots of small files, as it can pack them completely into an inode to save space, and is noticeably better for certain operations, like deleting large directory trees.
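
As a sketch of what that tuning looks like, assuming a hypothetical RAID array with 64KiB chunks and 4 data disks (substitute your own geometry and device names):

    # ext4: stride = chunk / block = 64KiB / 4KiB = 16; stripe-width = stride * data disks = 64
    mkfs.ext4 -b 4096 -E stride=16,stripe-width=64 /dev/md0
    # journal I/O priority (0 = highest, 7 = lowest, default 3) and journaling mode at mount time
    mount -o journal_ioprio=1,data=ordered /dev/md0 /mnt/images

    # xfs: the equivalent geometry hints (stripe unit and stripe width in data disks)
    mkfs.xfs -d su=64k,sw=4 /dev/md0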

My advice, unless you need > 16TB support, is to try ext4 created with one inode for every block (-i 4096) and set it up for your RAID device. Fully populate it with your data and then see how long it takes to fsck and to do common operations. Only convert to xfs if you have a specific issue with ext4.
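
Concretely, that would look something like this (device and RAID geometry are examples again); unmount after populating it and time a full forced check:

    # one inode per 4KiB block so ~1.5 million small files can't run the filesystem out of inodes
    mkfs.ext4 -i 4096 -b 4096 -E stride=16,stripe-width=64 /dev/md0
    # after loading the real data: unmount, then measure a forced, read-only full check
    time fsck.ext4 -f -n /dev/md0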

Joshua Hoblitt