The requirements and constraints:
- 50:50 read:write ratio
- Files being written will range from much larger than the block size to vastly larger than it.
- Individual requests will range from 128KB to 4MB
- On Linux
- The file-system will be pretty large, at 14TB.
Unknowns that would help:
- Whether the random I/O happens within files, or consists purely of whole files being read and written in 128KB-4MB chunks
- The frequency of file updates.
- Concurrency: how often read and write operations happen in parallel.
Sequential I/O
If the 50:50 ratio is represented by reading and writing whole files, and pretty big files at that, then your access pattern looks more sequential than random as far as the filesystem is concerned. An extent-based filesystem will keep those large files contiguous and give the best performance here. Since the files are so large, read-ahead will provide significant performance boosts if supported by the hardware (some RAID controllers provide this).
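On Linux, read-ahead can also be raised in software at the block-device layer; a hypothetical sketch (the device name is a placeholder, not from the question):

```shell
# Raise block-device read-ahead on Linux (units are 512-byte sectors).
# /dev/md0 is a placeholder device name.
blockdev --getra /dev/md0        # print the current read-ahead setting
blockdev --setra 8192 /dev/md0   # 8192 sectors = 4 MB of read-ahead
```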
Random I/O
This changes if you're planning on doing the read/write activities simultaneously, at which point it does become significantly random. The same applies if you're holding a large number of files open and reading/writing small portions within those files as if it were a database.
One of the biggest misconceptions I run into is the idea that a defragged filesystem performs better than a fragmented one when handling highly random I/O. That is only true for filesystems where metadata operations suffer greatly on a fragmented filesystem. At very high levels of fragmentation, extent-based filesystems can actually suffer more performance degradation than other styles of block management.
That said, this problem only becomes apparent when the I/O access patterns and rate are pushing the disks to their maximum capabilities. A 14TB filesystem implies somewhere between 7 and 50 spindles in the actual storage array, which yields a vast range of capabilities: from roughly 630 IOPS for 7x 2TB 7.2K RPM drives to 9,000 IOPS for 50x 300GB 15K RPM drives. The 7.2K RPM array will hit I/O saturation a lot sooner than the 15K RPM array would.
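The arithmetic behind those figures is simply spindle count times a per-drive rule of thumb (roughly 90 IOPS for a 7.2K RPM drive, 180 for a 15K RPM drive):

```shell
# Back-of-envelope IOPS estimate: spindles x rule-of-thumb IOPS per drive.
iops_7k=$((7 * 90))     # 7x 2TB 7.2K RPM drives
iops_15k=$((50 * 180))  # 50x 300GB 15K RPM drives
echo "$iops_7k $iops_15k"   # prints: 630 9000
```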
If your I/O operations rate is not pushing your storage limits, the choice of file-system should be based more on overall management flexibility than tweaking the last few percentage points of performance.
However, if your I/O actually IS running your storage flat out, that's when the tweaking starts becoming needed.
XFS:
- Mount: Set 'allocsize' high, e.g. allocsize=64m (valid values run from 64k up to 1g). This sets the preferred end-of-file preallocation size for buffered writes, which cuts fragmentation on large, growing files.
- Mount: Set 'sunit' to the RAID stripe unit and 'swidth' to the full stripe width; as mount options, both are given in 512-byte sectors (swidth = sunit x number of data-bearing drives).
- Format: The same geometry can be set at format time with -d su=<stripe unit> and -d sw=<number of data-bearing drives> (N-1 for RAID5, N-2 for RAID6).
- Format: If you really need that last percentage point, put the filesystem log on a completely separate storage device:
-l logdev=/dev/sdc3
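Putting the XFS options together, a sketch for a hypothetical 12-disk RAID6 (10 data disks) with a 64KB stripe unit; every device name and geometry figure here is a placeholder, not something taken from the question:

```shell
# Format: su = stripe unit, sw = number of data disks; external log on /dev/sdc3.
mkfs.xfs -d su=64k,sw=10 -l logdev=/dev/sdc3 /dev/md0
# Mount: sunit/swidth are in 512-byte sectors.
# 64 KB = 128 sectors; 128 sectors x 10 data disks = 1280.
mount -o logdev=/dev/sdc3,sunit=128,swidth=1280,allocsize=64m /dev/md0 /srv/data
```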
EXT4:
- Format: Set '-E stride=' to the number of filesystem blocks (usually 4KB) that make up one RAID stripe unit. For example, a 64KB stripe unit with 4KB blocks gives stride=16.
- Format: Set '-E stripe-width=' to stride times the number of data-bearing drives (the analogue of 'swidth' in XFS).
- Format: As with XFS, the last percentage point of performance can be squeezed out by placing the journal on a completely separate storage device. The journal device is formatted first, then pointed at when creating the filesystem:
mke2fs -O journal_dev /dev/sdc3
mkfs.ext4 -J device=/dev/sdc3
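An equivalent EXT4 sketch for the same hypothetical 12-disk RAID6 (10 data disks, 64KB stripe unit, 4KB filesystem blocks); device names and geometry are placeholders:

```shell
# stride = stripe unit / block size = 64K / 4K = 16 blocks
# stripe-width = stride x data disks = 16 x 10 = 160 blocks
mke2fs -O journal_dev /dev/sdc3   # format the external journal device first
mkfs.ext4 -E stride=16,stripe-width=160 -J device=/dev/sdc3 /dev/md0
```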