We're preparing a cache storage server for small files that will be cached for a few days (so the HDDs will see more reads than writes). The files are rather small, around 100 to 500 KB each, but we have a lot of them, so we can fill 12 TB completely. The server has a 1 Gbit connection and I hoped we could use it fully: as we have 4 HDDs, the read speed per disk should be 250 Mb/s (31.25 MB/s).

The server runs on Ubuntu Server 14.04LTS

I want to know what people suggest:

  • What file system should we use?
  • Should we combine the HDDs into one big directory?
  • Should the files all be placed in the same directory? (we're talking about 25,000,000 files or so)

1 Answer


the read speed should be 250Mb/s (31.25MB/s)

First off, it is very unlikely that you will achieve this level of performance with four 7200 rpm HDDs under a random read access pattern. Even if your disks manage to read larger blocks (~16-64 KB) per operation, a 7.2k disk tops out at ~100 I/O operations per second for non-sequential access. In my experience, you might be looking at ~10-20 MB/s in the end, if you've done everything right.
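To make the arithmetic behind that estimate explicit, here is a rough back-of-the-envelope sketch. It assumes the figures above (4 disks, ~100 random operations per second each, 16-64 KB transferred per operation) and ignores caching, queueing and any sequential read-ahead, so treat it as an order-of-magnitude check rather than a prediction. The function name is just for illustration.

```python
# Rough throughput estimate for random reads. All figures are assumptions
# taken from the estimate above, not measurements.
DISKS = 4
IOPS_PER_DISK = 100  # ~100 non-sequential operations/s for a 7.2k disk


def random_read_mb_per_s(block_kb: int, disks: int = DISKS, iops: int = IOPS_PER_DISK) -> float:
    """Aggregate MB/s if every I/O operation transfers block_kb kilobytes."""
    return disks * iops * block_kb / 1000.0


for block_kb in (16, 64):
    print(f"{block_kb} KB per operation: {random_read_mb_per_s(block_kb):.1f} MB/s")
```

Under these assumptions the array lands somewhere between about 6 and 26 MB/s, well short of the roughly 125 MB/s a 1 Gbit link can carry.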

Should we combine the HDDs into one big directory?

You will clearly benefit from striping (RAID0), as offered by md, dmraid or hardware RAID controllers. Note that in this mode of operation you lose all data as soon as a single disk fails.

If you have read and write requests in parallel (even if the percentage of writes is rather low compared to reads), you will benefit from a RAID controller's write-back cache. Consider buying a controller with a battery backup unit (BBU) for better operational consistency: controllers without a BBU lose the content of their cache in case of a power failure and might tear your file system apart.

Should the files all be placed in the same directory? (we're talking about 25,000,000 files or so)

Certainly not. Many file systems perform poorly with a large number of files (>50,000) in a single directory. Avoid this condition for portability reasons. If you absolutely must, take a look at filesystems which are known to perform well under these conditions.
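One common way to avoid a single huge directory is to spread files over nested subdirectories derived from a hash of the file name. The sketch below is only an illustration of that idea: the `/srv/cache` root, the choice of SHA-1 and the two-level layout are my assumptions, not something your setup requires.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, levels: int = 2) -> PurePosixPath:
    """Map filename to root/xx/yy/filename using leading hex digits of its SHA-1 hash."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Each level uses two hex digits, i.e. 256 possible subdirectories.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *parts, filename)


leaf_dirs = 256 ** 2  # two levels of 256 subdirectories each
print(sharded_path("/srv/cache", "example.jpg"))
print(f"~{25_000_000 // leaf_dirs} files per directory across {leaf_dirs} directories")
```

With two levels, 25,000,000 files end up at roughly 380 per directory, far below the point where most file systems start to struggle.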

What Filesystem should we use?

It depends. Test your load with the current crop of file systems and see whether you hit any inefficiencies. You will also likely find yourself looking for tunables and tweaks that reduce the number of disk seeks per file access for each of them (such as mounting with noatime).
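If you want a quick way to compare candidates, something like the following could serve as a starting point. It is a minimal sketch with a hypothetical function name: it reads a random sample of files below a directory and reports the observed throughput. Be aware that files already held in the page cache will make the numbers look far better than the disks really are, so use a data set much larger than RAM or drop the caches between runs.

```python
import os
import random
import time


def random_read_throughput(root: str, sample_size: int = 1000, seed: int = 0) -> float:
    """Read a random sample of files under root and return the observed MB/s."""
    # Collecting every path is fine for a test data set, not for 25 million files.
    paths = [os.path.join(d, name) for d, _, names in os.walk(root) for name in names]
    if not paths:
        raise ValueError(f"no files found under {root}")
    sample = random.Random(seed).sample(paths, min(sample_size, len(paths)))
    total_bytes = 0
    start = time.perf_counter()
    for path in sample:
        with open(path, "rb") as handle:
            total_bytes += len(handle.read())
    elapsed = time.perf_counter() - start
    return total_bytes / 1_000_000 / max(elapsed, 1e-9)


# Example (the mount point is an assumption):
# print(f"{random_read_throughput('/mnt/test-xfs'):.1f} MB/s")
```

Run it against the same files on each file system and mount option combination you are considering, and compare the results rather than trusting any single number.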

In the end, you might want to double the number of disks to increase performance and introduce redundancy in a RAID10 setup.

  • THX, I'll get back on this soon; there is so much to read about this. I think so far we'll go for XFS and tune our software to make the load on the HDDs more sequential (for example, place files we expect to be requested together next to each other on the disk, and read the next 5 or 10 files into memory when the first file is requested). Only not sure yet how to move files next to each other on a disk :) – klaasio Jul 09 '14 at 15:54