12

I'm putting together a Linux box that will act as a continuous integration build server; we'll mostly build Java stuff, but I think this question applies to any compiled language.

What filesystem and configuration settings should I use? (For example, I know I won't need atime for this!) The build server will spend a lot of time reading and writing small files, and scanning directories to see which files have been modified.

UPDATE: Data integrity is a low priority in this case; it's just a build machine ... the final artifacts will be zipped up and archived elsewhere. If the filesystem on the build machine gets corrupted and loses all data, we can just wipe and re-image; builds will continue running as before.

Dan Fabulich
  • Possible dupe? http://serverfault.com/questions/29193/what-is-the-best-linux-filesystem-for-mysql-innodb – gravyface Feb 04 '11 at 00:09
  • Do read the link gravyface gave, but also be sure to set aside a dedicated partition for your builds; you can then test the answers you get here. If you have the money, see if you can forgo disks entirely by using a ramdisk or tmpfs (http://www.cyberciti.biz/faq/howto-create-linux-ram-disk-filesystem/). – becomingwisest Feb 04 '11 at 00:50

6 Answers

7

Fastest filesystem? tmpfs mounted out of available RAM, with noatime set.

This is only viable if you have a procedure for checking out everything needed to build your source tree (since the contents of a tmpfs filesystem go away when you reboot), and if sources and objects fit into a reasonable corner of your available RAM (with enough left over to run your compiler and linker without swapping). That said, you can't beat working out of RAM for speed.
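
The "does it fit in RAM" check above can be sketched in shell. This is only an illustration: the function names, the mount point, and the idea of passing an explicit reserve for the compiler and linker are assumptions, not part of the answer. The mount command is printed rather than executed, since mounting needs root.

```shell
# Sketch: decide whether a checkout fits in RAM before mounting tmpfs.
# All names below are hypothetical helpers for illustration.

# Print the size of a directory tree in KiB.
tree_kib() {
    du -sk "$1" | awk '{print $1}'
}

# Print the kernel's estimate of available memory in KiB (Linux only).
avail_kib() {
    awk '/^MemAvailable:/ {print $2}' /proc/meminfo
}

# Succeed if the tree plus a reserve fits into the available memory.
# Usage: fits_in_ram TREE_KIB AVAIL_KIB RESERVE_KIB
fits_in_ram() {
    [ $(( $1 + $3 )) -le "$2" ]
}

# Print (do not run) the mount command for a tmpfs of SIZE_KIB at MOUNTPOINT.
tmpfs_mount_cmd() {
    printf 'mount -t tmpfs -o size=%sk,noatime tmpfs %s\n' "$1" "$2"
}
```

For example, `fits_in_ram "$(tree_kib ~/src/project)" "$(avail_kib)" 2097152 && tmpfs_mount_cmd 8388608 /mnt/build` prints the mount command only if the tree plus a 2 GiB reserve fits; the path and sizes here are placeholders.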

voretaq7
7

Use ext4fs as the base filesystem with a few speedup options like

noatime,data=writeback,nobh,barrier=0,commit=300

Then union mount a tmpfs ramdisk on top of that so that files written during the builds get the benefits of the ramdisk. Either change the build procedure to move the resulting binaries off the tmpfs at the end of the build, or merge the tmpfs back into the ext4fs before unmounting.
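
A rough sketch of that layering follows. The answer does not say which union mechanism to use; overlayfs is assumed here, and the device name and directories are placeholders. `nobh` is left out on the assumption that current kernels ignore it. Commands are only printed unless `DRY_RUN=0` is set, because the real thing needs root and a dedicated partition.

```shell
# Sketch: ext4 base with relaxed safety options, tmpfs layered on top.
# overlayfs, the paths, and the helper names are assumptions for illustration.

# By default commands are printed; set DRY_RUN=0 to execute them (as root).
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: setup_build_overlay DEVICE BASE_DIR RAM_DIR MERGED_DIR
setup_build_overlay() {
    dev=$1; base=$2; ram=$3; merged=$4
    # Base filesystem with the speed-over-safety options from the answer.
    run mount -t ext4 -o noatime,data=writeback,barrier=0,commit=300 "$dev" "$base"
    # RAM-backed layer that receives everything written during the build.
    run mount -t tmpfs -o noatime tmpfs "$ram"
    # overlayfs needs upperdir and workdir on the same filesystem.
    run mkdir -p "$ram/upper" "$ram/work"
    run mount -t overlay overlay -o "lowerdir=$base,upperdir=$ram/upper,workdir=$ram/work" "$merged"
}
```

Builds would then run inside the merged directory, and the finished artifacts would be copied somewhere persistent before the tmpfs layer is unmounted.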

Michael Dillon
  • It is faster, but `barrier=0` deserves a warning. From the Arch wiki: *"Disabling barriers when disks cannot guarantee caches are properly written in case of power failure can lead to severe file system corruption and data loss."* – ideasman42 Sep 14 '17 at 03:54
4

To add to Michael Dillon's answer: you can create the ext4 filesystem with a few extra options:

mkfs.ext4 -O dir_index,extent -i 8096 /dev/<disk>


dir_index
    Use hashed b-trees to speed up lookups in large directories.

extent
    Instead of using the indirect block scheme for storing the location of data blocks in an inode, use extents instead. This is a much more efficient encoding which speeds up filesystem access, especially for large files.

-i 8096 sets the bytes-per-inode ratio, so the filesystem gets roughly one inode for every 8096 bytes of space. A smaller ratio means more inodes for the same partition size, which is useful because build environments create a lot of files.
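
To get a feel for what the ratio means, here is a small sketch that estimates the inode count for a given partition size. The helper names are made up for illustration, and the result is only approximate because mkfs rounds to block-group boundaries.

```shell
# Sketch: rough inode count for a given partition size and -i ratio.
# Helper names are hypothetical; real mkfs output will differ slightly.

# Usage: inode_estimate SIZE_BYTES BYTES_PER_INODE
inode_estimate() {
    echo $(( $1 / $2 ))
}

# Show current inode usage for the filesystem holding PATH (read-only).
inode_usage() {
    df -i "$1"
}
```

For a 100 GiB partition (107374182400 bytes), `inode_estimate` gives about 13.3 million inodes at `-i 8096`, compared with about 6.5 million at a ratio of 16384. Running `inode_usage` on an existing build partition shows how close a real workload gets to the limit.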

insider
0

For sources it'd be preferable to have on-the-fly compression support, which means Reiser4 or Btrfs. Both are "not for production" yet, although I have heard of people using both filesystems heavily and happily. :-)

The next choice (the one I usually make) is Reiser3, not Ext3. Ext3 can be a bit faster nowadays, but Reiser3 has no inode limit fixed at format time and supports changing the "data=" option online. It also has "tail" support, which packs tiny files more tightly, but if you're concerned about speed, mount it with "notail".

Both XFS and JFS would be a pain for the "lots of small files" case, especially if you need to rm them.

(I forgot to mention Ext4: yes, it's even faster than Ext3. But all of the Ext3 limitations mentioned above apply to Ext4 too.)

poige
0

The operations you describe give some key hints as to what the ideal file-system needs to be able to do:

  • Massively random r/w accesses during the build process.
  • Many, many files getting updated in short order, so fast meta-data operations are critical.
  • Efficient handling of many small files on possibly very file-heavy file-systems.
  • Mature enough not to risk data-loss in infrequent and obscure edge-cases.

Btrfs and Ext4 meet three of the above, and the fourth is questionable. Ext4 is probably mature enough for that, but Btrfs isn't done baking yet. noatime helps make the meta-data operations more efficient, but when you're creating a bunch of new files, you still need meta-data ops to be screamingly fast.

That's when the underlying storage starts becoming a factor. XFS meta-data operations tend to concentrate in a few blocks, which can strain operations. The Ext-style filesystems are better about keeping the meta-data close to the data it describes. However, if your storage is sufficiently abstract (you're running in a VPS, or attached to a SAN), it doesn't matter significantly.

Each filesystem has little speedups that can eke out a few more percentage points. How much gain you'll see depends greatly on how performant the underlying storage is.

In storage parlance, if you have enough I/O operation headroom in your storage, filesystem inefficiencies start to not matter so much. If you use an SSD for your build partition, filesystem choice is less important than what you're more comfortable working with.
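
Since the gains depend so much on the storage underneath, it may be worth measuring on your own hardware. The following is a crude sketch of a small-file workload (create, scan, delete); the function name and file contents are made up, and it is no substitute for timing a real build.

```shell
# Sketch: crude small-file workload for comparing filesystems or mount options.
# Creates COUNT tiny files under DIR, scans them, then deletes them.
# Usage: smallfile_bench DIR COUNT
smallfile_bench() {
    dir=$1; count=$2
    work="$dir/bench.$$"
    mkdir -p "$work" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'class C%s {}\n' "$i" > "$work/C$i.java"
        i=$((i + 1))
    done
    # Directory scan, similar to a build tool looking for sources.
    found=$(find "$work" -type f | wc -l)
    rm -rf "$work"
    echo "created=$count scanned=$((found))"
}

# In bash, wrap a call in `time` to compare candidates, for example:
#   time smallfile_bench /mnt/candidate 20000
```

Run it once per candidate filesystem on the same partition, and drop the page cache between runs if you want to see the disk rather than RAM.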

sysadmin1138
  • I actually DON'T care about data loss that much. (Updated the question to clarify.) I mean, data loss isn't a good thing, but I'm not storing critical data; I'm processing lots of files and moving the data elsewhere. If I could afford the RAM, I'd just use tmpfs as voretaq7 recommended above. – Dan Fabulich Feb 05 '11 at 00:36
0

For lots of small files, I'd recommend Reiser over ext3, xfs, jfs..., although I've heard that ext4 is a lot better than its previous incarnations for this pattern of access (i.e. the opposite of what poige says).

Reiser pushes a lot of the file structure up the inode tree, so it works really well when dealing with small files.

However, the differences in behaviour between the leading filesystems are relatively small compared to the benefits you'll get by having enough physical memory to cache/buffer effectively.

"...and scanning directories to see which files have been modified."

This is a crappy way to solve the problem, even though it's relatively simple. If it is that important, think about writing an inotify handler to index the modifications.
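
The two approaches can be contrasted in a short sketch. The function names and the change-log file are hypothetical. The first function is the plain scan, using a stamp file and `find -newer`; the second assumes the `inotifywait` tool from the inotify-tools package is installed and simply appends changed paths to a log as they happen.

```shell
# Sketch: polling scan versus inotify-based change tracking.
# Function names and file locations are assumptions for illustration.

# Polling: list files under DIR modified since STAMP, then advance STAMP.
# The new stamp is created before the scan so no change is missed
# (a file changed during the scan may be reported twice).
# Usage: changed_since DIR STAMP
changed_since() {
    dir=$1; stamp=$2
    [ -e "$stamp" ] || touch -t 197001020000 "$stamp"
    new=$(mktemp) || return 1
    find "$dir" -type f -newer "$stamp"
    mv "$new" "$stamp"
}

# Event-driven: append each changed path under DIR to LOG until interrupted.
# Usage: watch_changes DIR LOG
watch_changes() {
    command -v inotifywait >/dev/null 2>&1 || {
        echo "inotifywait not installed" >&2
        return 1
    }
    inotifywait -m -r -q -e close_write -e create -e delete -e moved_to \
        --format '%w%f' "$1" >> "$2"
}
```

With the watcher running in the background, the build only has to read the log instead of walking the whole tree; keep the stamp and log files outside the watched directory.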

OTOH, if you're using a flash SSD (which will give you very low seek times), I'd recommend a filesystem that distributes writes more effectively for longevity reasons, e.g. JFFS2.

symcbean