Contiguous physical allocation of a set of files in linux filesystem (ext4)

I have a set of files that I want to be allocated contiguously on disk. I will be reading all of them sequentially, one after the other, in a single pass, and I want to optimize that read. I can't combine them into a single file, and I am looking for a solution that keeps them as individual files.

I am using an ext4 filesystem, and I was wondering whether an existing tool might do this for me, since I learnt that ext4 supports online block exchange and defragmentation. I tried running e4defrag on my directory of files; while it ensured each individual file was defragmented, each file ended up in its own extent, not necessarily adjacent to the other files. (I used filefrag -v file_name to check whether they were allocated next to each other.)
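For example, to check the whole set at once I run something like the following (the directory path here is only illustrative):

for f in /srv/fileset/*; do filefrag -v "$f"; done   # print each file's physical extent ranges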

EDIT: Just to clarify the access pattern: these files will be written exactly once and never modified again. They will be read frequently, and in such a way that if any one of them is read, all the other files in the set will likely be needed as well. What I intend to do is prefetch the whole set into the filesystem buffer/cache in one go, so that subsequent random reads of any of these files will be really fast. (The total size, roughly 100-200 MB, is small enough to fit in cache.) What I am trying to improve right now is the read performance while loading these files into the cache. At the moment that prefetch is slow because the files live in disjoint segments, so it incurs multiple disk seeks.
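The prefetch itself is nothing fancy; a rough sketch of what I do today (again, the path is only illustrative) is:

cat /srv/fileset/* > /dev/null   # read every file once to warm the page cache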

phininity

Posted 2013-11-11T05:56:20.913

Reputation: 161

I'm not sure what @phininity's purposes are, but I came here because I'm hoping to put all my torrent files physically side-by-side, sorted to match the logical layout of the torrent chunks; I want to avoid seeking overhead+wear when seeding torrents. Being able to read at least a chunk straight through would be great. – JamesTheAwesomeDude – 2016-08-09T19:12:23.290

(Old topic, I know.) Can you be more specific with some metrics, like the number of files, measured time to read them with current setup, expected time after optimization? If the filesystem is not near-full, there should not be a big difference. Also, what context requires such an optimization scheme? Those hints are important to provide actual help and make sure we're not in a case of [XY problem](http://xyproblem.info/). – Stéphane Gourichon – 2017-07-18T16:11:45.107

IIRC, I was trying to optimize boot-up time for a custom Linux setup I had. Basically the goal was to prefetch all required system files into RAM to minimize time wasted on I/O. I tested my hypothesis by measuring boot-time breakdowns with the files first placed on a ramdisk, which showed significant improvements. However, getting them onto the ramdisk was still slow. I've unfortunately lost the metrics for time/number of files, as I've since moved on to different hardware and an SSD, which made this goal obsolete for me.

I'm still curious though, if it's easy to re-organize filesystem blocks. – phininity – 2017-07-19T18:15:44.537

Would you elaborate your scenario a bit more? E.g., are these files written once and then read once? ... Read/write a lot, and often? Do you wish to enhance read performance or read/write performance? How does "random" file placement prevent achieving the performance requirements? – rickhg12hs – 2013-11-11T14:24:57.227

@rickhg12hs The files are static and won't change, so write performance need not be considered. I'm only concerned about improving read performance, and in fact I will always try to prefetch the entire set of files to transparently improve the performance of several other processes that will immediately use many of them. – phininity – 2013-11-12T05:49:06.000

Easiest way I can think of off the top of my head is to just create a separate partition that is just big enough for the files. As an added bonus, if you create the partition at the beginning of the disk, it'll be even faster. – Lawrence – 2013-11-12T07:51:07.740
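A minimal sketch of that separate-partition idea (the device name and sizes are assumptions):

parted /dev/sdX mkpart primary ext4 1MiB 301MiB   # small partition at the start of the disk
mkfs.ext4 /dev/sdX1                               # format it
mount /dev/sdX1 /mnt/fileset                      # copy the file set here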

Just wondering ... is this premature optimization? Could writing/duping these files to a ramdisk help? Or your prefetching will work great if it's completed before the data is needed. – rickhg12hs – 2013-11-12T09:56:19.720

Answers

Not exactly a way to re-organize filesystem blocks, but…

You want the files in RAM, and you said in the comments that you had already experimented with a ramdisk. We can improve on this approach.

My idea is to read the files not directly from a filesystem like ext4 but from a .tar file. You would create this file once, place it on the ext4 filesystem and defragment it with e4defrag. Then at every boot:

cd /mnt/target_tmpfs/ && tar -xf /mnt/ext4/defragmented_archive.tar

I don't think tar will seek back and forth within the given file, but if you suspect it does, you can always use cat … | tar -x. Piped this way, tar can only read its input sequentially and at most once.
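For completeness, the one-time step of building and defragmenting the archive might look like this sketch (the source directory is only an example):

tar -cf /mnt/ext4/defragmented_archive.tar -C /srv/fileset .
e4defrag /mnt/ext4/defragmented_archive.tar
filefrag -v /mnt/ext4/defragmented_archive.tar   # verify it ended up in one or very few extents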

I'm aware you cannot easily load an entire OS this way, unless perhaps you prepare an initramfs to do it. I don't know much about that, but I've found this: Custom Initramfs. From there:

Here are some examples of what you can do with initramfs:

  • Mount the root partition (for encrypted, logical, and otherwise special partitions);
  • […]

See the example scripts there. Mounting ext4, then mounting a tmpfs and populating it from the .tar, then using it as / – it all seems possible in general.
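A very rough sketch of the relevant steps inside such an init script (device names, sizes and mount points are assumptions) could be:

mount -t ext4 /dev/sda1 /mnt/ext4                                 # filesystem holding the archive
mount -t tmpfs -o size=512m tmpfs /mnt/target_tmpfs               # RAM-backed target
tar -xf /mnt/ext4/defragmented_archive.tar -C /mnt/target_tmpfs   # one sequential read
exec switch_root /mnt/target_tmpfs /sbin/init                     # use the populated tmpfs as the new /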

Obviously you would want your custom-initramfs.cpio.gz to be defragmented as well, just like the .tar file.

Kamil Maciorowski

Posted 2013-11-11T05:56:20.913

Reputation: 38 429