I have a set of files that I want allocated contiguously in the filesystem. I will be reading them all sequentially, one after the other, in a single pass, and I want to optimize those reads. I can't combine them into a single file; I am looking for a solution that keeps them as individual files.
I am using an ext4 filesystem and was wondering whether an existing tool can do this for me, since I learned that ext4 supports online block exchange and defragmentation. I tried running e4defrag
on my directory of files; while it ensured each individual file was defragmented, each file still ended up in its own extent, not necessarily adjacent to the other files. (I used filefrag -v file_name
to verify whether they were being allocated next to each other.)
EDIT: Just to clarify the file access pattern: these files will be written exactly once and never modified again. They will be read frequently, but in such a manner that if any one of them is read, all the other files in the set will likely need to be read too. What I intend to do is prefetch all these files together into the filesystem buffer/cache in one go, so that subsequent random reads of any of these files will be really fast. (The total size, 100–200 MB, is small enough to fit in cache.) What I am trying to improve right now is the read performance when loading these files into cache. Currently, prefetching them into cache takes a hit from multiple disk seeks, because the files are located in disjoint segments.
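The prefetch step described above amounts to one sequential pass over the whole set. A minimal sketch (the file names and sizes below are throwaway examples created so the snippet is self-contained, not paths from the question):

```shell
# Self-contained demo: write a small file set once, then warm the page
# cache with one sequential pass so later random reads are served from RAM.
set_dir=$(mktemp -d)
for i in 1 2 3; do
    head -c 1048576 /dev/zero > "$set_dir/part$i"   # 1 MiB per file
done
# A single `cat` through the whole set is one sequential read; when the
# files are contiguous on disk this avoids one seek per file.
cat "$set_dir"/part* > /dev/null && prefetch_ok=yes
rm -r "$set_dir"
```

The benefit of contiguous allocation shows up exactly here: the `cat` pass becomes one long streaming read instead of a series of seeks.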
I'm not sure what @phininity's purposes are, but I came here because I'm hoping to put all my torrent files physically side-by-side, sorted to match the logical layout of the torrent chunks; I want to avoid seeking overhead+wear when seeding torrents. Being able to read at least a chunk straight through would be great. – JamesTheAwesomeDude – 2016-08-09T19:12:23.290
(Old topic, I know.) Can you be more specific with some metrics, like the number of files, measured time to read them with the current setup, and expected time after optimization? If the filesystem is not near-full, there should not be a big difference. Also, what context requires such an optimization scheme? Those hints are important to provide actual help and make sure we're not in a case of an [XY problem](http://xyproblem.info/). – Stéphane Gourichon – 2017-07-18T16:11:45.107
IIRC, I was trying to optimize boot-up time for a custom Linux setup I had. Basically the goal was to prefetch all required system files into RAM to minimize time wasted in IO. I tested my hypothesis by measuring boot-time breakdowns with the files first placed on a ramdisk, which showed significant improvements. However, getting them onto the ramdisk was still slow. I've unfortunately lost the metrics for time/number of files, as I've since moved on to different hardware and an SSD, which made this goal obsolete for me.
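The ramdisk staging mentioned above can be sketched without any special setup, assuming /dev/shm is mounted as tmpfs (true on most Linux systems); the paths below are throwaway examples:

```shell
# Stage a copy of the file set on tmpfs so reads never touch the disk.
src=$(mktemp -d)
echo demo > "$src/a"               # stand-in for the real file set
mkdir -p /dev/shm/fileset
cp "$src"/* /dev/shm/fileset/      # reads of these copies come from RAM
rm -r "$src"
```

The copy itself still pays the disk-seek cost once, which is exactly the slow step described in the comment; contiguous on-disk layout would speed up that initial copy.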
I'm still curious though, if it's easy to re-organize filesystem blocks. – phininity – 2017-07-19T18:15:44.537
Would you elaborate on your scenario a bit more? E.g., are these files written once and then read once? ... Read/written a lot, and often? Do you wish to enhance read performance or read/write performance? How does "random" file placement prevent achieving the performance requirements? – rickhg12hs – 2013-11-11T14:24:57.227
@rickhg12hs The files are static and won't change, so write performance need not be considered. I'm only concerned about improving read performance, and in fact I will always attempt to prefetch the entire set of files to transparently improve the performance of several other processes that will immediately use several of them. – phininity – 2013-11-12T05:49:06.000
Easiest way I can think of off the top of my head is to just create a separate partition that is just big enough for the files. As an added bonus, if you create the partition at the beginning of the disk, it'll be even faster. – Lawrence – 2013-11-12T07:51:07.740
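The dedicated-partition idea can be prototyped without repartitioning by backing a filesystem with a loop image sized to the set (64 MiB here; the image name and mount point are examples, not from the thread). On real hardware, a small partition at the start of the disk plays the same role:

```shell
# Create a file image and lay an ext4 filesystem down inside it.
dd if=/dev/zero of=set.img bs=1M count=64 status=none
mkfs.ext4 -q -F set.img               # needs e2fsprogs; no root required
# sudo mount -o loop set.img /mnt/set # attaching the image does need root
```

Because the image (or partition) holds only the file set, the allocator has little choice but to pack the files close together, which approximates the contiguity the question asks for.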
Just wondering ... is this premature optimization? Could writing/duping these files to a ramdisk help? Or your prefetching will work great if it's completed before the data is needed. – rickhg12hs – 2013-11-12T09:56:19.720