Can I increase app performance by purposefully fragmenting its data on the hard drive?

1

Some popular apps and games consist of a great many files that are rarely used in normal operation, so the hard drive's read head has to travel long distances to reach the next bits it needs.

My understanding of file fragmentation is that pieces of files end up scattered across the hard drive, and seek/read times increase because of that. Defragmentation, i.e. consolidating those pieces, is a common way to bring the hard drive's performance back to its baseline.

But could the parts that are rarely (or never) used be moved out of the way, and the most-used portions placed next to each other, in the order the app usually reads them, to improve its performance?

I understand this problem is largely addressed by drives with much shorter access times to arbitrary locations, such as Solid State Drives, but aren't they still affected by it to some degree?

Would it make practical sense to rearrange application data like this to reduce disk time even further, compared to a perfectly defragmented state?

user1306322

Posted 2015-10-23T21:02:44.203

Reputation: 4 237

No; purposefully fragmenting the data would make it slower, not faster. Why don't you just remove the fragmentation on the files that are actually used? Most software designed to run a defragmentation routine allows you to select which files it will be run on. – Ramhound – 2015-10-23T21:08:46.113

This is an incredibly rare case where I disagree with @Ramhound. As I write my answer, I face the possibility that I may be about to look stupid. – ChrisInEdmonton – 2015-10-23T21:17:02.167

Wouldn't disk cache solve the problem quite easily? – some user – 2015-10-23T21:24:29.273

@Ramhound I strongly suspect that defragmentation is the go-to way to solve this only because nobody has bothered to measure precisely which data is used most often and optimize its layout the way I describe. And that's understandable, because it would vary between systems and of course usage, in addition to being a pretty tricky algorithm. Still, I'd like to know if there would be any additional overheads I might not have thought of. – user1306322 – 2015-10-23T21:24:51.307

@user1306322 - If you remove the fragmentation from the files that are not used or updated often, they will remain in that state; only files that are updated or changed become fragmented again. – Ramhound – 2015-10-23T21:39:02.533

Turns out, my 'disagreement' with @Ramhound is purely pedantic. Practically speaking, we are both taking the same position I think. :) – ChrisInEdmonton – 2015-10-23T21:43:21.317

Answers

4

Okay, let's take the easy question first. SSDs are particularly good at random reads. So long as you are reading a full block, it doesn't matter whether that block sits immediately next to the previous one or 'halfway across the disk'. It makes no practical difference. In fact, you can't even tell: your operating system may think it is storing data in sequential locations on the SSD, but the SSD itself may well map them to opposite ends of the 'storage'. So, on SSDs, you practically don't need to worry about defragmentation of files. Except... I'll get back to this.

Okay, so let's return to regular mechanical disks. These are much faster at sequential reads than they are at random reads. Much, much faster.

Now, if I'm playing Half Life 3 and it has to load data files, my computer is going to have a much easier time if those data files are defragmented and stored in close proximity. Roughly (very roughly), defragmenting converts random reads to sequential reads.
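
To put a rough number on that, here is a sketch of the kind of artificial benchmark mentioned in the comments below, in Python. The file name, block size, and read count are made-up values, and on a machine with plenty of RAM the results will be skewed by the file system cache, so you would want a test file much larger than RAM (or a cache flush between runs) to measure the disk itself:

    import os
    import random
    import time

    PATH = "testfile.bin"     # hypothetical large test file (several GB)
    BLOCK = 64 * 1024         # read in 64 KB chunks
    N_READS = 2000

    size = os.path.getsize(PATH)
    sequential = [i * BLOCK for i in range(N_READS)]
    scattered = [random.randrange(0, size - BLOCK) for _ in range(N_READS)]

    def timed_reads(offsets):
        start = time.perf_counter()
        with open(PATH, "rb", buffering=0) as f:  # unbuffered at the Python level
            for off in offsets:
                f.seek(off)
                f.read(BLOCK)
        return time.perf_counter() - start

    print("sequential reads:", timed_reads(sequential), "s")
    print("random reads:    ", timed_reads(scattered), "s")

On a mechanical disk the second number is typically far worse than the first; on an SSD the two tend to be close.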

You are granting that it's obviously faster to have the map data file and the character data file defragmented and right next to each other, than having the map data file scattered all over the disk and the character data file also scattered all over the disk.

But... you posit a rather strange scenario. You are suggesting, for example, that Half Life 3 needs to load PART of the map data file and PART of the character data file. For example, it only needs the first 10% of the map data file and the middle 10% of the character data file.

In that case, the optimum way to store the data would be to store the first 10% of the map data followed immediately by the middle 10% of the character data. As you aren't loading anything else (in this contrived example), it doesn't matter where anything else is.

So yes. In this specific case, it'd be helpful if the data was fragmented.

Now, returning to the SSD. It turns out that SSDs have to read and write a page at a time. The exact page size depends on the SSD, but it may be 2 KB, 4 KB, 8 KB, 16 KB, or some other size. My point here is that if the SSD page size is 16 KB and that's the minimum amount you can load, then a situation where the 10% of the map data and the 10% of the character data that you need sit in the same page is going to be faster. It's faster to load one page than to load two.
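
As a back-of-the-envelope illustration of that point (the 16 KB page size and the offsets below are made-up numbers, not anything a real game or drive necessarily uses), you can count how many pages a given read touches:

    # Rough illustration: how many pages does a read of (offset, length) touch?
    PAGE = 16 * 1024  # assumed page size

    def pages_touched(offset, length, page=PAGE):
        first = offset // page
        last = (offset + length - 1) // page
        return last - first + 1

    # Two 6 KB chunks packed into the same 16 KB page: one page covers both.
    print(pages_touched(0, 6 * 1024))          # -> 1
    print(pages_touched(6 * 1024, 6 * 1024))   # -> 1

    # A 6 KB chunk that straddles a page boundary: two page reads.
    print(pages_touched(12 * 1024, 6 * 1024))  # -> 2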

So. Yes, there are some circumstances where purposely fragmenting the data speeds up your access. But it's hard to imagine why you'd ever try to optimise for this case. Indeed, most of the time you want to load all of a file, not just the first 10%. And modern operating systems cache files anyway, so there's a decent chance that when you move from one map location to another in Half Life 3, the map data is already in the file system cache and you don't have to load anything from disk at all.

One interesting option is SSHDs, the hybrid drives. These are (practically speaking) mechanical drives combined with an SSD cache. How is that relevant here? Well, roughly speaking, hybrid drives move frequently accessed content into the faster storage area, migrating that data from the spinny disk to the SSD part. If you always loaded the first 10% of the map data and the middle 10% of the character data and never loaded anything else, and if the hybrid drive's algorithm was good, that data would end up on the faster SSD part. To some extent, then, this accomplishes what you are setting out to do. Note that the regular file system cache accomplishes the same thing, except its effect is probably faster but only lasts until you reboot.

TL;DR: Yeah. But seriously, it's all but guaranteed to be a worthless optimisation.

ChrisInEdmonton

Posted 2015-10-23T21:02:44.203

Reputation: 8 110

I'm talking about cases where you don't use a significant portion of otherwise often-accessed data, like a video game's low (or high) resolution textures. They take up space and are packed in a weird order in a single large archive that wasn't designed to minimize reading times. In any case, are you saying there can be no significant read-time reduction from moving data blocks around, other than combining them into a smaller number of blocks? – user1306322 – 2015-10-23T21:35:19.403

On an SSD, I'm saying there's not likely to be any measurable benefit. On a mechanical disk, I'm saying there may be a slight benefit, but honestly you shouldn't bother. – ChrisInEdmonton – 2015-10-23T21:37:38.743

FYI, it'd be relatively easy to write artificial benchmarks to test this sort of thing, if you are a programmer. I'm not going to do so, but you might find it interesting. – ChrisInEdmonton – 2015-10-23T21:38:36.200

If you're loading a portion from a game data file, it is likely that there would be some sort of index, which points to various logical locations within the file which you might want to read. Sort of like a miniature file system, just like the file system that file resides in. – a CVn – 2015-10-23T21:41:19.070

Sure; if we had control over that, this would work. We don't, hence my response of "just remove the fragmentation from all files", which you can actually perform. Yes; you make valid points. – Ramhound – 2015-10-23T21:41:36.723

@ChrisInEdmonton SSDs are so fast that even very small improvements in overhead are measurable. Disks are so slow that small differences in access efficiency are swamped by the cost of moving the head and waiting for the disk to spin. – David Schwartz – 2015-10-23T23:59:32.307

0

Let's address how fragmentation occurs in the first place.

You write small files to the hard disk. The data is written at the first available free spot. Let's assume no data was deleted, only written. The data would then look as follows: [##########-----------------] where # are used clusters and - are free clusters.

Now, let's assume that somewhere in these clusters, a file is deleted. Suddenly your graph looks like this:
[#####-####-----------------]

Now this of course happens a lot, so the graph can look like this at some point:
[##-##-#-##-#--##-#--#------]

Depending on the file system (FAT32 writes to the first available empty space; NTFS is much smarter about this and tries to keep files unfragmented where it can), at some point a large file cannot be stored in just one empty segment. That file will then be spread across the hard drive, for example:
[##F##F#F##-#--##-#--#------] where F is that file.
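
A toy model of that first-fit behaviour (the cluster count and file sizes here are invented for the example, not taken from a real file system) shows how deletions leave holes and how a later, larger file ends up split across them:

    # Toy model of first-fit allocation on a tiny "disk" of clusters.
    # '-' is a free cluster; any other character marks which file owns it.
    disk = ["-"] * 28

    def write(label, clusters):
        """Place a file into the first free clusters, wherever they happen to be."""
        placed = []
        for i, c in enumerate(disk):
            if c == "-" and len(placed) < clusters:
                disk[i] = label
                placed.append(i)
        return placed

    def delete(label):
        for i, c in enumerate(disk):
            if c == label:
                disk[i] = "-"

    # Write ten small files, then delete a few of them.
    for label in "abcdefghij":
        write(label, 2)
    for label in "bdgi":
        delete(label)
    print("".join(disk))                  # holes appear where files were deleted

    # A larger file no longer fits in any single hole, so it gets split up.
    placed = write("F", 6)
    print("".join(disk))
    print("F occupies clusters", placed)  # non-contiguous, i.e. fragmented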

NTFS will already try to store a file's data in close proximity to itself, and it attempts to combat fragmentation where it can.

But it all comes down to this: it doesn't matter if you fragment files on purpose. Even if those files are not used often, there is always the possibility that further fragmentation occurs at some point in time. I guess that if it were possible to move files that aren't being used to the end of the disk, it might be quicker to defragment the files that are used often (which might actually be what NTFS already does), but other than that, the answer to your question is simple: no, fragmenting files on purpose won't help. You could even store files at the end of the drive without fragmenting them, and they'd still be defragmented.

LPChip

Posted 2015-10-23T21:02:44.203

Reputation: 42 190

0

"Fragmenting" no. "Arranging files for efficiency" Maaaybe though I doubt it works. Your plan is based on a few premises.

  1. The OS doesn't cache. Which it typically will. Windows does. Commonly used files cached in RAM are going to be an order of magnitude faster (see the sketch after this list).
  2. You can deterministically place files on certain parts of a disk. There are tools that do that using the defragmentation API - jkdefrag/mydefrag did that with impressive results. For a while. It was a handy parlor trick to get a system that was unusably slow somewhat usable.
  3. Your drive has a certain (small) amount of internal RAM to smooth things like this over.
  4. You assume your files are stored in static locations and/or the overhead of keeping these files arranged for optimised seek time is low. No one seems to do it this way.
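
Point 1 is easy to see for yourself. A minimal sketch, assuming some large file you haven't opened recently (the file name below is made up): time the same read twice, and the second pass is usually served from the OS file cache rather than the disk:

    import time

    PATH = "some_large_asset.bin"   # hypothetical file; pick one you haven't opened recently

    def timed_read(path):
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1024 * 1024):  # read the whole file in 1 MB chunks
                pass
        return time.perf_counter() - start

    print("first read :", timed_read(PATH), "s")  # may actually hit the disk
    print("second read:", timed_read(PATH), "s")  # usually served from RAM by the OS cache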

Journeyman Geek

Posted 2015-10-23T21:02:44.203

Reputation: 119 122