Why does emptying disk space speed up computers?

I have been looking at a bunch of videos and now understand a bit better how computers work. I better understand what RAM is, volatile and non-volatile memory, and the process of swapping. I also understand why increasing RAM speeds up a computer.

I don't understand why cleaning up disk space speeds up a computer. Does it? Why does it? Does it have to do with searching for available space to save things? Or with moving things around to make a long enough continuous space to save something? How much empty space on the hard disk should I leave free?

Remi.b

Posted 2015-04-19T23:06:35.027

Reputation: 2 431

It doesn't really speed up PCs; it only reduces the chances of file fragmentation, which makes HDDs slower. This is one of the greatest PC myths that everyone repeats. To find bottlenecks on the PC, trace it with xperf/WPA. – magicandre1981 – 2015-04-20T04:25:21.570

FWIW it speeds up the experience of using a PC. – edthethird – 2015-04-20T15:06:02.640

@magicandre1981: There is a tiny gem of truth. The more things in each folder, the slower file traversal is, which impacts anything using a file path, which is... everything. But that's tiny. – Mooing Duck – 2015-04-20T18:29:40.487

@MooingDuck While true, that's related to the number of files in a folder, not to the size of the files or the amount of space remaining on the drive. That effect is not related to remaining disk space. The effect also is limited in scope to the folder itself, it won't "slow down" the whole computer. Some filesystems, ext3/4 for example, use hashed directory trees to make lookups (including subfolder access) fast, thus limiting the scope of the effect even more, e.g. only when listing contents of a directory.

– Jason C – 2015-04-20T18:57:43.097

What videos were you watching exactly? – Loko – 2015-04-21T12:08:19.340

@Loko I watched this one and another one on the same theme that I can't find anymore. I also watched this one and a bunch of others that are more engineering-oriented.

– Remi.b – 2015-04-21T13:26:52.497

Answers

312

Here, I wrote a book by accident. Get some coffee first.

Why does emptying disk space speed up computers?

It doesn't, at least not on its own. This is a really common myth. It is a common myth because filling up your hard drive often happens at the same time as other things that traditionally could slow down your computer. SSD performance does tend to degrade as the drive fills, but this is a relatively new issue, unique to SSDs, and is not really noticeable for casual users. Generally, low free disk space is just a red herring.

For example, things like:

  • File fragmentation. File fragmentation is an issue††, but lack of free space, while definitely one of many contributing factors, is not the only cause of it. Some key points here:

    • The chances of a file being fragmented are not related to the amount of free space left on the drive. They are related to the size of the largest contiguous block of free space on the drive (i.e. the "holes" of free space), which the amount of free space happens to put an upper bound on. They are also related to how the file system handles file allocation (more below). Consider: A drive that is 95% full with all free space in one single contiguous block has 0% chance of fragmenting a new file††† (and the chance of fragmenting an appended file is independent of the free space). A drive that is 5% full but with data spread evenly over the drive has a very high chance of fragmentation.

    • Keep in mind that file fragmentation only affects performance when the fragmented files are being accessed. Consider: You have a nice, defragmented drive that still has lots of free "holes" in it. A common scenario. Everything is running smoothly. Eventually, though, you get to a point where there are no more large blocks of free space remaining. You download a huge movie, and the file ends up being severely fragmented. This will not slow down your computer. All of your application files and such that were previously fine won't suddenly become fragmented. This may make the movie take longer to load (although typical movie bit rates are so low compared to hard drive read rates that it'll most likely be unnoticeable), and it may affect I/O-bound performance while the movie is loading, but other than that, nothing changes.

    • While file fragmentation is certainly an issue, the effects are often mitigated by OS and hardware level buffering and caching. Delayed writes, read-ahead, strategies like the prefetcher in Windows, etc., all help reduce the effects of fragmentation. You generally don't actually experience significant impact until the fragmentation becomes severe (I'd even venture to say that as long as your swap file isn't fragmented, you'll probably never notice).

  • Search indexing is another example. Let's say you have automatic indexing turned on and an OS that doesn't handle this gracefully. As you save more and more indexable content to your computer (documents and such), indexing may take longer and longer and may start to have an effect on the perceived speed of your computer while it is running, both in I/O and CPU usage. This is not related to free space, it's related to the amount of indexable content you have. However, running out of free space goes hand in hand with storing more content, hence a false connection is drawn.

  • Antivirus software. Similar to the search indexing example. Let's say you have antivirus software set up to do background scanning of your drive. As you have more and more scannable content, the scan takes more I/O and CPU resources, possibly interfering with your work. Again, this is related to the amount of scannable content you have. More content often equals less free space, but the lack of free space is not the cause.

  • Installed software. Let's say you have a lot of software installed that loads when your computer boots, thus slowing down start-up times. This slowdown happens because lots of software is being loaded. However, installed software takes hard drive space. Therefore hard drive free space decreases at the same time that this happens, and again a false connection can be readily made.

  • Many other examples along those lines which, when taken together, appear to closely associate lack of free space with lower performance.

The examples above illustrate another reason that this is such a common myth: While lack of free space is not a direct cause of slow down, uninstalling various applications, removing indexed or scanned content, etc. sometimes (but not always; outside the scope of this answer) increases performance again for reasons unrelated to the amount of free space remaining. But this also naturally frees up hard drive space. Therefore, again, an apparent (but false) connection between "more free space" and "faster computer" can be made.

Consider: If you have a machine running slowly due to lots of installed software, etc., and you clone your hard drive exactly to a larger hard drive and then expand your partitions to gain more free space, the machine won't magically speed up. The same software loads, the same files are still fragmented in the same ways, the same search indexer still runs; nothing changes despite having more free space.
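To make the fragmentation point concrete (what decides whether a new file can be written in one piece is the largest contiguous hole, not the total free space), here is a minimal Python sketch. It assumes a toy drive of 1,000 equal-sized blocks and simply checks whether any single hole is large enough; the layouts and the 40-block file size are illustrative assumptions, not how any particular file system allocates space.

```python
def free_runs(used):
    """Return (start, length) for every run of free blocks; used[i] is True if block i holds data."""
    runs = []
    start = None
    for i, in_use in enumerate(used):
        if not in_use and start is None:
            start = i                          # a hole begins
        elif in_use and start is not None:
            runs.append((start, i - start))    # the hole ended just before block i
            start = None
    if start is not None:                      # the hole runs to the end of the drive
        runs.append((start, len(used) - start))
    return runs


def fits_contiguously(used, n_blocks):
    """True if some single hole can hold n_blocks without splitting the file."""
    return any(length >= n_blocks for _, length in free_runs(used))


BLOCKS = 1000

# 95% full, but all 50 free blocks form one hole at the end.
packed = [True] * 950 + [False] * 50

# 5% full, with the 50 used blocks spread evenly (every 20th block).
spread = [i % 20 == 0 for i in range(BLOCKS)]

for name, drive in (("95% full, one hole", packed), ("5% full, spread out", spread)):
    runs = free_runs(drive)
    total_free = sum(length for _, length in runs)
    largest = max(length for _, length in runs)
    print(f"{name}: {total_free} free, largest hole {largest}, "
          f"40-block file fits unsplit: {fits_contiguously(drive, 40)}")
```

The nearly empty drive has 950 free blocks, but its largest hole is only 19 blocks, so a 40-block file has to be split; the nearly full drive can take the same file in one piece.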

Does it have to do with searching for available space to save things?

No, it does not. There are two very important things worth noting here:

  1. Your hard drive doesn't search around to find places to put things. Your hard drive is stupid. It's nothing. It's a big block of addressed storage that blindly puts things where your OS tells it to and reads whatever is asked of it. Modern drives have sophisticated caching and buffering mechanisms designed around predicting what the OS is going to ask for based on the experience we've gained over time (some drives are even aware of the file system that is on them), but essentially, think of your drive as just a big dumb brick of storage with occasional bonus performance features.

  2. Your operating system does not search for places to put things, either. There is no "searching". Much effort has gone into solving this problem, as it is critical to file system performance. The way that data is actually organized on your drive is determined by your file system, for example FAT32 (old DOS and Windows PCs), NTFS (later Windows), HFS+ (Mac), ext4 (some Linuxes), and many others. Even the concepts of a "file" and a "directory" are merely products of typical file systems; hard drives know not about the mysterious beasts called "files". Details are outside the scope of this answer. But essentially, all common file systems have ways of tracking where the available space is on a drive so that a search for free space is, under normal circumstances (i.e. file systems in good health), unnecessary. Examples:

    • NTFS has a master file table, which includes the special files $Bitmap, etc., and plenty of metadata describing the drive. Essentially it keeps track of where the free blocks are, so that new files can be written directly to free blocks without having to scan the drive every time.

    • Another example: ext4 keeps per-group block bitmaps (as ext2 and ext3 already did) and adds a multiblock allocator that can find and reserve runs of free blocks in one step rather than one block at a time, so it can directly determine where free space is instead of scanning for it. Ext4 also supports "delayed allocation", that is, buffering of data in RAM by the OS before writing it out to the drive in order to make better decisions about where to put it to reduce fragmentation.

    • Many other examples.
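One reason a free-space bitmap removes the need to scan the drive is that it is small relative to the volume. A rough Python calculation, assuming one bit per cluster (which is what NTFS's $Bitmap uses) and 4 KiB clusters, a common default; the 1 TB and 4 TB volume sizes are just examples:

```python
def bitmap_bytes(volume_bytes, cluster_bytes=4096):
    """Size of a one-bit-per-cluster free-space bitmap, rounded up to whole bytes."""
    clusters = -(-volume_bytes // cluster_bytes)   # ceiling division: a partial cluster still counts
    return -(-clusters // 8)                       # each byte tracks 8 clusters


for tb in (1, 4):
    size = bitmap_bytes(tb * 10**12)
    print(f"{tb} TB volume: free-space bitmap of about {size / 2**20:.1f} MiB")
```

Even for a multi-terabyte volume the bitmap is a small fraction of the drive, so the OS can consult (and cache) it instead of reading the data area to find free space.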

Or with moving things around to make a long enough continuous space to save something?

No. This does not happen, at least not with any file system I'm aware of. Files just end up fragmented.

The process of "moving things around to make up a long enough contiguous space for saving something" is called defragmenting. This doesn't happen when files are written. This happens when you run your disk defragmenter. On newer Windows, at least, this happens automatically on a schedule, but it is never triggered by writing a file.

Being able to avoid moving things around like this is key to file system performance, and is why fragmentation happens and why defragmentation exists as a separate step.
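The "files just end up fragmented" behavior can be illustrated with a toy allocator. This is a hypothetical Python sketch over a list of free/used blocks, not the allocation policy of any real file system: it first looks for a single hole big enough for the whole file, and if there is none it takes free blocks from several holes and returns the file as a list of extents. It never moves existing data.

```python
def allocate(used, n_blocks):
    """Claim n_blocks free blocks in `used` (True = in use); return extents as (start, length).

    Existing data is never moved: if no single hole is big enough,
    the new file is simply split across several holes (fragmented).
    """
    if used.count(False) < n_blocks:
        raise OSError("not enough free space")
    # First choice: the first single hole large enough for the whole file.
    run_start, run_len = None, 0
    for i, in_use in enumerate(used + [True]):     # sentinel closes a trailing hole
        if not in_use:
            if run_start is None:
                run_start = i
            run_len += 1
        else:
            if run_start is not None and run_len >= n_blocks:
                used[run_start:run_start + n_blocks] = [True] * n_blocks
                return [(run_start, n_blocks)]
            run_start, run_len = None, 0
    # Fallback: take free blocks in order across several holes (fragmentation).
    extents = []
    remaining = n_blocks
    for i in range(len(used)):
        if remaining == 0:
            break
        if not used[i]:
            used[i] = True
            remaining -= 1
            if extents and extents[-1][0] + extents[-1][1] == i:
                start, length = extents[-1]
                extents[-1] = (start, length + 1)  # extend the current extent
            else:
                extents.append((i, 1))             # start a new fragment
    return extents


disk = [False] * 12
disk[3] = disk[7] = True    # two used blocks leave holes of 3, 3 and 4 blocks
print(allocate(disk, 4))    # fits in the last hole
print(allocate(disk, 5))    # no hole of 5 is left, so the file is split
```

The second call gets two extents rather than triggering any rearrangement; consolidating them later is the job of a separate defragmentation pass.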

How much empty space on the hard disk should I leave free?

This is a trickier question to answer, and this answer has already turned into a small book.

Rules of thumb:

  • For all types of drives:

    • Most importantly, leave enough free space for you to use your computer effectively. If you're running out of space to work, you'll want a bigger drive.
    • Many disk defragmentation tools require a minimum amount of free space (I think the one with Windows requires 15% worst case) to work in. They use this free space to temporarily hold fragmented files as other things are rearranged.
    • Leave space for other OS functions. For example, if your machine does not have a lot of physical RAM, and you have virtual memory enabled with a dynamically sized page file, you'll want to leave enough space for the page file's maximum size. Or if you have a laptop that you put into hibernation mode, you'll need enough free space for the hibernation state file. Things like that.
  • SSD-specific:

    • For optimum reliability (and to a lesser extent, performance) SSDs require some free space, which, without going into too much detail, they use for spreading data around the drive to avoid constantly writing to the same place (which wears them out). This concept of leaving free space is called over-provisioning. It's important, but in many SSDs, mandatory over-provisioned space already exists. That is, the drives often have a few dozen more GB than they report to the OS. Lower-end drives often require you to manually leave unpartitioned space, but for drives with mandatory OP, you do not need to leave any free space. An important thing to note here is that over-provisioned space is often only taken from unpartitioned space. So if your partition takes up your entire drive and you leave some free space on it, that doesn't always count. Many times, manual over-provisioning requires you to shrink your partition to be smaller than the size of the drive. Check your SSD's user manual for details. TRIM and garbage collection and such have effects as well but those are outside the scope of this answer.
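As a rough illustration of how built-in over-provisioned space can arise, the snippet below computes the spare area on a hypothetical drive that carries a binary terabyte (2^40 bytes) of raw NAND but reports a decimal terabyte (10^12 bytes) to the OS. Both capacities are assumptions for the example, not figures for any particular model. Over-provisioning is usually quoted relative to user capacity, but some sources divide by raw capacity instead, so both are shown.

```python
def over_provisioning(raw_bytes, user_bytes):
    """Spare capacity as a percentage of user capacity and of raw capacity."""
    spare = raw_bytes - user_bytes
    return 100 * spare / user_bytes, 100 * spare / raw_bytes


vs_user, vs_raw = over_provisioning(2**40, 10**12)
print(f"OP: {vs_user:.2f}% of user capacity, {vs_raw:.2f}% of raw capacity")
```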

Personally I usually grab a bigger drive when I have about 20-25% free space remaining. This isn't related to performance, it's just that when I get to that point, I expect that I'll probably be running out of space for data soon, and it's time to get a bigger drive.

More important than watching free space is making sure scheduled defragmentation is enabled where appropriate (not on SSDs), so that you never get to the point where it becomes dire enough to affect you. Equally important is avoiding misguided tweaks and letting your OS do its thing, e.g. don't disable the Windows prefetcher (except for SSDs), etc.


There's one last thing worth mentioning. One of the other answers here mentioned that SATA's half-duplex mode prevents reading and writing at the same time. While true, this is greatly oversimplified and is mostly unrelated to the performance issues being discussed here. What this means, simply, is that data can't be transferred in both directions on the wire at the same time. However, SATA has a fairly complex specification involving tiny maximum block sizes (about 8kB per block on the wire, I think), read and write operation queues, etc., and does not preclude writes to buffers happening while reads are in progress, interleaved operations, etc.

Any blocking that occurs would be due to competing for physical resources, usually mitigated by plenty of cache. The duplex mode of SATA is almost entirely irrelevant here.


"Slow down" is a broad term. Here I use it to refer to things that are either I/O-bound (e.g. if your computer is sitting there crunching numbers, the contents of the hard drive have no impact), or CPU-bound and competing with tangentially related things that have high CPU usage (e.g. antivirus software scanning tons of files).

†† SSDs are affected by fragmentation in that sequential access speeds are generally faster than random access, despite SSDs not facing the same limitations as a mechanical device (even then, lack of fragmentation does not guarantee sequential access, due to wear leveling, etc., as James Snell notes in comments). However, in virtually every general use scenario, this is a non-issue. Performance differences due to fragmentation on SSDs are typically negligible for things like loading applications, booting the computer, etc.

††† Assuming a sane file system that isn't fragmenting files on purpose.

Jason C

Posted 2015-04-19T23:06:35.027

Reputation: 8 273

Very comprehensive answer, thanks. Also thanks for the reminder to grab some coffee, it was much appreciated. – Hashim – 2018-02-27T00:19:46.140

22

In addition to Nathanial Meek's explanation for HDDs, there is a different scenario for SSDs.

SSDs are not sensitive to scattered data because the access time to any place on the SSD is the same. The typical SSD access time is 0.1ms versus a typical HDD access time of 10 to 15ms. They are, however, sensitive to data that is already written on the SSD.
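To put those access times in perspective, here is a back-of-the-envelope Python calculation of the time spent purely on access latency when reading a file split into many scattered pieces. It uses the 0.1 ms SSD figure and the midpoint of the 10 to 15 ms HDD range from the paragraph above; the 1,000-fragment file is hypothetical, and transfer time and caching are ignored, so only the order of magnitude is meaningful.

```python
def latency_seconds(pieces, access_ms):
    """Total time spent just reaching `pieces` scattered locations (transfer time ignored)."""
    return pieces * access_ms / 1000


PIECES = 1000    # hypothetical: a file split into 1,000 fragments
SSD_MS = 0.1     # typical SSD access time quoted above
HDD_MS = 12.5    # midpoint of the 10-15 ms HDD range quoted above

print(f"SSD: {latency_seconds(PIECES, SSD_MS):.1f} s, "
      f"HDD: {latency_seconds(PIECES, HDD_MS):.1f} s")
```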

Unlike traditional HDDs that can overwrite existing data, an SSD needs completely empty space to write data. That is done by functions called Trim and Garbage Collection which purge data that was marked as deleted. Garbage Collection works best in combination with a certain amount of free space on the SSD. Usually 15% to 25% of free space is recommended.

If garbage collection cannot complete its job in time, then each write operation is preceded by a cleanup of the space where the data is supposed to be written. That doubles the time for each write operation and degrades overall performance.

Here is an excellent article that explains the functioning of Trim and Garbage Collection

whs

Posted 2015-04-19T23:06:35.027

Reputation: 1 251

Note that SSDs CAN write to partially-filled cells, by reading the partial data and writing back with more written, but it tends to only do that when it's unavoidable. This is of course also quite slow, and usually indicates the drive is so badly-fragmented that it'll take quite a lot to make it ever write quickly again. – fluffy – 2015-04-20T03:56:19.283

That will also depend on the controller. And since there are so many variations I did not want to go into that level of detail. – whs – 2015-04-20T04:04:38.143

The 15-25% you speak of is called "over-provisioning". Some drives have mandatory space allocated for this already (e.g. the 1TB EVO 840 has 9% reserved and not reported to the OS as free), for those you don't need to leave any free space. I believe that in some cases the over-provisioned space must be unpartitioned too, and simply leaving free space on your file system doesn't cut it, you'd need to actually leave unallocated space. – Jason C – 2015-04-20T14:54:33.503

Over-provisioning is something else. Those are nands on stand-by to replace defective nands. The 15-25% are required for freeing up blocks (pages) and for wear levelling. You may want to read here for details ==> http://www.thessdreview.com/daily-news/latest-buzz/garbage-collection-and-trim-in-ssds-explained-an-ssd-primer/

– whs – 2015-04-20T16:38:14.010

@whs It is not, and the article you link to does not imply that it is. Over-provisioned space (see also cited sources in that section, or Google) is the pool of free blocks, blocks in this pool are used for garbage collection / fast writes, wear leveling, and replacement of defective cells. As for replacing defective cells, it's all in the same pool; once it's full of defective cells, you start seeing the consistent errors. See also slide 12 in this presentation from LSI; the whole thing is worth going through, it addresses the topic directly.

– Jason C – 2015-04-20T18:17:41.650

Also SSDs do not work in the way you've described. Pages marked by TRIM tell the GC which pages it should ignore, not purge. TRIM is a complement to GC. Non-filesystem-aware GC doesn't purge data that was marked by TRIM, it purges data that was invalidated by its own internal write processes. TRIM is used by the OS to tell the drive what pages are no longer used (among other things), so that GC can ignore those pages instead of assuming they contain valid data. You've confused the two concepts. See that linked presentation, as well as https://www.cindori.org/trim-vs-garbage-collection/

– Jason C – 2015-04-20T18:41:22.943

@JasonC: I think the simplest way to describe GC and the trim command is to say that in GC a page becomes garbage when a new version of that page is written, but the act of writing a new version of that page will consume a formerly-blank page. Trim allows a page to become garbage without requiring that a new version be created. – supercat – 2015-04-21T15:58:26.293

@whs Much better now. – Jason C – 2015-04-21T19:09:32.637

In a couple of short sentences it is difficult to cover the whole story correctly. It has to be vague. – whs – 2015-04-21T19:56:27.803

12

Somewhere inside a traditional hard disk is a spinning metal platter where the individual bits and bytes are actually encoded. As data is added to the platter, the disk controller stores it on the outside of the disk first. As more data is added, space is used progressively closer to the inside of the disk.

With this in mind, there are two effects that cause disk performance to decrease as the disk fills up: Seek Times and Rotational Velocity.

Seek Times

To access data, a traditional hard disk must physically move a read/write head into the correct position. This takes time, called the "seek time". Manufacturers publish the seek times for their disks, and it's typically just a few milliseconds. That may not sound like much, but to a computer it's an eternity. If you have to read or write to a lot of different disk locations to complete a task (which is common), those seek times can add up to a noticeable delay or latency.

A drive that is almost empty will have most of its data in or near the same position, typically at the outer edge near the rest position of the read/write head. This reduces the need to seek across the disk, greatly reducing the time spent seeking. A drive that is almost full will not only need to seek across the disk more often and with larger/longer seek movements, but may have trouble keeping related data close together, further increasing disk seeks. This is called fragmented data.

Freeing disk space can improve seek times by allowing the defragmentation service not only to more quickly clean up fragmented files, but also to move files towards the outside of the disk, so that the average seek time is shorter.
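A common rough model for the cost of one random access on a mechanical drive is the average seek time plus the average rotational latency, the time for the platter to turn half a revolution. A minimal Python sketch, assuming a hypothetical 9 ms average seek (real figures vary by model and are published by the manufacturer):

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the target sector to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_access_ms(avg_seek_ms, rpm):
    """Rough time for one random access: average seek plus average rotational latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


for rpm in (5400, 7200):
    per_access = random_access_ms(avg_seek_ms=9.0, rpm=rpm)   # 9 ms seek is an assumed figure
    print(f"{rpm} rpm: ~{per_access:.1f} ms per access, "
          f"~{1000 / per_access:.0f} random accesses per second")
```

Each additional scattered location visited during a task pays this cost again, which is how the seek delays described above add up.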

Rotational Velocity

Hard drives spin at a fixed rate (typically 5400rpm or 7200rpm for your computer, and 10000rpm or even 15000 rpm on a server). It also takes a fixed amount of space on the drive (more or less) to store a single bit. For a disk spinning at a fixed rotation rate, the outside of the disk will have a faster linear rate than the inside of the disk. This means bits near the outer edge of the disk move past the read head at a faster rate than bits near the center of the disk, and thus the read/write head can read or write bits faster near the outer edge of the disk than the inner.

A drive that is almost empty will spend most of its time accessing bits near the faster outer edge of the disk. A drive that is almost full will spend more time accessing bits near the slower inner portion of the disk.
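The outer-versus-inner difference follows from the linear speed of the track passing under the head, v = 2πr × rpm / 60. With roughly constant bit density along the track, throughput scales with this speed. The radii below are rough assumptions for a 3.5-inch platter, not a specification:

```python
import math


def track_speed_m_per_s(radius_m, rpm):
    """Linear speed of the platter surface passing under the head at a given radius."""
    return 2 * math.pi * radius_m * rpm / 60


RPM = 7200
OUTER_R = 0.046   # assumed outermost data track radius, in meters
INNER_R = 0.020   # assumed innermost data track radius, in meters

outer = track_speed_m_per_s(OUTER_R, RPM)
inner = track_speed_m_per_s(INNER_R, RPM)
print(f"outer {outer:.1f} m/s, inner {inner:.1f} m/s, ratio {outer / inner:.1f}x")
```

Because the rotation rate is fixed, the ratio depends only on the two radii.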

Again, emptying disk space can make the computer faster by allowing the defrag service to move data towards the outside of the disk, where reads and writes are faster.

Sometimes a disk will actually move too fast for the read head, and this effect is reduced because sectors near the outer edge will be staggered, written out of order so that the read head can keep up. But overall this holds.

Both of these effects come down to a disk controller grouping data together in the faster part of the disk first, and not using the slower parts of the disk until it has to. As the disk fills up, more and more time is spent in the slower part of the disk.

The effects also apply to new drives. All else being equal, a new 1TB drive is faster than a new 200GB drive, because the 1TB is storing bits closer together and won't fill to the inner tracks as fast. However, attempting to use this to inform purchasing decisions is rarely helpful, as manufacturers may use multiple platters to reach the 1TB size, smaller platters to limit a 1TB system to 200GB, software/disk controller restrictions to limit a 1TB platter to only 200GB of space, or sell a drive with partially completed/flawed platters from a 1TB drive with lots of bad sectors as a 200GB drive.

Other Factors

It's worth noting here that the above effects are fairly small. Computer hardware engineers spend a lot of time working on how to minimize these issues, and things like hard drive buffers, Superfetch caching, and other systems all work to minimize the problem. On a healthy system with plenty of free space, you're not likely to even notice. Additionally, SSDs have completely different performance characteristics. However, the effects do exist, and a computer does legitimately get slower as the drive fills up. On an unhealthy system, where disk space is very low, these effects can create a disk thrashing situation, where the disk is constantly seeking back and forth across fragmented data, and freeing up disk space can fix this, resulting in more dramatic and noticeable improvements.

Additionally, adding data to the disk means that certain other operations, like indexing, AV scans, and defragmentation, are simply doing more work in the background, even if they run at or near the same speed as before.

Finally, disk performance is a huge indicator of overall PC performance these days, an even larger indicator than CPU speed. Even a small drop in disk throughput will very often equate to a real perceived overall drop in PC performance. This is especially true as hard disk performance hasn't really kept pace with CPU and memory improvements; the 7200 RPM disk has been the desktop standard for over a decade now. More than ever, that traditional spinning disk is the bottleneck in your computer.

Joel Coehoorn

Posted 2015-04-19T23:06:35.027

Reputation: 26 787

Seek time increase is not a result of low free space, it's a result of data organization. Freeing disk space won't decrease seek times if your data is already all over the drive. Similarly, running out of disk space won't suddenly increase seek times for unrelated data that was already well organized. More importantly, be very wary of associating either of these with "a slow computer". For example, you're not going to browse the web faster just because your browser's executable is unfragmented and on the outside of a mechanical drive, and your MP3s will still play smoothly even in worst case. – Jason C – 2015-04-21T19:05:22.260

@JasonC Each of those points is true in isolation, but taken as a part of the whole system can add up to real slowdowns. An example is this claim: "Freeing disk space won't decrease seek times if your data is already all over the drive." I can't dispute that by itself, but I can point out the defrag service can now move this data towards the front of the drive, and now those things will improve seek times. Other points in your comment have similar counters: running out of disk space won't increase seeks for well organized data, but it does make it less likely that data stays organized. – Joel Coehoorn – 2015-04-21T20:35:32.750

@JasonC However, I did add a couple lines to my answer based on your comment, to more directly address the title question. – Joel Coehoorn – 2015-04-21T20:40:41.893

Of course; but my main points are 1) that slow down is a consequence of something else, even though low free space may be one of many factors, and 2) you have to be really careful with this topic, it's one of those ones people latch onto very quickly. If a casual user notices that their computer is slow, in reality it's highly unusual for, say, fragmentation to be the actual cause. But then they read a bunch of stuff on the internet, install CCleaner, 50 disk defragmenters, make a bunch of bad registry tweaks etc. Need to cater to the masses here; the wiser don't need our answers. – Jason C – 2015-04-21T20:51:03.553

This answer hints at short-stroking (artificially limiting HD size to keep data on the faster, outer regions) and some potential problems with it. I also like it as it doesn't deny that in most cases, for most users a drive gets more fragmented as it gets more full. While it's worth noting that free space isn't the actual issue, it's pointlessly obtuse to ignore the general user experience when dealing with a general user experience question. – Smithers – 2015-04-27T17:47:18.883

Rotational velocity is no longer that important because density per inch has increased incredibly. Because of increased density, sectors are no longer laid out evenly on the platters, because the heads cannot write to sectors further out on the platter the same way they would near the centre. This is a particular problem with the new shingle technology (SMR). The problem we are facing now is that the heads are the bottleneck: with SMR, a single sector write actually needs a lot more reads and writes, and on the outer tracks this is becoming impracticable. It even starts to struggle with reads. – Piotr Kula – 2015-04-30T21:32:19.313

I talk about that in the 4th paragraph of the Rotational Velocity section. – Joel Coehoorn – 2015-05-01T03:56:50.873

6

All of the other answers are technically correct - however I've always found that this simple example explains it best.

Sorting things is really easy if you have lots of space... but difficult if you don't have the space... computers need the space too!

This classic "15 puzzle" is tricky/time consuming because you only have 1 free square to shuffle the tiles around in to get them in the correct 1-15 order.

(Image: hard 15 puzzle)

However if the space was much bigger, you could solve this puzzle in well under 10 seconds.

(Image: easy 15 puzzle)

For anyone that has ever played with this puzzle... understanding the analogy seems to come naturally. ;-)

scunliffe

Posted 2015-04-19T23:06:35.027

Reputation: 1 508

This isn't analogous to any file system behavior though. It somewhat resembles the general process of defragmentation, I guess, although defrag, relative to this puzzle analogy, lets you remove numbers from the board and re-place them anywhere you want as you solve it. – Jason C – 2015-04-23T04:31:01.910

In addition to Jason's comment above, I want to point out the obvious: this answer relates to sorting (defragmenting), but does not explain why accessing a specific, random tile (say, the "3" tile) would be faster in the second case than in the first case. – a CVn – 2015-04-23T21:24:27.770

Because you're not accessing just "3". You're accessing "1-15". While I admit that's not crystal clear in the example, I took it as understood. Might be worth specifically noting something like, "this puzzle is analogous to a single fragmented file." Great answer, makes things quite mentally accessible! – Smithers – 2015-04-27T17:50:28.903

To clarify: The primary issue with the analogy here is that, in the actual puzzle, you can only move tiles to adjacent empty spaces. That is, in the tiny example, only 6 or 13 could be moved into the empty space. That's what makes the puzzle challenging; it's the point of the tile game. When defragmenting a hard drive, though, you can move e.g. 4 to the empty space, 1 to its correct location, and so on. It's quite easy to solve, in exactly as many moves as the case with lots of space. So the analogy really breaks down, since the crux of the puzzle doesn't apply: No file system works this way. – Jason C – 2015-04-27T22:07:25.233

'Because you're not accessing just "3". You're accessing "1-15"' -- this is nonsense. The answer doesn't explain why accessing a specific random tile would be faster in the second case than in the first case because the answer is completely wrong. Not only isn't defragging limited the way the puzzle is, but "tricky/time consuming" defragging has no bearing on system performance. – Jim Balter – 2017-07-17T19:28:31.860

5

A computer that's had very little disk space, on a spinning mechanical hard drive, for a significant amount of time, will generally become slower as file fragmentation grows. Increased fragmentation means slow reads – very slow in extreme cases.

Once a computer is in this state, freeing disk space will not actually fix the problem. You'd also need to defragment the disk. Before a computer is in this state, freeing the space will not speed it up; it will simply reduce the chances of fragmentation becoming a problem.

This only applies to computers with spinning mechanical hard drives, because fragmentation has a negligible effect on the read speed of SSDs.

RomanSt

Posted 2015-04-19T23:06:35.027

Reputation: 7 830

A good, clean, simple answer that also addresses the main core issue. – Smithers – 2015-04-27T17:47:42.747

4

Flash disks can definitely get slower when they are full or fragmented, though the mechanisms for slowdown are unlike any that would occur with a physical hard drive. A typical flash memory chip will be divided into some number of erase blocks, each of which consists of a large number (hundreds, if not thousands) of write pages, and will support three primary operations:

  1. Read a flash page.
  2. Write to a formerly-blank flash page.
  3. Erase all of the flash pages on a block.

While it would in theory be possible to have each write to a flash drive read all the pages from a block, change one in the buffer, erase the block, and then write the buffer back to the flash device, such an approach would be extremely slow; it would also be likely to cause data loss if power were lost between the time the erase was started and the writeback was completed. Further, frequently-written parts of the disk would wear out extremely quickly. If the first 128 sectors of the FAT were stored in one flash block, for example, the drive would be dead after the total number of writes to all of those sectors reached about 100,000, which isn't very much, especially given that 128 sectors would hold about 16,384 FAT entries.
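The FAT figures above can be checked with a little arithmetic, assuming 512-byte sectors and 4-byte FAT32 entries (the answer states neither), and treating the roughly 100,000-cycle erase limit as a budget shared by all 128 sectors in the block, since in this naive scheme every sector write erases the block:

```latex
\begin{aligned}
\frac{128\ \text{sectors} \times 512\ \text{B/sector}}{4\ \text{B/entry}} &= \frac{65\,536\ \text{B}}{4\ \text{B/entry}} = 16\,384\ \text{entries} \\
\frac{100\,000\ \text{erases}}{128\ \text{sectors}} &\approx 781\ \text{writes per sector, on average}
\end{aligned}
```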

Because the above approach would work horribly, a request to write a sector will instead cause the drive to identify some blank page, write the data there, and somehow record the fact that the logical sector in question is stored at that location. As long as enough blank pages are available, this operation can proceed quickly. If blank pages get to be in short supply, however, the drive may need to find blocks that contain relatively few "live" pages, move any live pages in those blocks to some of the remaining blank ones, and mark the old copies as "dead"; having done that, the drive will then be able to erase blocks that contain only "dead" pages.
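The remapping and reclamation described above can be sketched as a toy flash translation layer. Everything here is a simplified, hypothetical model (the `ToyFTL` class, its first-fit page choice, greedy victim selection, and one block of reserved blank pages are illustrative assumptions, not how any particular drive's firmware works): writes always go to a blank page, a map records where each logical sector currently lives, and when blank pages run short the block with the most dead pages is reclaimed by copying its live pages elsewhere and erasing it.

```python
class ToyFTL:
    """Hypothetical toy flash translation layer (not any real drive's firmware).

    Pages can only be programmed when blank, and only whole blocks can be
    erased, so overwriting a logical sector writes a new page and leaves the
    old copy "dead" until its block is garbage-collected.
    """

    def __init__(self, blocks=8, pages_per_block=8):
        self.ppb = pages_per_block
        # pages[b][p] is None for a blank page, else the logical sector stored there.
        self.pages = [[None] * pages_per_block for _ in range(blocks)]
        self.where = {}        # logical sector -> (block, page) of its live copy
        self.page_writes = 0   # physical page programs, including GC copies
        self.erases = 0

    def _blank_pages(self):
        return [(b, p) for b, blk in enumerate(self.pages)
                for p, s in enumerate(blk) if s is None]

    def _live(self, b):
        return [s for p, s in enumerate(self.pages[b])
                if s is not None and self.where.get(s) == (b, p)]

    def _dead(self, b):
        written = sum(s is not None for s in self.pages[b])
        return written - len(self._live(b))

    def _program(self, sector, avoid=None):
        # First blank page outside the block being reclaimed (if any).
        b, p = next((b, p) for b, p in self._blank_pages() if b != avoid)
        self.pages[b][p] = sector
        self.where[sector] = (b, p)   # any older copy of this sector is now dead
        self.page_writes += 1

    def _garbage_collect(self):
        # Greedy victim: the block with the most dead (fewest live) pages.
        victim = max(range(len(self.pages)), key=self._dead)
        if self._dead(victim) == 0:
            raise RuntimeError("no reclaimable space: drive is full")
        for sector in self._live(victim):
            self._program(sector, avoid=victim)  # relocate live data first
        self.pages[victim] = [None] * self.ppb   # then erase the whole block
        self.erases += 1

    def write(self, sector):
        # Keep one block's worth of blank pages in reserve for GC copies.
        while len(self._blank_pages()) <= self.ppb:
            self._garbage_collect()
        self._program(sector)

    def read(self, sector):
        b, p = self.where[sector]
        return self.pages[b][p]


ftl = ToyFTL()
host_writes = 0
for _ in range(20):
    for sector in range(32):   # 32 logical sectors on 64 physical pages
        ftl.write(sector)
        host_writes += 1
print("write amplification:", ftl.page_writes / host_writes, "erases:", ftl.erases)
```

Overwriting a sector never modifies the old page in place; it only marks that copy dead, which is why a supply of blank pages (and therefore free space) matters. The ratio `page_writes / host_writes` is the write amplification that the following paragraphs estimate by hand.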

If a drive is only half full, then there will certainly be at least one block which is at most half full of live pages (and there will quite likely be some blocks that contain few or none). If each block holds 256 pages and the least-full blocks hold 64 live pages (a moderately bad case), then for every 192 requested sector writes the drive will have to perform 64 additional sector copies and one block erase (so the average cost of each sector write would be about 1.33 page writes and 0.005 block erases). Even in the worst case, every 128 sector writes would require 128 additional sector copies and a block erase (an average cost per write of 2 page writes and about 0.008 block erases).

If a drive is 99% full, and the least-full blocks have 248/256 live pages, then every 8 sector writes will require 248 additional page writes and a block erase, thus yielding a cost per write of 32 page writes and 0.125 block erases--a very severe slowdown.
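The per-write costs in the two paragraphs above follow from one formula: if the block being reclaimed still has `live` of its `pages_per_block` pages live, erasing it frees `pages_per_block - live` pages for new data at the price of `live` extra copies and one erase. A small helper (a hypothetical name, just for checking the numbers) computes them:

```python
def cost_per_host_write(pages_per_block, live):
    """Average physical cost of one requested sector write when garbage
    collection reclaims blocks that each still hold `live` live pages.

    Returns (page_writes_per_write, block_erases_per_write).
    """
    freed = pages_per_block - live          # pages available for new data per erase
    if freed <= 0:
        raise ValueError("a block with no dead pages cannot be reclaimed")
    page_writes = (freed + live) / freed    # the new data plus the relocated live pages
    erases = 1 / freed
    return page_writes, erases


for live in (64, 128, 248):
    writes, erases = cost_per_host_write(256, live)
    print(f"{live}/256 live: {writes:.2f} page writes, {erases:.4f} erases per write")
```

The cost depends on how many live pages the reclaimed block holds; a fuller drive simply leaves no emptier blocks to reclaim, which is why the cost climbs so steeply near 100%.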

Depending upon how much "extra" storage a drive has, it may not allow things to get quite that bad. Nonetheless, even at the point where a drive is 75% full, the worst-case performance may be more than twice as bad as the worst-case performance when it's 50% full.

supercat

Posted 2015-04-19T23:06:35.027

Reputation: 1 649


You pretty much nailed it. You can think of a SATA HDD as a half-duplex communications medium (that is, it can only either accept or transmit data at any one time, not both), so when the drive is held up for an extended time looking for a free location to write to, it can't read any data to you. As a rule of thumb, you shouldn't load your drives up over 80% capacity for this reason. The fuller the drive is, the more likely it is to fragment files, which causes the drive to tie up during read requests (thus blocking write requests).
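Why a fuller disk tends to fragment new files can be sketched with a first-fit allocator over a free-space bitmap. This is a deliberately simplified, hypothetical allocator (real file systems such as NTFS and ext4 use smarter placement policies): when free space exists only as small scattered holes, a new file has to be split across several of them.

```python
def first_fit(free, n_blocks):
    """Allocate n_blocks from a free-space bitmap (True = free), first-fit.

    Returns a list of (start, length) extents and marks those blocks used.
    Raises ValueError if there is not enough free space.
    """
    if sum(free) < n_blocks:
        raise ValueError("not enough free blocks")
    extents, i = [], 0
    while n_blocks > 0:
        if free[i]:
            start = i
            # Take consecutive free blocks until a used block or the request is met.
            while i < len(free) and free[i] and n_blocks > 0:
                free[i] = False
                i += 1
                n_blocks -= 1
            extents.append((start, i - start))
        else:
            i += 1
    return extents


# A mostly empty disk: one big free region.
roomy = [False] * 10 + [True] * 90
print(first_fit(roomy, 8))    # [(10, 8)]

# A nearly full disk: free space only in scattered 2-block holes.
cramped = ([False] * 8 + [True] * 2) * 10
print(first_fit(cramped, 8))  # [(8, 2), (18, 2), (28, 2), (38, 2)]
```

The same 8-block file lands in one extent on the roomy disk but four on the cramped one; on a spinning drive, each extra extent means an extra seek when the file is read back.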

There are a number of things you can do to help with these issues:

  • Reduce the amount of data you have stored and regularly defragment your drive.
  • Switch to flash based storage.
  • Keep bulk data stored on a separate drive from your OS.
  • So on and so forth...

Nathanial Meek

Posted 2015-04-19T23:06:35.027

Reputation: 632

Thank you. OK, pretty cool, +1. May I quickly ask why, when using flash memory, one does not have this issue of finding a space to save stuff? – Remi.b – 2015-04-19T23:22:08.107


When using flash, the drive can read or write (again, not both) nearly instantaneously to any location on the disk (9 ms is a pretty standard seek time on an HDD, whereas SSDs typically have a "seek time" in the realm of pico- and nanoseconds).

https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics

– Nathanial Meek – 2015-04-19T23:28:59.550

I would like to add one point to this. Speed also improves when the file count is kept low. If you have a lot of files, empty or not, it takes some time to iterate over them for certain functionality (search, etc.). – iRavi iVooda – 2015-04-20T03:09:52.357


Continuing point 2: This is what e.g. the $Bitmap file is for on NTFS, or the bitmap allocator in ext4. I.e. this answer is spreading some serious misinformation. 3. There is plenty of read and write buffering and caching going on that renders much of this moot. This answer somewhat describes the effects of fragmentation, and even then is limited to older filesystems; it certainly is not accurate with respect to free space. Freeing disk space does not speed up a computer.

– Jason C – 2015-04-20T04:46:00.733

@JasonC, you should turn your comments into an answer. – Celos – 2015-04-20T07:21:47.117

@Celos I just did; I hope you're ready to read a small novel. Also it's worth mentioning that while the first paragraph of this answer misses the boat, the advice bullet points at the end of it are still generally very good advice. – Jason C – 2015-04-20T10:27:19.330

This does not happen: "so when the drive is held up for an extended time looking for a free location to write to, it can't read any data to you" – it just doesn't. That's not how filesystems work. Please edit your answer to stop spreading misinformation. – RomanSt – 2015-04-20T13:38:34.713

@romkyns You are absolutely right that it is significantly more complex than explained. Caching changes things a ton but I assure you that I am not wrong.

"Generally, the actual SATA signalling is half-duplex, meaning that it can only read or write data at any one time."

https://en.wikipedia.org/wiki/Serial_ATA#Physical_layer

– Nathanial Meek – 2015-04-20T20:24:30.290

@NathanialMeek You're blending layers a bit. :) SATA's half-duplex mode means it can only transmit data on the wire in one direction. Higher level reads and writes are done in small blocks (called an FIS), in SATA-specified operation queues. They can be asynchronous, and fast reads and writes can be done to and from on-board cache and direct to system memory via DMA. SATA controllers are also free to reorder commands to optimize efficiency. Point being: The line is not held busy while physical operations complete, and SATA's half-duplex mode does not have the effect you think it does. – Jason C – 2015-04-20T20:45:18.510

SSDs are nowhere near "pico" or even "nano" seconds of seek time. This is utterly ridiculous. The L1 cache on a CPU is around a nanosecond of access, and system RAM is tens of nanoseconds to around a hundred. SSDs are in the dozens of microseconds for access, while hard drives are two to three orders of magnitude slower, in milliseconds. – Bryan Boettcher – 2015-04-24T18:24:43.143


Following the short and sweet approach, my oversimplified answer (strictly restricted to your main confusion) is:

As long as:

  1. your OS has enough space (for worst-case scenarios) to fulfill its duties like paging/swapping/etc.,
  2. other software also has sufficient space for its respective needs, and
  3. your hard disk is defragmented,

then you can't tell the difference in performance between an 80% empty disk and a 30% empty disk, and you shouldn't worry about anything other than storing more and more new data.

Anything beyond that which needs more storage will lead to poor performance, since there may now be a shortage of available space.

Of course, disk cleaning via a tool is good because:

  1. Temporary files should be cleaned regularly to gain valuable disk space.
  2. Old log files are nothing but a waste of space.
  3. Leftovers from installed/uninstalled software are very nasty.
  4. Cookies must be cleared if you value your online privacy.
  5. Invalid shortcuts, etc.

All of these (and many more) lead to poorer performance, as they keep confusing the OS when it looks for the right set of bits to work with.

Htaank

Posted 2015-04-19T23:06:35.027

Reputation: 31

A decent summary, but not so sure about the "BUT" section. In particular: 3) Generally has no noticeable performance impact, despite common freak-outs 4) Cookies aren't inherently problematic, and regardless of opinion, privacy is not related to performance or hard drive space, 5) Broken shortcuts are ugly but generally inconsequential otherwise. None of this really "confuses" any common OS. Be very careful about the "tips" and "tweaks" you follow. Be wary of unnecessary cleanup tools as well, in particular registry cleaners often risk harm for zero benefit. – Jason C – 2015-04-25T15:08:52.380


One effect on spinning drives that I haven't seen mentioned: Access speed and data transfer speed is different on different parts of a disk.

A disk rotates at fixed speed. The tracks at the outside of a disk are longer and therefore can hold more data per track than the tracks at the inside. If your drive can read 100 MB/sec from the outermost tracks, the speed on the innermost tracks will be less than 50 MB/sec.

At the same time, 1 GB of data on the outer tracks of the disk spans fewer tracks than 1 GB of data on the innermost tracks. So on average, data stored on the outside needs less head movement than data on the innermost tracks.
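Both effects follow from a constant rotation speed combined with a roughly constant recording density along the track. A rough model, assuming for illustration only a 7200 RPM drive, a fixed 4,000 bytes per millimetre of track, and data-zone radii of about 46 mm (outer) and 20 mm (inner); real drives group tracks into zones rather than varying capacity continuously:

```python
import math

RPM = 7200              # assumed spindle speed
BYTES_PER_MM = 4_000    # assumed linear recording density (illustrative)


def bytes_per_track(radius_mm):
    """Track capacity grows with circumference when linear density is constant."""
    return 2 * math.pi * radius_mm * BYTES_PER_MM


def transfer_mb_per_s(radius_mm):
    """One full track passes under the head per revolution."""
    return bytes_per_track(radius_mm) * RPM / 60 / 1e6


def tracks_per_gb(radius_mm):
    """How many tracks (and so how much head travel) 1 GB occupies at this radius."""
    return 1e9 / bytes_per_track(radius_mm)


outer, inner = 46.0, 20.0  # assumed data-zone radii in mm
for r in (outer, inner):
    print(f"r = {r} mm: {transfer_mb_per_s(r):.0f} MB/s, "
          f"{tracks_per_gb(r):.0f} tracks per GB")
```

In this model the outer-to-inner throughput ratio is simply the ratio of the radii, and the inner tracks need proportionally more tracks (more head movement) to hold the same gigabyte.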

The OS will try to use the outermost tracks, if possible. Of course it isn't possible if the disk is full. Deleting data will make space available where the transfer speed is higher and make things run quicker. For the same reason, you should buy spinning hard drives that are bigger than needed if you want speed (as long as it is affordable), because you will end up only using the fastest portions of the drive.

gnasher729

Posted 2015-04-19T23:06:35.027

Reputation: 277

Adding: http://en.wikipedia.org/wiki/Zone_bit_recording, which hits on this with some detail. Worth noting: Existing data won't be moved around. This may affect storage of new data (depending on location on drive, not directly on free space), but it won't "slow down" existing files that were happily being accessed prior to writes on the inside. Fwiw the cheapest 1TB 7200RPM 3.5" drive I found on Amazon has a user-benchmarked average read rate of 144MB/s; even accounting for differences on inner and outer tracks, this may not be a bottleneck during casual use.

– Jason C – 2015-04-20T16:06:11.023

@JasonC I/O performance in terms of sequential throughput is almost never a concern in practice; even a slow 4900 rpm drive will be plenty fast enough for almost any individual user. I/O performance in terms of read/write operations per second is going to be what kills performance in the majority of cases; ask your local favorite sysadmin about rotational-storage IOPS in multiuser systems some time, if you are so inclined. That's the big reason why practically nobody is deploying rotational storage for multiuser systems these days; you can just never even approach the IOPS of SSDs. – a CVn – 2015-04-23T21:28:48.280
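A common back-of-the-envelope estimate for a spinning disk's random IOPS is one operation per average seek plus half a rotation, ignoring transfer time for small requests. The seek times below are assumed typical values, not measurements:

```python
def random_iops(avg_seek_ms, rpm):
    """Approximate random IOPS of a rotational drive.

    Each small random request pays an average seek plus, on average,
    half a revolution of rotational latency; transfer time is ignored.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


for seek_ms, rpm in ((12.0, 5400), (9.0, 7200), (3.5, 15000)):
    print(f"{rpm} RPM, {seek_ms} ms seek: about {random_iops(seek_ms, rpm):.0f} IOPS")
```

Even the fastest of these cases is on the order of a couple hundred IOPS, while SSDs commonly handle tens of thousands or more, which is the gap the comment above is pointing at.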