Has there been any research, preferably published in a peer-reviewed journal […]?
One has to go back a lot further than 20 years, of system administration or otherwise, for this. This was a hot topic, at least in the world of personal computer and workstation operating systems, over 30 years ago; the time when the BSD people were developing the Berkeley Fast File System and Microsoft and IBM were developing the High Performance File System.
The literature on both by its creators discusses the ways that these filesystems were organized so that the block allocation policy yielded better performance by trying to make consecutive file blocks contiguous. You can find discussions of this, and of the fact that the amount and location of free space left to allocate blocks affects block placement and thus performance, in the contemporary articles on the subject.
It should be fairly obvious, for example, from the description of the block allocation algorithm of the Berkeley FFS that, if there is no free space in the current and secondary cylinder groups and the algorithm thus reaches the fourth level fallback ("apply an exhaustive search to all cylinder groups"), allocating disc blocks will become slower and the file will become more fragmented (and hence slower to read).
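As a rough illustration of why that last fallback hurts, here is a toy Python model of the four-level search described in the 1984 FFS paper. The data layout (`free[g][c]` as a count of free blocks), the function name, and the doubling step that stands in for the paper's quadratic rehash are all simplifying assumptions of mine; this is not the BSD kernel code, and it ignores rotational position entirely.

```python
from typing import List, Optional, Tuple


def allocate_block(free: List[List[int]], group: int,
                   cylinder: int) -> Optional[Tuple[int, int, int]]:
    """Toy model of the four-level FFS block search.

    free[g][c] is the number of free blocks in cylinder c of cylinder
    group g (a hypothetical layout chosen for this sketch).
    Returns (level, group, cylinder) for the block that was taken,
    or None if the whole volume is full.
    """
    ngroups = len(free)

    def take(g: int, c: int, level: int) -> Tuple[int, int, int]:
        free[g][c] -= 1
        return (level, g, c)

    # Level 1: a block on the requested cylinder.
    if free[group][cylinder] > 0:
        return take(group, cylinder, 1)

    # Level 2: any block in the same cylinder group.
    for c, n in enumerate(free[group]):
        if n > 0:
            return take(group, c, 2)

    # Level 3: rehash to other cylinder groups (assumed doubling step).
    g, step = group, 1
    while step < ngroups:
        g = (g + step) % ngroups
        for c, n in enumerate(free[g]):
            if n > 0:
                return take(g, c, 3)
        step *= 2

    # Level 4: exhaustive search of all cylinder groups.
    for g in range(ngroups):
        for c, n in enumerate(free[g]):
            if n > 0:
                return take(g, c, 4)

    return None
```

The nearer the volume is to full, the more often a request falls through to levels 3 and 4, where the block that is eventually found can be arbitrarily far from the rest of the file.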
It is these and similar analyses (these being far from the only filesystem designs that aimed to improve on the layout policies of the filesystem designs of the time) that the received wisdom of the past 30 years has built upon.
For example: The dictum in the original paper that FFS volumes be kept less than 90% full, lest performance suffer, which was based upon experiments made by the creators, can be found uncritically repeated even in books on Unix filesystems published this century (e.g., Pate2003 p. 216). Few people question this, although Amir H. Majidimehr actually did the century before, saying that xe has in practice not observed a noticeable effect; not least because of the customary Unix mechanism that reserves that final 10% for superuser use, meaning that a 90% full disc is effectively 100% full for non-superusers anyway (Majidimehr1996 p. 68). So did Bill Calkins, who suggests that in practice one can fill up to 99%, with 21st century disc sizes, before observing the performance effects of low free space because even 1% of modern size discs is enough to have lots of unfragmented free space still to play with (Calkins2002 p. 450).
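To put rough numbers on Calkins's point, the arithmetic is just capacity multiplied by the free fraction. The two capacities below are illustrative assumptions of mine, not figures taken from any of the cited books.

```python
def free_bytes(capacity_bytes: int, percent_full: int) -> int:
    """Bytes still unallocated on a volume that is percent_full % full."""
    if not 0 <= percent_full <= 100:
        raise ValueError("percent_full must be between 0 and 100")
    return capacity_bytes * (100 - percent_full) // 100


MB = 10**6
GB = 10**9

# Hypothetical capacities, chosen only to show the scale of the difference.
OLD_DISC = 400 * MB   # assumed 1980s-era disc
NEW_DISC = 100 * GB   # assumed early-2000s disc

if __name__ == "__main__":
    print(free_bytes(OLD_DISC, 90) // MB, "MB free at 90% full")
    print(free_bytes(NEW_DISC, 99) // MB, "MB free at 99% full")
```

With these assumed sizes, the older disc has 40 MB free at 90% full, whereas the newer one still has 1000 MB free at 99% full.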
This latter is an example of how received wisdom can become wrong. There are other examples of this. Just as the SCSI and ATA worlds of logical block addressing and zoned bit recording rather threw out of the window all of the careful calculations of rotational latency in the BSD filesystem design, so the physical mechanics of SSDs rather throw out of the window the free space received wisdom that applies to Winchester discs.
With SSDs, the amount of free space on the device as a whole, i.e., across all volumes on the disc and in between them, has an effect both upon performance and upon lifetime. And the very basis for the idea that a file needs to be stored in blocks with contiguous logical block addresses is undercut by the fact that SSDs do not have platters to rotate and heads to seek. The rules change again.
With SSDs, the recommended minimum amount of free space is actually more than the traditional 10% that comes from experiments with Winchester discs and Berkeley FFS 33 years ago. Anand Lal Shimpi gives 25%, for example. This difference is compounded by the fact that this has to be free space across the entire device, whereas the 10% figure is within each single FFS volume, and thus is affected by whether one's partitioning program knows to TRIM all of the space that is not allocated to a valid disc volume by the partition table.
It is also compounded by complexities such as TRIM-aware filesystem drivers that can TRIM free space within disc volumes, and the fact that SSD manufacturers themselves also already allocate varying degrees of reserved space that is not even visible outwith the device (i.e., to the host) for various uses such as garbage collection and wear levelling.
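The difference between per-volume and device-wide free space can be sketched as follows. This is a simplified model under stated assumptions: the function and parameter names are hypothetical, free space inside each volume is assumed to have been TRIMmed by the filesystem driver, and the manufacturer's hidden reserve is ignored altogether.

```python
from typing import List, Tuple


def device_free_fraction(device_size: int,
                         volumes: List[Tuple[int, int]],
                         unpartitioned_trimmed: bool) -> float:
    """Fraction of the whole device that the SSD can treat as free.

    volumes holds one (size, used) pair per volume, in the same units
    as device_size. Unpartitioned space only counts as free if it has
    been TRIMmed (an assumption passed in by the caller).
    """
    partitioned = sum(size for size, _ in volumes)
    if partitioned > device_size:
        raise ValueError("volumes are larger than the device")
    free = sum(size - used for size, used in volumes)
    if unpartitioned_trimmed:
        free += device_size - partitioned
    return free / device_size
```

For example, `device_free_fraction(1000, [(400, 360), (400, 360)], True)` gives 0.28 for two volumes that are each 90% full on a device with 20% left unpartitioned, but the same layout gives only 0.08 if the unpartitioned space was never TRIMmed. Measured against a 25% guideline, the same pair of volumes passes in one case and falls well short in the other.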
Bibliography
- Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and Robert S. Fabry (1984-08). A Fast File System for UNIX. ACM Transactions on Computer Systems. Volume 2 issue 3. pp.181–197. Archived at cornell.edu.
- Ray Duncan (1989-09). Design goals and implementation of the new High Performance File System. Microsoft Systems Journal. Volume 4 issue 5. pp. 1–13. Archived at wisc.edu.
- Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman (1996-04-30). "The Berkeley Fast Filesystem". The Design and Implementation of the 4.4 BSD Operating System. Addison-Wesley Professional. ISBN 0201549794.
- Dan Bridges (1996-05). Inside the High Performance File System — Part 4: Fragmentation, Diskspace Bitmaps and Code Pages. Significant Bits. Archived at Electronic Developer Magazine for OS/2.
- Keith A. Smith and Margo Seltzer (1996). A Comparison of FFS Disk Allocation Policies. Proceedings of the USENIX Annual Technical Conference. Archived at harvard.edu.
- Steve D. Pate (2003). "Performance analysis of the FFS". UNIX Filesystems: Evolution, Design, and Implementation. John Wiley & Sons. ISBN 9780471456759.
- Amir H. Majidimehr (1996). Optimizing UNIX for Performance. Prentice Hall. ISBN 9780131115514.
- Bill Calkins (2002). "Managing File Systems". Inside Solaris 9. Que Publishing. ISBN 9780735711013.
- Anand Lal Shimpi (2012-10-04). Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs. AnandTech.
- Henry Cook, Jonathan Ellithorpe, Laura Keys, and Andrew Waterman (2010). IotaFS: Exploring File System Optimizations for SSDs. IEEE Transactions on Consumer Electronics. Archived at stanford.edu.
- https://superuser.com/a/1081730/38062
- Accela Zhao (2017-04-10). A Summary on SSD & FTL. github.io.
- Does Windows trim unpartitioned (unformatted) space on an SSD?
Free space doesn't have a performance hit if you're using mechanical drives, and honestly the firmware on SSDs today is so efficient that it isn't true on them anymore – Ramhound – 2017-10-04T11:50:11.433
what I don't get is the SWAP argument. Why would I leave a percentage of the HDD's capacity, and not just 100% of the RAM's capacity? – Abdul – 2017-10-04T11:55:30.567
I don't know about research findings, but I know about my own findings. There will be no performance penalty (apart from marginally slower directory accesses) on a nearly full drive if all its files are defragmented. The problem is that many defragmenters optimise file fragmentation, but in the process leave the free space even more fragmented, so that new files become immediately fragmented. Free space fragmentation becomes much worse as discs become full. – AFH – 2017-10-04T12:00:57.480
@Abdul - A lot of the advice on swap file size is misleading. The key requirement is to have enough memory (real and virtual) for all the programs you want to have active at a time, so the less RAM you have, the more swap you need. Thus taking a proportion of the RAM size (double is often suggested) is wrong, other than as an arbitrary initial size until you find out how much you really need. Find out how much memory is used when your system is busiest, then double it and subtract the RAM size: you never want to run out of swap space. – AFH – 2017-10-04T12:12:37.410
Could you please add which filesystem/OS you are asking about? – enkryptor – 2017-10-04T13:03:57.293
On swap: it should be at least as large as your RAM if you are going to hibernate the machine. This is because swap is where your memory contents are saved. Ideally, you want it bigger, so you can hibernate while using swap, if needed. – Baldrickk – 2017-10-04T14:48:24.870
@Ramhound I doubt that. The less space left on the disk, the higher the chance that new files will be fragmented and reduce performance – phuclv – 2017-10-04T16:31:59.397
10% HDD, 25% SSD. – PCARR – 2017-10-04T16:33:18.213
empties his trash can – Housemd – 2017-10-04T16:43:04.450
https://www.washingtonpost.com/news/morning-mix/wp/2015/03/27/fabricated-peer-reviews-prompt-scientific-journal-to-retract-43-papers-systematic-scheme-may-affect-other-journals/?utm_term=.08a15bab3f1b and friends. I simply don't believe them. – Eugen Rieck – 2017-10-04T16:52:27.790
I think this really depends on what you are using the drive for. If you need to add and remove tons of data from the hard drive in large segments, I would leave a decent amount of free space based off of the size of the files you need to move around. 10-20% seems a reasonable general suggestion, but I have nothing to support that apart from personal experience. – David – 2017-10-04T18:47:27.357
@LưuVĩnhPhúc - I don't agree there is a huge hit on performance from file fragmentation on SSDs – Ramhound – 2017-10-04T18:59:07.200
@EugenRieck, see Jon Turney's "End of the peer show" (New Scientist, 22 Sept 1990). Peer review is known to be imperfect, but there are few better options. Even a mediocre & erroneous paper should be more falsifiable than a vague, passing claim in a blog or forum post, making it a better starting point for understanding. – sampablokuper – 2017-10-04T19:13:27.520
@sampablokuper Please don't get me wrong - I don't want to declare a superiority of Stack Overflow over peer-review Money makers. If you want to write a paper for one of these journals, go with sources from these journals. But if you want to keep production systems alive and well, go with Stack Overflow. These two worlds share no overlap. – Eugen Rieck – 2017-10-04T19:18:23.023
@Ramhound I mean on HDD, as you were saying "Free space doesn't have a performance hit if you're using mechanical drives". Anyway, fragmentation will introduce a very small overhead on SSD, because instead of reading the whole giant extent on ext4 and NTFS, the driver now has to determine which block to read next – phuclv – 2017-10-05T02:32:26.960
Important thing: if it is a drive where software constantly performs operations, e.g. C: with the OS installed, you should care about free space. If it is a storage drive, e.g. a drive with photos and videos, you can safely fill it up to 100%. – Dima – 2017-10-05T10:49:28.830
@EugenRieck: "peer-review Money makers"; some publishers are more ethical than others. (In case you are wondering, yes I know the tragedy of U.S. v. Aaron Swartz.) "These two worlds share no overlap." Happily, they do. At universities & elsewhere, I see sysadmins & academics alike benefit from both SE & PR. Please let's stay on-topic now on, thanks :) – sampablokuper – 2017-10-05T17:52:25.920
It's important to draw a distinction between the performance impact of a data storage disk being full and the impact of an operating system boot volume being full, especially for Windows OSes. The answers are completely different for the two scenarios. – barbecue – 2017-10-05T23:58:39.637