You should first understand what Allocation Unit Size (AUS) means.
It is the smallest block of disk space the file system will allocate. Your data is split into units of that size when it is saved to the disk. For example, if you have a 512KB file and a 128KB allocation unit size, your file will be saved in 4 units on the disk (512KB / 128KB).
If your file's size is 500KB and you have a 128KB AUS, your file will still be saved in 4 units on the disk, because, as mentioned above, 128KB is the smallest size that can be allocated. 384KB will fill 3 units, the remaining 116KB will go into a final unit, and 12KB of that unit will stay empty. You can observe this behaviour in the file properties dialog on Windows: your file's size and how much space the file actually takes up on the disk are two different things. At a low-level disk read operation, the operating system reads a whole allocation unit's worth of data.
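As a rough sketch of that arithmetic (the function name and byte-based units are just illustrative; the sizes are the examples from above):

```python
KB = 1024

def allocation_footprint(file_size, aus):
    """Return (units, size_on_disk, slack) for one file; sizes in bytes."""
    units = -(-file_size // aus)          # ceiling division: whole units only
    size_on_disk = units * aus
    slack = size_on_disk - file_size      # unused bytes in the last unit
    return units, size_on_disk, slack

print(allocation_footprint(512 * KB, 128 * KB))  # (4, 524288, 0)     -> 4 units, no waste
print(allocation_footprint(500 * KB, 128 * KB))  # (4, 524288, 12288) -> 4 units, 12KB wasted
```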
That being said, using a large AUS significantly reduces free space utilization, because the last allocation unit of each file is rarely filled completely. As a side effect, the number of files you can store on the disk also drops, for the same reason: the last AU is not fully used. But here's the trade-off: a large AUS significantly improves disk read performance, because the O.S. can read more data in a single read. Think of how many fewer disk reads the O.S. needs to completely read a GB-sized file!
Using a small AUS improves free space utilization but reduces disk read performance: the same trade-off as a large AUS, just in reverse.
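A rough sketch of both sides of that trade-off (the 4KB and 128KB unit sizes below are only illustrative values, and "half a unit of slack per file" is a simplifying assumption, not a measurement):

```python
KB = 1024
GB = 1024 ** 3

for aus in (4 * KB, 128 * KB):
    reads = GB // aus        # allocation-unit reads needed to scan a whole 1GB file
    avg_slack = aus // 2     # assumed: on average, half of the last unit is wasted per file
    print(f"AUS {aus // KB:>3}KB: {reads:>6} reads per 1GB file, "
          f"~{avg_slack // KB}KB average slack per file")
```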
So, what is the conclusion here? If you will store large (I mean large!) files on the disk, a higher AUS will give an appreciable read performance boost, at the cost of a lower maximum file count and some lost free space.
Which AUS should you use? That depends on your average file size. You can also compute the free space utilization from your actual file sizes.
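A minimal sketch of that computation, assuming you have a list of file sizes to test against a candidate AUS (the sizes below are arbitrary examples):

```python
KB = 1024

def utilization(file_sizes, aus):
    """Fraction of the allocated disk space that actually holds file data."""
    allocated = sum(-(-size // aus) * aus for size in file_sizes)  # round each file up to whole units
    return sum(file_sizes) / allocated

files = [500 * KB, 3 * KB, 70 * KB, 1200 * KB]   # arbitrary example sizes
for aus in (4 * KB, 64 * KB, 128 * KB):
    print(f"AUS {aus // KB:>3}KB -> utilization {utilization(files, aus):.1%}")
```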
Very lucid breakdown. But does each cluster have any inherent storage overhead (e.g. indices or the cluster equivalent of sector headers)? And are there any interactions with physical/emulated sector sizes or cache sizes? Lastly, do larger cluster sizes negatively affect random access performance? 4KB-sector HDDs seem to have lower random access performance even though they have higher throughput than 512-byte HDDs. – Lèse majesté – 2012-04-27T02:40:11.933
There is no significant storage overhead at the high level. Besides, there is enough hardware overhead already, since the actual physical sector size is 512 bytes... There is a part of file system formatting that records the cluster information, from how many sectors a cluster is made of to the partition structure. Sector size emulation is the disk driver's job. The O.S. file system server should deal with logical organization (NTFS, FAT, etc.) in high-level O.S. operations and with smallest-unit reads/writes in low-level O.S. operations, and the disk driver itself must work back to back with the controller (hardware) for the low-level hardware... – The_aLiEn – 2012-04-27T03:33:17.277
...access, which includes the emulation. And caching is not the O.S.'s job; it is done by the hardware itself. The O.S. asks for certain data, and the disk decides whether to look in its cache or on the platter for it... Random access performance should not really be a general performance criterion when dealing with parameters like A.U.S. Think of it this way: ... – The_aLiEn – 2012-04-27T03:33:28.353
.. N-sized units, M units in total, an N*M-capacity disk, "what is the probability of hitting this unit?", and remember, the disk has to be more precise in locating the beginnings of the units.. So, random access performance is something bound to M^2/N.. 4K units, 8 units, 32K-capacity disk: R.A. bound to 64/4. 8K units, 4 units, same capacity, same disk: R.A. becomes 16/8. You wouldn't find an article about this kind of calculation, but believe me :) it is more work to "randomly" locate data using large unit sizes than small ones – The_aLiEn – 2012-04-27T03:50:30.203
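For what it's worth, a tiny sketch that just reproduces the commenter's own M^2/N figure for those two example configurations (this is their back-of-the-envelope heuristic, not an established formula):

```python
def ra_heuristic(unit_kb, unit_count):
    """The commenter's M^2/N figure (their own heuristic, not an established metric)."""
    return unit_count ** 2 / unit_kb

print(ra_heuristic(4, 8))   # 4K units, 8 units, 32K disk -> 16.0 (64/4)
print(ra_heuristic(8, 4))   # 8K units, 4 units, 32K disk ->  2.0 (16/8)
```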