When we save a file in Windows, is the actual location of the file on the hard disk random or deterministic?

2

I want to know how the OS places a file on the hard disk when we save it. Will two computers with the same configuration and ALSO THE SAME INTERNAL STATE save the file at the same location on their hard disks, or will the locations be random?

Ratna

Posted 2013-06-22T10:48:56.953

Reputation: 225

Answers

4

It's mostly deterministic – filesystems use various algorithms to determine the best place for new data. But it is not possible to 100% duplicate all internal state, so you have to consider that:

  • different filesystems (ext4, btrfs, NTFS...) use different allocation algorithms,

  • which can also be influenced by the program doing the writing (e.g. a file that grows to 100 MB slowly will sometimes be allocated differently from a file that's created by fallocate()'ing 100 MB at once),

  • as well as other programs writing to disk at the same time, since the allocation of file B will depend on whether file A was already written or not (all determinism here goes away when you have a multi-core or multi-CPU system);

  • size and location of existing files;

  • size and location of deleted files (e.g. on log-structured filesystems, the data only goes forward);

  • different disk types (filesystems may care much less about fragmentation when writing to solid-state disks than to magnetic disks);

  • physical corruption (if one sector gets corrupted, the filesystem might choose to put the entire file elsewhere instead of just skipping that one sector).

And finally, even if both example computers have 1:1 copies of raw disk contents,

  • some filesystems may make random choices if that's written into the algorithm. From a quick grep, it seems that at least ext4 uses a random choice as a fallback when all candidates are equally good.
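To make the point about write order and deleted files concrete, here is a minimal sketch of a toy first-fit block allocator. It is purely illustrative: the names `first_fit`, `write_file`, `delete_file` and `replay` are made up for this example, and no real filesystem (NTFS, ext4, ...) allocates this simply. It only shows that the same rule produces the same layout when the history is identical, and a different layout as soon as the history differs.

```python
def first_fit(used_map, size):
    """Return the start of the first run of `size` free blocks, or None."""
    run = 0
    for i, used in enumerate(used_map):
        run = 0 if used else run + 1
        if run == size:
            return i - size + 1
    return None


def write_file(disk, name, size):
    """Place `name` in the first free run that fits and return its start block."""
    start = first_fit([owner is not None for owner in disk], size)
    if start is None:
        raise RuntimeError("no contiguous free run large enough")
    for i in range(start, start + size):
        disk[i] = name
    return start


def delete_file(disk, name):
    """Free every block owned by `name`."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def replay(ops, disk_blocks=16):
    """Apply ("write", name, size) / ("delete", name) ops to an empty toy disk.

    Returns a dict mapping each surviving file name to its start block.
    """
    disk = [None] * disk_blocks
    placed = {}
    for op, name, *size in ops:
        if op == "write":
            placed[name] = write_file(disk, name, size[0])
        else:
            delete_file(disk, name)
            placed.pop(name, None)
    return placed


# Same history twice: identical layout.
same_1 = replay([("write", "A", 4), ("write", "B", 4)])
same_2 = replay([("write", "A", 4), ("write", "B", 4)])

# Same files, different write order: A and B swap places.
swapped = replay([("write", "B", 4), ("write", "A", 4)])

# A temporary file that was deleted still shifts where A and B end up.
with_temp = replay(
    [("write", "T", 3), ("write", "A", 4), ("delete", "T"), ("write", "B", 4)]
)
```

Here `same_1` and `same_2` are equal, while `swapped` and `with_temp` place A and B at different blocks even though the final set of files is the same.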

user1686

Posted 2013-06-22T10:48:56.953

Reputation: 283 655

2

It depends on the filesystem, its implementation, and external factors.

  • Internal state includes absolutely everything about the computer, i.e. processor state, data in RAM, current data on disk and how it's laid out, EVERYTHING inside the computer. But there are also external factors, such as disk faults, that don't depend on the computer's state, and you have to take those into consideration.

  • "Randomized layout" would probably be deterministic too. Computers are deterministic. "Random" numbers used in computer science are in most cases pseudorandom (and usually that's totally fine, with very few exceptions). So even if a filesystem introduces some randomness, it's very likely that the result will still be deterministic.
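The pseudorandomness point can be illustrated with a short sketch using Python's `random` module. The function `pick_block_groups` and the list of candidate group numbers are hypothetical and only stand in for "a filesystem choosing among equally good locations"; the point is that a pseudorandom generator started from the same seed repeats exactly the same choices.

```python
import random


def pick_block_groups(seed, candidates, picks=5):
    """Make `picks` pseudorandom choices from `candidates` using a fixed seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.choice(candidates) for _ in range(picks)]


candidates = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical, equally good locations

# Two "machines" whose generators start from the same internal state.
machine_a = pick_block_groups(seed=1234, candidates=candidates)
machine_b = pick_block_groups(seed=1234, candidates=candidates)

print(machine_a == machine_b)  # the two sequences of choices are identical
```

Whether the layout is reproducible in practice therefore comes down to whether the seed itself is part of the duplicated state.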

gronostaj

Posted 2013-06-22T10:48:56.953

Reputation: 33 047

2

Modern OSes seed the in-kernel PRNG from things like CPU frequency variations, disk timings or keyboard interrupts, which aren't completely deterministic. Even if you have two identical systems, their RNGs could still output different values because the program started several cycles later. – user1686 – 2013-06-22T17:07:41.263
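As a rough user-space illustration of this comment, the sketch below seeds a generator from `os.urandom`, which draws on the operating system's entropy-backed random source. The helpers `fresh_seed` and `tie_break` are invented for this example, and this is not how any particular filesystem obtains its randomness; it only shows that once the seed comes from the OS, two otherwise identical runs no longer start from the same state.

```python
import os
import random


def fresh_seed():
    """Return a 128-bit seed taken from the operating system's random source."""
    return int.from_bytes(os.urandom(16), "big")


def tie_break(candidates):
    """Pick one candidate using a generator seeded from the OS, not a fixed value."""
    rng = random.Random(fresh_seed())
    return rng.choice(candidates)


seed_a = fresh_seed()
seed_b = fresh_seed()

# Two seeds drawn back to back are, for all practical purposes, never equal,
# so generators built from them will generally diverge.
print(seed_a != seed_b)
```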