
Suppose that we produce a sensitive document on a Windows 7 box with an NTFS filesystem. As we write the document, it grows longer, and we keep saving it, which means that the editor overwrites it from the beginning: it truncates the file to zero length and writes the new content.

Assuming that the editor re-uses the same filesystem object, are we assured that the filesystem will use the same physical blocks on disk for all the parts of the file that already exist? Or can it allocate new blocks right away, since the tail end of the file is truncated off?

Or does it depend on how the file is written: overwrite followed by an explicit truncate, versus opening for overwrite?
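To make the two patterns concrete, here is a minimal sketch in Python (the path and content are hypothetical, purely for illustration) of the two save strategies the question distinguishes:

```python
import os

PATH = r"C:\docs\sensitive.txt"    # hypothetical path, for illustration only
NEW_CONTENT = b"revised draft of the document"

# Pattern 1: open for overwrite. The file is truncated to zero length at
# open time, before any of the new content is written.
with open(PATH, "wb") as f:
    f.write(NEW_CONTENT)

# Pattern 2: overwrite in place, then truncate explicitly. Existing bytes are
# overwritten from offset 0; only the tail beyond the new length is cut off.
with open(PATH, "r+b") as f:
    f.write(NEW_CONTENT)
    f.truncate()            # drop whatever old data extends past the new end
    f.flush()
    os.fsync(f.fileno())    # ask the OS to push the writes to the device
```

Whether either pattern actually reuses the same on-disk clusters is, as the answers below explain, up to the filesystem implementation.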

The relevance to security is that if the previous blocks are liberated and potentially different blocks are allocated to the file, then it is not enough to shred the file to destroy it; we must wipe all free space after having done multiple saves. To avoid doing that, we must produce the file in a single pass, or else shred the on-disk copy outside of the editor prior to each save.

Kaz

4 Answers


In general, block allocation is the most expensive operation in filesystems, so filesystems will try quite hard to avoid it, in particular by reusing blocks when possible. This would mean the following:

  • When overwriting an existing file, the same blocks are reused. New blocks are allocated only when the new file data exceeds the size of the overwritten file.

  • When truncating an existing file, all the blocks are released, thus potentially reusable for other file operations. In that case, the new file may allocate new blocks. There is no guarantee that the new file contents will reuse the same blocks, and, in particular, the old blocks might have been reallocated to other files in the meantime.

However, it depends a lot on the filesystem internals. Log-structured filesystems perform all writes sequentially, throughout the whole partition, so it is pretty much guaranteed, with such a filesystem, that the new file will not overwrite the blocks from the old file. Journaling filesystems may copy the file contents to an extra structure (the "journal") in addition to the actual permanent storage (depending on whether the journaling extends to the file contents, or just the metadata). Some filesystems also use a "phase tree" which can be viewed as a log-structured filesystem, with a tree instead of a list; for these, overwrites may or may not happen.

An important point to consider is that block allocation strategies do not depend only on the filesystem, but also on the implementation. There is no guarantee that Windows XP and Windows 7, for instance, behave similarly on the same NTFS filesystem. One OS version may find it worthwhile to keep around old blocks to "speed up (re)allocation" while another could employ another strategy. This is all heuristics, tuned and retuned. Thus, one cannot really answer your question about "NTFS"; one would have to talk about "NTFS as implemented in OS foobar, version 42.17, build 3891".


Moreover, all these blocks are what the OS sees; the actual physical storage may differ, and may move/copy data around. This is typical of wear-levelling algorithms in SSD. Generally speaking, overwriting/shredding files on SSD is not reliable (see this answer for details and pointers). But some data movement can also happen with magnetic disks (in particular when a flaky sector is detected; remapping is done on the fly, and the old sector remains untouched, forever).

This basically means that file shredding does not work well in that it cannot guarantee that the data will be destroyed. You should use file shredding only as an emergency measure when other methods have failed or were erroneously not applied. The correct ways to permanently destroy a file are:

  • Wholesale destruction of the complete disk, e.g. by dissolving it in acid.
  • Encryption: when the data is encrypted, destroying the key is enough to make the data unrecoverable. While this does not completely solve the issue (you still have to destroy a data element), it makes it much easier (a key is small: it is much easier to destroy 128 bits than to destroy 128 gigabytes). A sketch of this idea appears below.

Secure erase, when implemented properly by the disk, works with this encryption trick.
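As a rough illustration of the "destroy the key" idea (a sketch only, not a prescribed procedure; it assumes the third-party cryptography package is available and uses a hypothetical filename):

```python
# pip install cryptography   (third-party package; assumed to be available)
from cryptography.fernet import Fernet

key = Fernet.generate_key()     # a small urlsafe-base64 key: the only secret to protect
cipher = Fernet(key)

with open("sensitive.docx", "rb") as f:          # hypothetical document
    ciphertext = cipher.encrypt(f.read())

with open("sensitive.docx.enc", "wb") as f:
    f.write(ciphertext)

# The ciphertext may leak into freed blocks, journals, backups and so on;
# destroying the tiny `key` (and never storing it unprotected on the same
# disk) is what makes all of those copies unreadable.
```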

Thomas Pornin
  • 320,799
  • 57
  • 780
  • 949

Firstly, if the drive is an SSD, then no: beyond anything the OS is doing, the drive itself performs wear leveling, which means the data will likely be written to a different location on the drive even if the OS writes to the same logical blocks.

In Windows, the file descriptor includes the filename, unlike most Linux systems, which separate the physical data allocation (the inode) from the directory entry (the filename). So when you start rewriting an existing file, the OS unlinks all subsequent blocks from the first one: the first block remains the same, but the subsequent blocks are reallocated and could be allocated differently.

Tools are available that do secure erasure of files: they perform destructive writes to the file without truncating it, and then rename the file to overwrite the directory entry.
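A rough sketch of that overwrite-in-place-then-rename approach, in Python (the path is hypothetical, and all of the SSD/wear-leveling caveats from the other answers still apply, so this is best-effort only):

```python
import os

def shred(path, passes=1):
    """Overwrite a file in place (no truncation), rename it, then unlink it.
    Best effort only: wear leveling, journaling and bad-sector remapping
    can still leave copies of the old data behind."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:               # r+b: overwrite without truncating
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(size))          # destructive write over the old bytes
            f.flush()
            os.fsync(f.fileno())               # push the overwrite down to the device
    # Overwrite the directory entry's name as well, then remove the file.
    anon = os.path.join(os.path.dirname(path), "x" * len(os.path.basename(path)))
    os.rename(path, anon)
    os.remove(anon)

shred(r"C:\docs\sensitive.txt")                # hypothetical path
```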

Stuart

You can't presume that the same blocks will be used. And as said in other answers, with SSD and wear leveling, it's beyond your control. For a sensitive document I'd propose an encrypted container like TrueCrypt. Don't forget to use an encrypted swap file, too.

ott--

Is this behavior actually a characteristic of the document-processing software you are using? It has been a while since I've looked at the OOXML standard; if you grab the specification document, it is quite ... healthy. I am assuming that you are using Microsoft Office; forgive me if that is too large a leap. Coupled with the fact that the Zip format OOXML uses as a container provides facilities in the specification to stream portions of the file out of the container, I can't think of a reason that a document created this way would be an immutable data structure.
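For instance, you can verify that an OOXML document is just such a Zip container, whose parts can be read individually, with Python's standard zipfile module (the .docx path is hypothetical):

```python
import zipfile

# Any .docx/.xlsx/.pptx is an ordinary Zip archive of XML "parts".
with zipfile.ZipFile(r"C:\docs\sensitive.docx") as z:    # hypothetical path
    for name in z.namelist():
        print(name)                        # e.g. word/document.xml, docProps/core.xml
    body = z.read("word/document.xml")     # a single part can be pulled out on its own
```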

If it isn't Office, and the application really does work like this, I'm not certain the behavior wouldn't be optimized away by the compiler or the kernel unless the software engineer had been explicit about disallowing it.

But if you wanted to do some checking, you can always use fsutil to look up the file IDs; a similar concept is the POSIX inode (or vnode), and if you have Cygwin installed you can get a representation of them with ls -i.
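If you have Python handy rather than Cygwin, one way to watch the file ID across saves (a sketch assuming CPython 3.5+, where st_ino on Windows is derived from the NTFS file ID; the path is hypothetical) is:

```python
import os

PATH = r"C:\docs\sensitive.docx"     # hypothetical path

def file_id(path):
    st = os.stat(path)
    # On Windows/NTFS, st_ino is built from the volume's file index (the rough
    # equivalent of a POSIX inode number); st_dev identifies the volume.
    return (st.st_dev, st.st_ino)

before = file_id(PATH)
input("Save the document in your editor, then press Enter... ")
after = file_id(PATH)
print("same filesystem object" if before == after else "new filesystem object")
```

Note that an unchanged file ID only tells you the same filesystem object was reused, not that the same clusters on disk were.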

I need to brush up on the internals of the various SSD architectures, but I wonder whether this question stems from TRIM and how, at least at one point, you could not be assured that overwriting a file with random data would actually overwrite it on the medium, a wear-leveling algorithm instead being applied to smooth out the IOPS on the drive.

I hope someone can shed more light on this but, mental gymnastics aside, you could be focusing on the wrong problem. If you need reasonable assurances about your data at rest, and this document is sensitive enough that you are worried about leakage, then maybe the document or the drive should be encrypted.

M15K