Are files that use NTFS compression decompressed onto disk or into memory?

14

How does NTFS decompression work in Windows? According to Microsoft, NTFS decompression is done by expanding the file, then using it. That sounds right, but my question is how this process occurs technically.

Does Windows load the compressed file into memory, expand it in memory, and read it from there? Or does it load the compressed file into memory, expand it (in memory or directly on disk), write the expanded data to disk, and then read it back?

I'm trying to figure out if I can improve my computer's performance by using NTFS compression. That way, the slow hard drive (or an SSD that can't handle that many write operations) would always have less data to read and write, and my powerful processor, which is idling most of the time, could decompress the files, improving both my storage speed and its health.

CausingUnderflowsEverywhere

Posted 2016-07-04T22:13:52.180

Reputation: 283

1

I edited your question to focus more on whether files are decompressed to memory or disk. That way, it will be much less likely to be closed as a duplicate of this other question, which touches more on the other aspects.

– Ben N – 2016-07-05T00:08:41.077

Answers

19

Windows decompresses files into memory. Decompressing onto disk would completely obliterate any speed improvement and would cause a lot of unnecessary disk writes. See the end of this Microsoft blog article on NTFS sparse files and compression:

  1. NTFS determines which compression unit is being accessed.
  2. The compression unit’s entire allocated range is read.
  3. If the unit is not compressed, then we skip to step 5. Otherwise, NTFS would attempt to reserve (but not allocate) the space required to write the decompressed CU back to disk. If insufficient free space exists on the disk, then the application might get an ERROR_DISK_FULL during the read.
  4. The CU would be decompressed in memory.
  5. The decompressed byte range would be mapped into cache and returned to the requesting application.
  6. ...
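
To see the effect of those steps from an application's point of view, a program can check whether a file carries the compression attribute and compare its logical size with the (usually smaller) on-disk allocation that step 2 actually reads. A minimal Win32 sketch in C; the path is a hypothetical placeholder:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        const wchar_t *path = L"C:\\path\\to\\file.dat";   /* hypothetical file */

        /* Is the file stored NTFS-compressed at all? */
        DWORD attrs = GetFileAttributesW(path);
        if (attrs == INVALID_FILE_ATTRIBUTES) {
            fprintf(stderr, "GetFileAttributesW failed: %lu\n", GetLastError());
            return 1;
        }
        printf("NTFS-compressed: %s\n",
               (attrs & FILE_ATTRIBUTE_COMPRESSED) ? "yes" : "no");

        /* Logical size: what applications see when they read the file. */
        HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
            return 1;
        }
        LARGE_INTEGER logical;
        GetFileSizeEx(h, &logical);
        CloseHandle(h);

        /* On-disk size: the bytes actually allocated (and read) for the
           compressed stream. */
        ULARGE_INTEGER onDisk;
        onDisk.LowPart = GetCompressedFileSizeW(path, &onDisk.HighPart);

        printf("Logical size: %lld bytes\n", logical.QuadPart);
        printf("On-disk size: %llu bytes\n", onDisk.QuadPart);
        return 0;
    }

The gap between the two numbers is roughly the amount of disk I/O that compression saves on reads, in exchange for the in-memory decompression described above.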

Of course, if you're low on memory, the memory used by the decompression process could cause other memory to be paged out and written to disk in the page file. Fortunately, only the compression units containing sections that your programs actually read are decompressed; NTFS doesn't have to decompress the whole file if you only need a few bytes.
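
The application doesn't have to do anything special to get that partial behavior: it just seeks and reads as usual, and NTFS decompresses only the compression unit(s) covering the requested range. A minimal sketch, again with a hypothetical path and offset:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* A large NTFS-compressed file (placeholder path). */
        HANDLE h = CreateFileW(L"C:\\path\\to\\big-compressed.bin",
                               GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        /* Jump 10 MiB into the file... */
        LARGE_INTEGER offset;
        offset.QuadPart = 10LL * 1024 * 1024;
        SetFilePointerEx(h, offset, NULL, FILE_BEGIN);

        /* ...and read 16 bytes. Only the compression unit(s) covering this
           range are read from disk and decompressed, not the whole file. */
        char buf[16];
        DWORD bytesRead = 0;
        ReadFile(h, buf, sizeof buf, &bytesRead, NULL);
        printf("read %lu bytes\n", bytesRead);

        CloseHandle(h);
        return 0;
    }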

If your SSD is fast, you're probably not going to get speed improvements from NTFS compression. It's conceivable that the time your processor spends decompressing the data plus the time your disk spends reading the compressed data could add up to more than the time your SSD takes to read the uncompressed data. It also depends on the size of the files you work with. The minimum size of a compressible file ranges from 8 KB to 64 KB, depending on your cluster size; files smaller than that won't be compressed at all, but a tiny amount of bookkeeping is still added.
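
If you want to know where your volume falls in that range, the compression unit is 16 clusters (and NTFS only compresses volumes whose clusters are 4 KB or smaller), so you can derive it from the cluster size. A rough sketch, using the C: volume as a placeholder:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;
        if (!GetDiskFreeSpaceW(L"C:\\", &sectorsPerCluster, &bytesPerSector,
                               &freeClusters, &totalClusters)) {
            fprintf(stderr, "GetDiskFreeSpaceW failed: %lu\n", GetLastError());
            return 1;
        }

        DWORD clusterSize = sectorsPerCluster * bytesPerSector;

        /* An NTFS compression unit is 16 clusters, e.g. 64 KB on the common
           4 KB cluster size. */
        printf("Cluster size:          %lu bytes\n", clusterSize);
        printf("Compression unit size: %lu bytes\n", 16 * clusterSize);
        return 0;
    }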

If you do a lot of writing to compressed files, you could also see a lot of variance in speed because of the compression algorithm NTFS uses (LZNT1, a variant of LZ77).
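
If you want to measure that on your own workload, compression can be toggled per file programmatically with FSCTL_SET_COMPRESSION (the compact.exe tool and Explorer's file properties do the same thing under the hood). A minimal sketch, with a placeholder path:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileW(L"C:\\path\\to\\file.dat",   /* placeholder */
                               GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        /* COMPRESSION_FORMAT_DEFAULT enables NTFS (LZNT1) compression;
           COMPRESSION_FORMAT_NONE would turn it back off. */
        USHORT format = COMPRESSION_FORMAT_DEFAULT;
        DWORD bytesReturned = 0;
        BOOL ok = DeviceIoControl(h, FSCTL_SET_COMPRESSION,
                                  &format, sizeof format,
                                  NULL, 0, &bytesReturned, NULL);
        printf("FSCTL_SET_COMPRESSION %s\n", ok ? "succeeded" : "failed");

        CloseHandle(h);
        return 0;
    }

Timing the same sequence of writes against a compressed and an uncompressed copy of your data is the most direct way to see whether the trade-off pays off on your hardware.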

Further reading: How does NTFS compression affect performance?

Ben N

Posted 2016-07-04T22:13:52.180

Reputation: 32 973

1

if you're low on memory, the decompressed data could be paged out and written to disk in the page file [citation needed] -- a smart algorithm would simply throw out the decompressed data and perform the decompression again on next access, with the assumption of (de)compression being orders of magnitude faster than paging. In fact, that's already what happens with the page cache - and I'd expect that Windows would simply put this decompressed data into that same cache. (In Windows, all file r/w goes through the page cache, even when it's write-through.)

– Bob – 2016-07-05T00:18:32.127

Indeed, that's probably what it does. I've adjusted that part of the answer, thanks. – Ben N – 2016-07-05T00:21:19.350

>"The decompressed byte range would be mapped into cache" Do you know what the definition of cache is here? Just curious.

"Otherwise, NTFS would attempt to reserve the space required to write the decompressed CU back to disk." Do we know the exact reason for this? Is Microsoft assuming here that the modification to the file won't add size that will cause the total compressed size to surpass the original un-compressed size? Seems like a sucky assumption. – CausingUnderflowsEverywhere – 2016-07-16T05:19:15.090

So in summary we're looking at:

Read from disk -> read the MFT to check for enough space to write the decompressed data -> decompress in memory -> map it into the requesting application's cache? Are we talking about the application's private bytes? Just curious. Is this what we're looking at here? – CausingUnderflowsEverywhere – 2016-07-16T05:22:52.607

1

@CausingUnderflowsEverywhere That cache is the IO cache, which means repeated reads don't all have to be serviced by the disk. NTFS does hope that the new data will fit in the existing CUs, but it makes sure there's space in case it doesn't. It is my understanding that the IO cache is not specific to one application, though the data will end up in the program's private memory when it's called for.

– Ben N – 2016-07-17T02:42:31.840