How can I check the actual size used in an NTFS directory with many hardlinks?

On a Win7 NTFS volume, I'm using cwrsync which supports --link-dest correctly to create "snapshot" type backups. So I have:

z:\backups\2010-11-28\cygdrive\c\Users\...
z:\backups\2010-12-02\cygdrive\c\Users\...

The content of 2010-12-02 is mostly hardlinks back to files in the 2010-11-28 directory, but there are a few new or changed files only in 2010-12-02. On Linux, the 'du' utility will tell me the actual size taken by each incremental snapshot. On Windows, Explorer and du under Cygwin are both fooled by hardlinks and show 2010-12-02 taking up a little more space than 2010-11-28.

Is there a Windows utility that will show the correct space actually used?

kbyrd

Posted 2010-12-02T22:16:44.767

Reputation: 2 067

Tools addressing this would be very helpful in getting an accurate picture of "Why does the /winsxs folder grow so large, and can it be made smaller?" and

– matt wilkie – 2012-11-02T18:22:58.180

This seems to be the de facto question and answers for normal disk usage: "How can I visualize the file system usage on Windows?"

– matt wilkie – 2012-11-09T06:23:59.717

Answers

Try using Sysinternals Disk Usage (otherwise known as du); specifically, using the -u and -v flags will count only unique occurrences and will show the usage of each folder as it goes along.

As far as I know, the file system doesn't distinguish between the original file and a hard link (that is really the point of a hard link), so you can't discount them on a folder-by-folder basis but need to do this comparatively.

To test, I created a test folder with 6 files in it and cloned the whole thing. Then I created several hard and soft links inside the first folder that reference other files in the first folder, and also some that reference files in the second.

Running du -u -v testFld results in (note the values next to the folders are in KiB):

       104  <path>\testFld\A
        54  <path>\testFld\B
       149  <path>\testFld

    Totals:
    Files:        12
    Directories:  2
    Size:         162,794 bytes
    Size on disk: 162,794 bytes

Running du -u -v testFld\a results in:

    104  <path>\testFld\a
    ...

Running du -u -v testFld\b results in:

    74   <path>\testFld\b
    ...

Notice the mismatch?
The links in A that refer to files in B are counted only against A during the "full" run, and B returns only 54 (even though the files were originally in B and hard-linked from A). When you measure B separately (or if you don't use the -u unique flag), it counts its "full" measure of 74.
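
The mismatch above is consistent with a tool that keeps one shared "seen" set of files while it walks several subfolders in turn: a hard-linked file is charged to whichever folder reaches it first, and a folder measured on its own gets its full total. The Python sketch below models that idea; it is a hypothetical illustration, not Sysinternals du's actual code, and the function name is made up. It identifies a file by its `(st_dev, st_ino)` pair, which is the inode on Linux; on NTFS, recent Python versions fill `st_ino` from the file index, but treat that as an assumption to check on your own setup.

```python
import os


def attribute_by_first_seen(folders):
    """Charge each hard-linked file to the first folder that reaches it.

    folders is an ordered list of directory paths. Returns a dict mapping
    each folder to the bytes of files not already seen in an earlier folder,
    modelling one shared "seen" set used across several folder walks.
    Symbolic links are skipped rather than followed.
    """
    seen = set()   # (st_dev, st_ino) pairs already charged to some folder
    totals = {}
    for folder in folders:
        total = 0
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue
                st = os.stat(path)
                key = (st.st_dev, st.st_ino)
                if key not in seen:
                    seen.add(key)
                    total += st.st_size
        totals[folder] = total
    return totals
```

Calling it with a single folder, e.g. `attribute_by_first_seen([r"z:\backups\2010-12-02"])`, gives a du-style unique total for that folder alone; passing both snapshot folders oldest first charges the shared files to the older snapshot, leaving the newer one with only its new or changed files.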

DMA57361

Posted 2010-12-02T22:16:44.767

Reputation: 17 581

This answer confuses the function of the -u flag. You get the "full" measure if you use the -u flag. Without it, it only counts 1 instance of any hard-linked file. Says so in the docs: https://docs.microsoft.com/en-gb/sysinternals/downloads/du and testing verifies it.

– martixy – 2019-07-27T19:09:59.883

Thanks, I didn't know about the Sysinternals du, just the Cygwin one. Apparently the Cygwin du does what I want as well; I just didn't think to try it before starting the bounty. – kbyrd – 2010-12-13T15:28:32.437

PowerShell 5 may be an option. It is available for Windows 7, but I only tested this on Windows Server 2012 R2 with the April 2015 Preview.

The filesystem provider in PowerShell 5 has two new properties LinkType and Target:

    ls taskmgr.exe | fl LinkType,Target

This returns:

    LinkType : HardLink
    Target   : C:\Windows\WinSxS\amd64_microsoft-windows-advancedtaskmanager_..._6.3.9600.17..2\Taskmgr.exe

So now I can show only the files in System32 that are not hardlinks:

    cd $env:SystemRoot\System32
    ls -Recurse -File -force -ErrorAction SilentlyContinue | ? LinkType -ne HardLink | Measure-Object -Property Length -Sum

This returns:

    Count    : 844
    Sum      : 502,486,831

You can compare that with all files:

    ls -Recurse -File -force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum

    Count    : 14092
    Sum      : 2,538,256,262

So over 13,000 files, with a combined size of over 2 GB, are hardlinks.
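
Checking the arithmetic: the two runs differ by 14,092 - 844 = 13,248 files and 2,538,256,262 - 502,486,831 = 2,035,769,431 bytes. A rough cross-platform analogue of the `? LinkType -ne HardLink` filter is to split files by link count. The sketch below assumes that a file PowerShell reports as `HardLink` is one whose link count (`st_nlink`) is greater than 1; that mapping and the function name are assumptions for illustration, not documented PowerShell behaviour.

```python
import os


def split_by_link_count(root):
    """Count files under root and sum their sizes, split by link count.

    Returns {"single": [count, bytes], "multi": [count, bytes]}, where
    "single" means st_nlink == 1 and "multi" means st_nlink > 1.
    Symbolic links and entries that cannot be stat'ed are skipped, loosely
    like -ErrorAction SilentlyContinue (os.walk already ignores directory
    listing errors by default).
    """
    result = {"single": [0, 0], "multi": [0, 0]}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue
                st = os.stat(path)
            except OSError:
                continue
            bucket = "single" if st.st_nlink == 1 else "multi"
            result[bucket][0] += 1
            result[bucket][1] += st.st_size
    return result
```

Running `split_by_link_count(r"C:\Windows\System32")` should give numbers in the same spirit as the PowerShell output, though they need not match exactly, since the two approaches may treat reparse points, hidden files and access errors differently.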

Peter Hahndorf

Posted 2010-12-02T22:16:44.767

Reputation: 10 677

TreeSize Professional (~$55, 30-day trial) claims to account for NTFS hardlinks when reporting disk space. A quick trial seems to bear this out.

Hardlink support is not turned on out of the box: enable it under Tools > Options > Scan, re-scan, then use Ctrl-1 and Ctrl-2 to switch between Size and Allocated space. Allocated is the space actually used, while Size is the figure normally reported by other programs.

There is a performance penalty for turning on hardlink support (and for symlinks and mounts too, if you want those). The colour palette is garish for my taste, but that seems to be par for the course in this genre. Also be careful when clicking around in the box chart area: it's easy to accidentally move a folder with a mistaken drag-and-drop when you only meant to expand it.

matt wilkie

Posted 2010-12-02T22:16:44.767

Reputation: 4 147

I think some facts need to be set right here.

Windows cannot "detect" hardlinks, since every file is actually a hardlink to a bunch of bytes on the disk.

The du tool detects duplicates, but that can be misleading too: if folder A contains files and B contains only hardlinks to the files in A, then du of A and du of B return the same answer, namely the size of the files that came originally from A, because those files are now also in B.

This is actually correct: for example, if you deleted A, its files would not be deleted from the disk, because they are still referenced by B. With hardlinks, which file is the source and which is the hardlink is arbitrary and meaningless.
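
The point that neither name is "the original" can be checked directly. The Python sketch below creates one file with two names, one in folder A and one in folder B, deletes the name in A, and shows that the data survives under B with its link count dropping from 2 to 1. It uses only `os.link`, `os.remove` and `os.stat` inside a temporary directory; the function name is hypothetical. On Linux it returns `(2, 1, 'hello')`; that NTFS reports the same link counts through `os.stat` is an assumption worth verifying.

```python
import os
import tempfile


def demo_delete_one_name():
    """Give one file two names, delete the first name, and report.

    Returns (links_before, links_after, surviving_content).
    Everything happens in a throwaway temporary directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        a_dir = os.path.join(tmp, "A")
        b_dir = os.path.join(tmp, "B")
        os.mkdir(a_dir)
        os.mkdir(b_dir)
        first = os.path.join(a_dir, "data.txt")
        with open(first, "w") as f:
            f.write("hello")
        second = os.path.join(b_dir, "data.txt")
        os.link(first, second)                # second name, same file data
        links_before = os.stat(second).st_nlink
        os.remove(first)                      # "delete A's copy"
        links_after = os.stat(second).st_nlink
        with open(second) as f:               # data still reachable via B
            content = f.read()
    return links_before, links_after, content
```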

Products such as du list a directory while discounting duplicates. This only works if all the files and their hardlinks are contained in one directory tree. Many folder-listing products do that.

Conclusion: With hard-links, the question of "the actual size used in an NTFS directory" is meaningless.

harrymc

Posted 2010-12-02T22:16:44.767

Reputation: 306 093

I also did some research on this question. Here are the results I found.

The size of a folder containing hardlinked files on NTFS can be understood in three different ways:

  1. Size including the sizes of all hardlinked files (which is what Windows Explorer shows).
  2. Size of unique files only in terms of the current folder.
  3. Size of unique files only in terms of the whole disk.
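
The three meanings can be computed in one pass. In the hypothetical sketch below, meaning 1 adds every file name, meaning 2 adds each underlying file once (keyed by `(st_dev, st_ino)`), and meaning 3 adds a file only when the number of its names found under the folder reaches its total link count `st_nlink`, i.e. when no link to it exists outside the folder. It uses logical file sizes (`st_size`), not cluster-rounded allocation, so the numbers will not match "Size on disk" or TreeSize's Allocated column exactly; whether `st_ino` and `st_nlink` are reliable on a given NTFS volume is an assumption to verify.

```python
import os
from collections import Counter


def folder_size_meanings(root):
    """Return (apparent, unique_in_folder, exclusive) byte totals for root.

    apparent:         every file name counted at full size.
    unique_in_folder: each underlying file counted once, however many
                      names it has under root.
    exclusive:        only files whose every hard link lies under root,
                      i.e. the space that deleting root would free.
    Symbolic links are skipped.
    """
    names_seen = Counter()  # (st_dev, st_ino) -> names found under root
    info = {}               # (st_dev, st_ino) -> (st_size, st_nlink)
    apparent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            names_seen[key] += 1
            info[key] = (st.st_size, st.st_nlink)
            apparent += st.st_size
    unique_in_folder = sum(size for size, _nlink in info.values())
    exclusive = sum(
        size for key, (size, nlink) in info.items() if names_seen[key] >= nlink
    )
    return apparent, unique_in_folder, exclusive
```

For the backups in the question, meaning 3 on `z:\backups\2010-12-02` should come out at roughly the size of that snapshot's new or changed files, provided no later snapshot links to them.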

Number 2 is what TreeSize Professional shows in the Details tab, Allocated column, if the option "Track NTFS hardlinks" is enabled.

Here is an example for the winsxs folder (7.5 GB as opposed to 10 GB):

[screenshot: TreeSize Professional results for the winsxs folder]

Getting value number 3 is still an open question for me, although I was able to get a lower bound using Total Commander with the NL_Info plugin: the size occupied by files that have only one hardlink (unique files). It was about 5 GB for the example above.

So this is an attempt to expand on harrymc's answer, or to say the same thing in other words.

tschesseket

Posted 2010-12-02T22:16:44.767

Reputation: 111

You can use ln.exe to show the "true size" of a directory tree:

    ln.exe --truesize z:\backups\.

It will only detect hardlinks below that starting folder.

Limer

Posted 2010-12-02T22:16:44.767

Reputation: 63