
We have a hard disk that is 600 GB and nearly full. It has been filled up with 18,501,765 files (mostly small 19 KB images) and 7,142,132 folders, and it is very difficult to find out where exactly all the space has gone. Our regular cleanup procedures are not clearing enough space, which means we need to look at this drive as a whole and determine what is out there and what can be moved or removed. We've tried several applications, and so far they have either blown up or simply run for an astonishing amount of time without completing.

Server Information

  • Operating System: Windows Server 2003
  • File System: NTFS

Solution

SpaceObServer was able to read through 18,501,765 files and 7,142,132 folders while using hardly any memory. I'm sure this is mostly because it uses a SQL backend to store all of the data. It is, unfortunately, the most expensive of all the products at $259.95 per server.
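
For anyone curious, the general idea is easy to illustrate. The sketch below (purely illustrative, not SpaceObServer's actual implementation) pushes scan results into SQLite in batches instead of holding them in RAM, so memory use stays flat no matter how many files are on the volume; the drive letter, database file, and table name are made up for the example.

# Illustrative sketch: persist scan results to SQLite so memory use stays flat.
import os
import sqlite3

db = sqlite3.connect("scan.db")
db.execute("CREATE TABLE IF NOT EXISTS entries (dir TEXT, name TEXT, size INTEGER)")

batch = []
for root, dirs, files in os.walk("D:\\"):          # example drive root
    for name in files:
        try:
            size = os.path.getsize(os.path.join(root, name))
        except OSError:
            continue                               # skip files we cannot stat
        batch.append((root, name, size))
        if len(batch) >= 10000:                    # flush in batches to keep RAM low
            db.executemany("INSERT INTO entries VALUES (?, ?, ?)", batch)
            db.commit()
            batch.clear()
db.executemany("INSERT INTO entries VALUES (?, ?, ?)", batch)
db.commit()

# Report the 50 directories holding the most data in their immediate files.
for dir_, total in db.execute(
        "SELECT dir, SUM(size) AS total FROM entries "
        "GROUP BY dir ORDER BY total DESC LIMIT 50"):
    print("%10.1f MB  %s" % (total / 2**20, dir_))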

Attempted Solutions

During my research I tried several different solutions, both paid and free. I've listed the products I tried below for everyone's information.

Free Software

Paid Software

Updates

Update #1: The server I am attempting to analyze has 2 GB of RAM, and most of the products I have tried keep the file/folder information in memory. That runs out much too quickly with 18,501,765 files and 7,142,132 folders.
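
As a rough sanity check (using the roughly 200 bytes per entry reported in the WinDirStat discussion below, which is an assumption rather than something I measured on this server), an in-memory approach simply cannot fit in a 32-bit process:

# Back-of-the-envelope estimate; 200 bytes/entry comes from the WinDirStat
# observations in the answers below, not a measurement on this server.
files, folders = 18_501_765, 7_142_132
bytes_per_entry = 200
print("~%.1f GB needed" % ((files + folders) * bytes_per_entry / 2**30))
# ~4.8 GB needed, versus 2 GB of physical RAM and a 2 GB 32-bit address space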

Update #2: Looks like the developers of WinDirStat got involved enough to tell us that it can be compiled as 64-bit. That gives it more memory to work with, but I'm not sure it will be enough unless they can persist to disk.

Cristian Ciupitu
Nathan Palmer

13 Answers


Assuming your OS is Windows...

Any way you slice it, tabulating millions of files is always going to take a long time and will be restricted by the I/O of the disk itself. I recommend TreeSize Professional, or perhaps SpaceObServer. You could also give the freeware version of TreeSize a try.

Wesley

Definitely try WinDirStat: it gives a fantastic visualization of disk use by depicting each file as a rectangle drawn to scale, color coded by file type. Click on any item in the visualization and you'll see it in the directory tree.

The standard 32-bit build is limited to 10 million files and 2 GB RAM usage, but the source code will build successfully as a 64-bit application. The fact that the server in question has only 2 GB of RAM may be problematic in this specific case, but most servers with such large numbers of files will have much more RAM.

Edit #1: I regret to have discovered that, when tested on a 4 TB volume containing millions of files, WinDirStat Portable crashed after indexing about 6.5 million files. That may rule it out for the original question, since the drive in question holds far more than 6 million files.

Edit #2: The full version of WinDirStat crashes at 10 million files and 1.9 GB of RAM used.

Edit #3: I got in touch with the WinDirStat developers: (1) they agree that this was caused by the memory limitations of the x86 architecture, and (2) they mentioned that it can be compiled as 64-bit without errors. More soon.

Edit #4: The test of a 64-bit build of WinDirStat was successful. In 44 minutes, it indexed 11.4 million files and consumed 2.7 GB of RAM.

Skyhawk
  • It might be worth trying the regular version, as it's possible that the portable environment created an unexpected restriction. I'm not in a position to test that myself. http://windirstat.info/ – John Gardeniers Jul 29 '10 at 00:22
  • Indeed, the regular version dies at 10+ million files and 1.9GB RAM usage. I suspect that it is unable to allocate >2GB. I'm surprised that it uses quite so much RAM (nearly 200 bytes per file tallied), but, then again, I grew up in an era when individual bytes were far more precious than they are today... – Skyhawk Jul 29 '10 at 00:48
  • I use WinDirStat a lot. Unfortunately it just doesn't cut it when you get into a large # of files. – Nathan Palmer Jul 29 '10 at 05:22
  • I'd be interested to hear if the WinDirStat devs come back with anything. RAM is going to be a constraint for me in 32-bit or 64-bit. – Nathan Palmer Jul 31 '10 at 16:55
  • Indeed so: with only 2GB of RAM, a 64-bit build would have to use an awful lot of swap to get the job done. I'll let you know how it goes on a beefier server, anyway. – Skyhawk Aug 01 '10 at 08:19
  • Is there a download of the compiled 64-bit version? – Nathan Palmer Aug 03 '10 at 05:33
  • 1
    There is no official build, but I can send you an unofficial one -- obviously, it would be tricky to roll your own if you don't have Visual Studio! (my.name@gmail.com reaches me) – Skyhawk Aug 03 '10 at 05:48
  • There's now a 64-bit alpha [on bitbucket](https://bitbucket.org/windirstat/windirstat/downloads/). Apparently someone is working on WinDirStat again. – Fake Name Dec 08 '17 at 05:47

I regularly use FolderSizes on several 1TB drives with several million files with no problems.

joeqwerty

+1 for the TreeSize products, but...

Your sentence about "not clearing enough space" makes me wonder: could you have run out of reserved NTFS MFT space? If the filesystem grabs more MFT space than is initially allocated, it is not returned to regular file space, and it is not shown in defrag operations.

http://support.microsoft.com/kb/174619

"Volumes with a small number of relatively large files exhaust the unreserved space first, while volumes with a large number of relatively small files exhaust the MFT zone space first. In either case, fragmentation of the MFT starts to take place when one region or the other becomes full. If the unreserved space becomes full, space for user files and directories starts to be allocated from the MFT zone competing with the MFT for allocation. If the MFT zone becomes full, space for new MFT entries is allocated from the remainder of the disk, again competing with other files. "

AndyN
  • That looks like something good to check. Unfortunately we cannot see the MFT size because defrag will not analyze without a CHKDSK and CHKDSK is currently failing with "An unspecified error occurred." – Nathan Palmer Jul 29 '10 at 19:06

  1. cd \
  2. dir /s > out.txt
  3. poof! Magic happens; or a perl hacker shows up
  4. Results!

Seriously. I've done this with 5 or 6 million files; I'm not sure exactly what you're looking for, but a good scripting language will eat this up.
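
If no perl hacker shows up, a short Python script can stand in for step 3. The sketch below walks the tree itself rather than parsing out.txt (which sidesteps locale-dependent dir formatting), prints one cumulative "directory plus children" total per folder, and leaves the sorting to a separate step; the drive letter is just an example.

# Sketch: print a cumulative size (directory + everything beneath it) for
# every folder, one line each, so the output can be redirected to a file and
# sorted afterwards. The bottom-up walk folds each directory's total into its
# parent and then drops it, which keeps memory use modest.
import os
import sys

pending = {}  # subtree totals waiting to be folded into their parent
for root, dirs, files in os.walk("D:\\", topdown=False):   # example drive root
    total = 0
    for name in files:
        try:
            total += os.path.getsize(os.path.join(root, name))
        except OSError:
            pass                                            # unreadable file; skip it
    for d in dirs:
        total += pending.pop(os.path.join(root, d), 0)
    pending[root] = total
    sys.stdout.write("%14.1f\t%s\n" % (total / 2**20, root))

Redirect the output to a file and sort it afterwards (for example, sort -nr sizes.txt | head -n 50 under Cygwin, much like the du pipeline in another answer) to see the largest subtrees.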

SqlACID
  • So... what happens in step #3, assuming a perl hacker does not show up? – Nathan Palmer Aug 03 '10 at 16:17
  • Can you post more info on what you need? Largest files? Largest directories? Do you need date/time info? Is it a one-time need, or recurring? – SqlACID Aug 03 '10 at 16:44
  • For now it's one time. I need to know which directories are the largest (dir + children), but I will need to go a few directories in before that information is valid. Then I will need a breakdown of files by date so I can view recent vs. old files. – Nathan Palmer Aug 04 '10 at 01:47

I'm not usually a Windows user, but I'm aware of Cygwin's existence. :-)

If it works well enough, something like

du -m /your/path | sort -nr | head -n 50

or perhaps in Cygwin

du -m /cygdrive/c | sort -nr | head -n 50

Either way, those should print out the 50 biggest directories (sizes in megabytes).

Janne Pikkarainen

I found a few issues with SpaceMonger, and while looking for a utility I could easily transfer to or run from a USB stick, SpaceSniffer turned out to be very versatile in that regard, and it handled multi-terabyte volumes with ease.

  • multi-terabyte volumes with how many files? It seems our major issue is not how much space is used but how many files the program can handle. Most are choking at 10 million. – Nathan Palmer Jul 29 '10 at 15:24
  • I don't have a server with more than a few million files on it to experiment with, so I couldn't confidently answer your 10-million-file question. My only suggestion is that with these tools you can set the directory depth at which it visualizes; find the happy medium and then go as deep into the folder as you need to. That should save time doing the visualization as well. –  Jul 30 '10 at 00:33

du -s can be used in a pinch, and will run as long as needed.

Ignacio Vazquez-Abrams

On Windows I use SpaceMonger (or the older free version). On OS X I use Disk Inventory X.

ggutenberg

Have a look at GetFoldersize.

user9517

Concerning the MFT, from the back of my head I seem to recall that JkDefrag, the original open-source version, gave a very precise disk view, including different colouring for MFT areas. I think I used it once before for a rule-of-thumb guesstimate of MFT size and fragmentation.

It also doesn't depend on chkdsk.

Might try that?

deploymonkey
  • I gave it a try. When I run the analysis it suffers the same fate as most of these other programs: too many files/folders stored in memory. – Nathan Palmer Aug 01 '10 at 00:21
  • Sorry to read that. Seems like a platform problem. I have another suggestion: mirror the disk somehow (bit image, imaging software, or a hardware mirror), break the mirror, and put the copy under forensics on another platform, e.g. Linux/*nix. The time this has already consumed warrants the cost of a mirror drive, considering the amount of your working time invested. – deploymonkey Aug 01 '10 at 18:44

Another potential option: http://www.freshney.org/xinorbis/


I've used Disk Usage Analyzer (Baobab) on Linux, using its remote scan function against a Windows server. I don't know what its limits are, though.

The Fedora Live CD contains Baobab. Boot it on any computer in your LAN.

It's all free, as in beer and as in speech, including for commercial use.

Tometzky