Technical Explanation
The reason that most methods cause problems is that Windows tries to enumerate the files and folders before deleting them. This isn’t much of a problem with a few hundred, or even a few thousand, files/folders a few levels deep, but when you have trillions of files in millions of folders going dozens of levels deep, it will definitely bog the system down.
Let’s say you have “only” 100,000,000 files, and Windows uses a simple structure like this to store each file along with its path (that way you avoid storing each directory separately, thus saving some overhead):
struct FILELIST {          // Total size is 264 to 528 bytes:
  TCHAR name[MAX_PATH];    // MAX_PATH = 260; TCHAR = 1 or 2 bytes
  FILELIST* nextfile;      // Pointers are 4 bytes on 32-bit and 8 bytes on 64-bit
};
Depending on whether it uses 8-bit characters or Unicode characters (it uses Unicode) and whether your system is 32-bit or 64-bit, it will need between 25GB and 49GB of memory to store the list (and this is a very simplified structure).
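Working that out: 100,000,000 × 264 bytes ≈ 26.4 billion bytes (roughly 25GB), and 100,000,000 × 528 bytes ≈ 52.8 billion bytes (roughly 49GB).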
The reason why Windows tries to enumerate the files and folders before deleting them varies depending on the method you are using to delete them, but both Explorer and the command-interpreter do it (you can see a delay when you initiate the command). You can also see the disk activity (HDD LED) flash as it reads the directory tree from the drive.
Solution
Your best bet for dealing with this sort of situation is a tool that deletes the files and folders individually, one at a time. I don’t know of any ready-made tools that do it, but it should be possible to accomplish with a simple batch file.
@echo off
if not [%1]==[] pushd %1
del /q *
for /d %%i in (*) do call "%~f0" "%%i"
if not [%1]==[] popd
What this does is check whether an argument was passed. If so, it changes to the specified directory (you can run it without an argument to start in the current directory, or pass a directory, even one on a different drive, to have it start there).
Next, it deletes all of the files in the current directory. Used this way, it should not enumerate anything up front; it simply deletes the files without using much, if any, memory.
Then it enumerates the folders in the current directory and calls itself, passing each folder in turn so that it recurses downward. When each call returns, it pops back to the folder it started from so that the loop can continue with the next one.
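For example (the script name and path here are just placeholders), you could save the above as purge.cmd and point it at the root of the problem tree:

purge.cmd "D:\BigTree"

It will start in D:\BigTree and work its way down from there; run it with no argument to start in whatever directory you are already in.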
Analysis
The reason this should work is that it does not enumerate every single file and folder in the entire tree. It does not enumerate any files at all, and it only enumerates the folders in the current directory (plus the remaining ones in the parent directories). Assuming there are only a few hundred sub-directories in any given folder, this should not be too bad, and it certainly requires much less memory than other methods that enumerate the entire tree.
You may wonder about using the for command’s /r switch instead of (manual) recursion. That would not work because, while the /r switch does do recursion, it pre-enumerates the entire directory tree, which is exactly what we want to avoid; we want to delete as we go without keeping track.
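For reference, and assuming the same placeholder path as before, the /r variant being advised against would look roughly like this inside a batch file:

for /r D:\BigTree %%i in (*) do del /q "%%i"

It deletes the same files, but it relies on cmd to walk the whole tree for you, which is the pre-enumeration described above.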
Comparison
Let’s compare this method to the full-enumeration method(s).
You had said that you had “millions of directories”; let’s say 100 million. If the tree is approximately balanced, and assuming an average of about 100 sub-directories per folder, then the deepest nested directory would be about four levels down—actually, there would be 101,010,100 sub-folders in the whole tree. (Amusing how 100M can break down to just 100 and 4.)
Since we are not enumerating files, we only need to keep track of at most 100 directory names per level, for a maximum of 4 × 100 = 400
directories at any given time.
Therefore the memory requirement should be ~206.25KB, well within the limits of any modern (or otherwise) system.
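In numbers: 100 + 100^2 + 100^3 + 100^4 = 101,010,100 folders in the whole tree, but at most 4 × 100 = 400 directory names held at any one time, and 400 × 528 bytes = 211,200 bytes, which is where the ~206.25KB figure comes from.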
Test
Unfortunately(?) I don’t have a system with trillions of files in millions of folders to test it on (I believe at last count I had about 800K files), so someone else will have to try it.
Caveat
Of course memory isn’t the only limitation. The drive will be a big bottleneck too because for every file and folder you delete, the system has to mark it as free. Thankfully, many of these disk operations will be bundled together (cached) and written out in chunks instead of individually (at least for hard-drives, not for removable media), but it will still cause quite a bit of thrashing as the system reads and writes the data.
So your main problem is the fact that directories and subdirectories aren't being deleted? – Sandeep Bansal – 2012-04-24T10:01:13.943
@Jackey Cheung: which version of Windows are you using? – Siva Charan – 2012-04-24T10:10:00.580
The version I'm using is Windows 7 64-bit. The files/directories that were processed did get deleted. The problem is that it can't process that many files in one run, and it eventually got stuck/crashed. – Jackey Cheung – 2012-04-24T10:27:44.460
You could write a batch script that recursively deletes files, not starting from the top level but on e.g. the fifth level of the folder structure. That would split the job into many separate and sequential 'rm's – None – 2012-04-24T11:27:09.847
Yeah, writing a batch file could do the trick. Actually I was considering writing a program to specifically do this. But this is off the topic. – Jackey Cheung – 2012-04-25T00:59:55.910
I have to know, how the hell did you get a trillion files, really... – Moab – 2012-04-25T01:35:42.060
I'm guessing Virus? Otherwise I don't know how it's possible to get that many files in various subdirectories – Mark Kramer – 2012-04-25T04:09:55.563
It's normal to have such a number of files on our server due to the nature of its work. – Jackey Cheung – 2012-04-25T05:28:56.560
@JackeyCheung Typical NTFS allocation unit size (desktop): 4 KByte. Minimum size on disk for empty file: MFT record (one allocation unit, 4 KByte). Trillion files at 4 KByte each, (approx 1,000 * 1,000,000,000), 4 Petabytes... ... ... by the way, typical server allocation units (when drives are larger) are greater than 4 KBytes. So with a trillion files, you should be using at least 4 petabytes unless you set a smaller allocation unit size when formatting. I can see why it's slow/why it crashes. – Bob – 2012-04-26T02:47:46.710
@JackeyCheung In all seriousness, though, you're going to have some serious MFT size growth if you have a whole lot of files being created and deleted. – Bob – 2012-04-26T02:48:43.323
It might be worth booting to the Windows install DVD (or a Windows PE DVD / USB stick if you're so inclined) and trying the delete from there. – Harry Johnston – 2012-04-26T04:59:18.510
Is this an NTFS volume? (If it is a volume with a third-party file system, this would explain both how you fit so many files on one volume and why you're running out of memory.) – Harry Johnston – 2012-04-26T19:24:19.673
A trillion files needs a file table that is 1 PB. The biggest hard disks nowadays are a few TB. How did you possibly get a partition that big? – user541686 – 2013-10-15T22:01:12.973