
I have a developer utility that I am using to generate 50 million files. The directory structure goes four levels deep: the top level contains 16 directories (years 2000-2016), the next level months (1-12), the next level days (1-31), and the final level holds the XML files themselves (up to 85 KB each). A single day directory could end up with 3,000+ files (I haven't done the math to work out exactly how 50 million files distribute across that structure).
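
For what it's worth, a back-of-the-envelope estimate, assuming the files land evenly across every year\month\day leaf directory (a PowerShell sketch; the numbers are approximate):

# Rough spread of 50 million files over the tree described above,
# assuming an even distribution across the day-level directories.
$leafDirs    = 16 * 12 * 31                       # 5,952 day-level directories
$filesPerDir = [math]::Ceiling(50e6 / $leafDirs)  # roughly 8,400 files per directory
"{0} leaf directories, about {1} files in each" -f $leafDirs, $filesPerDir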

I am currently running the utility and I'm about 1/3 of the way through (it takes days to execute). As I feared, traversing any part of the directory tree is a painful experience; just expanding a folder in Explorer takes several seconds. This is with server-grade hardware: a 12 TB RAID 5 or RAID 10 array of 7,200 RPM SAS disks (I know that isn't fast these days) and four allocated 3.4 GHz Xeon CPUs.

How do I increase Windows Server 2012 R2's ability to cache file handles in memory? I do not have the NFS service running.
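
The only built-in knob I have found so far is the NTFS memory usage setting, which as I understand it raises the internal paged-pool limits NTFS uses for caching metadata. I am not certain it governs the metafile cache I'm asking about, and I believe a reboot is needed for the change to take effect:

fsutil behavior query memoryusage
fsutil behavior set memoryusage 2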


M:\>defrag /a /v /h m:
Microsoft Drive Optimizer
Copyright (c) 2013 Microsoft Corp.

Invoking slab consolidation on DB MDF (M:)...


The operation completed successfully.

Post Defragmentation Report:

    Volume Information:
            Volume size                 = 12.99 TB
            Cluster size                = 64 KB
            Used space                  = 1.55 TB
            Free space                  = 11.44 TB

    Slab Consolidation:
            Space efficiency            = 100%
            Potential purgable slabs    = 1

M:\>

C:\Windows\system32>fsutil fsinfo ntfsinfo m:
NTFS Volume Serial Number :       0x9c60357c60355de8
NTFS Version   :                  3.1
LFS Version    :                  2.0
Number Sectors :                  0x000000067ffbefff
Total Clusters :                  0x000000000cfff7df
Free Clusters  :                  0x000000000b6bcb45
Total Reserved :                  0x0000000000000004
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096
Bytes Per Cluster :               65536
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000320900000
Mft Start Lcn  :                  0x000000000000c000
Mft2 Start Lcn :                  0x0000000000000001
Mft Zone Start :                  0x00000000018f8780
Mft Zone End   :                  0x00000000018f9420
Resource Manager Identifier :     A47067E0-6356-11E6-8

C:\Windows\system32>

RamMap

Metafile details: Total = 2,882,220 K, Active = 2,736,688 K, Standby = 143,968 K, Modified = 852 K, Modified no write = 712 K.

What else would be of interest on this page?

At this time the server is allocated 16 GB of memory. I could ask for a lot more.
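
If more numbers would be useful, I can also sample the cache-related performance counters, for example:

# Cache and pool counters that seem relevant to the metafile question.
Get-Counter -Counter '\Memory\Cache Bytes',
                     '\Memory\System Cache Resident Bytes',
                     '\Memory\Pool Paged Bytes',
                     '\Memory\Standby Cache Normal Priority Bytes'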


C:\Windows\system32>fsutil.exe 8dot3name query m:
The volume state is: 1 (8dot3 name creation is disabled).
The registry state is: 2 (Per volume setting - the default).

Based on the above two settings, 8dot3 name creation is disabled on m:

C:\Windows\system32>
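
The corresponding last-access setting (which I believe is the fsutil view of NtfsDisableLastAccessUpdate; 1 means last-access updates are disabled) can be checked with:

fsutil behavior query disablelastaccess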

Contig v1.8 - Contig
Copyright (C) 2001-2016 Mark Russinovich
Sysinternals

m:\$Mft is in 80 fragments
m:\$Mft::$BITMAP is in 32 fragments

Summary:
     Number of files processed:      2
     Number unsuccessfully procesed: 0
     Average fragmentation       : 56 frags/file

NtfsInfo v1.2 - NTFS Information Dump
Copyright (C) 2005-2016 Mark Russinovich
Sysinternals - www.sysinternals.com


Volume Size
-----------
Volume size            : 13631357 MB
Total sectors          : 27917021183
Total clusters         : 218101727
Free clusters          : 184577826
Free space             : 11536114 MB (84% of drive)

Allocation Size
----------------
Bytes per sector       : 512
Bytes per cluster      : 65536
Bytes per MFT record   : 0
Clusters per MFT record: 0

MFT Information
---------------
MFT size               : 16210 MB (0% of drive)
MFT start cluster      : 49152
MFT zone clusters      : 33255616 - 33258848
MFT zone size          : 202 MB (0% of drive)
MFT mirror start       : 1

Meta-Data files
---------------
– D-Klotz
  • I forgot to mention that allocated memory should not be considered a limit. I believe the hosting server has 128g + available. – D-Klotz Aug 18 '16 at 18:56
  • There are already a number of potentially interesting Q&A's - for tuning NTFS: http://serverfault.com/questions/46881 - for debugging performance issues: http://serverfault.com/questions/768755/ – HBruijn Aug 18 '16 at 18:57
  • Probably help to include the report from a `defrag /a /v /h x:`, and if you have 8Dot3 file name and NTFSLastAccess disabled. – Greg Askew Aug 18 '16 at 19:11
  • Pulled from regedit on the server-> NtfsDisable8dot3NameCreation:2, NtfsDisableLastAccessUpdate:1 – D-Klotz Aug 18 '16 at 19:20
  • Do you need to use NTFS or might ReFS be better? – Chopper3 Aug 18 '16 at 20:24
  • I don't HAVE to use NTFS. Is ReFS more suited for an insane number of files and directories? – D-Klotz Aug 18 '16 at 20:36
  • I don't know but in the scheme of things 50m files and directories is very small - far from insane - though systems with far larger numbers tend not to be Windows based. – Chopper3 Aug 19 '16 at 08:17
  • I've used apps that have multi-million files in a single directory; apps can run fine as long as there's metadata outside to pull the files; if you're doing directory scans you're using the file system as a database and that won't scale. In my case, yes, Windows Explorer was unusable, it would flashlight forever if you tried to browse the directory, but the app was fine. – SqlACID Aug 19 '16 at 10:12
  • Can you provide the output of `contig.exe -a x:\$Mft`? https://technet.microsoft.com/en-us/sysinternals/contig.aspx – Greg Askew Aug 19 '16 at 10:55
  • Also `ntfsinfo.exe x:` https://technet.microsoft.com/en-us/sysinternals/bb897424 – Greg Askew Aug 19 '16 at 11:27

1 Answer


You currently have an MFT of 0x320900000 = 13,431,209,984 bytes ≈ 12.5 GiB in size, with only about 2.7 GiB of that in memory (your RamMap metafile figures). More RAM will allow more of it to stay in memory, which is what caching more of the "file handles", i.e. the file system metadata, comes down to.
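
For reference, that follows from the fsutil output above with plain arithmetic (each file record segment is the 1,024 bytes reported there):

# Convert the Mft Valid Data Length reported by fsutil fsinfo ntfsinfo.
0x320900000 / 1GB    # ~12.5  -> GiB of MFT
0x320900000 / 1KB    # ~13.1 million 1 KB file record segments allocated so far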

No matter what file system you use there will be metadata, and depending on your usage patterns you may be better off investing in more RAM, faster storage, or both. If it is unrealistic to hold all of the metafile information in RAM, and your access patterns mostly touch new files rather than repeatedly reusing a smaller subset, then faster storage may be needed to cut seek times and raise the available IOPS: think RAID 10 with many mirror pairs to stripe across, built from SSDs and/or 15K RPM SAS disks.

Keep in mind that the Windows memory manager's default settings may not suit your situation, and you may need to tweak them, particularly if you are not planning to have enough RAM to hold the whole MFT in addition to everything else the system needs. I notice that nearly all of your metafile data is marked as Active memory, meaning the Windows caching system is not allowed to discard it from RAM when it is not being used. My PowerShell script in Windows Server 2008 R2 Metafile RAM Usage can be used (on Server 2008 through 2012 R2, and I expect 2016) to set a minimum and maximum on the amount of metafile memory that is marked as Active, forcing the rest onto the Standby list. This lets the cache system prioritise what stays in RAM.
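
For context, a minimal read-only sketch (assumptions: 64-bit PowerShell, and the documented kernel32 GetSystemFileCacheSize API, which is in the same family of calls a script like that relies on) that shows the current system file cache working-set limits:

# Query the file cache working-set limits the cache manager currently uses.
Add-Type -Namespace Win32 -Name FileCache -MemberDefinition @'
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool GetSystemFileCacheSize(
    out UIntPtr lpMinimumFileCacheSize,
    out UIntPtr lpMaximumFileCacheSize,
    out uint    lpFlags);
'@

$min   = [UIntPtr]::Zero
$max   = [UIntPtr]::Zero
$flags = [uint32]0
if ([Win32.FileCache]::GetSystemFileCacheSize([ref]$min, [ref]$max, [ref]$flags)) {
    "File cache working set: min {0:N0} bytes, max {1:N0} bytes, flags {2}" -f $min.ToUInt64(), $max.ToUInt64(), $flags
}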

Edit: While I'm not familiar with JMeter, it sounds like the file system usage pattern is going to be:

  1. Write them all in a roughly sequential manner.
  2. Read them all back as fast as possible, in a mostly sequential manner.
  3. Read them all a second time in a partly random pattern (as each thread competes to read the group of files it wants) and send them over the network.
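
For the scan in steps 2 and 3, streaming the paths rather than buffering all 50 million of them keeps client-side memory flat; a minimal sketch ('M:\' is a stand-in for the real root):

# Stream file paths lazily; EnumerateFiles yields each path as the tree is walked,
# so nothing here requires holding the full 50M-entry list in memory.
foreach ($path in [System.IO.Directory]::EnumerateFiles('M:\', '*.xml', 'AllDirectories')) {
    $path   # hand off to a queue/worker here instead of appending to a list
}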

With that usage pattern, to see a reasonable benefit from adding a LOT more RAM you would need to add enough to fit the whole MFT in RAM, which is generally a waste of money when it would be more cost effective to add a bit more RAM and upgrade the storage to significantly improve the IOPS. That should be faster than pairing colossal amounts of RAM with a slow 7.2K RPM RAID 5 array, or even a RAID 10 made of only 4 disks, because the metadata is not the only data being read from and written to storage. See this calculator as an estimation tool for expected IOPS and for how different disk counts and RAID levels affect performance.
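
For a rough sense of the arithmetic such calculators apply (the per-disk IOPS figures below are ballpark assumptions, not measurements of your array):

# Effective front-end IOPS once the RAID write penalty is applied:
#   effective = raw / (read% + write% * penalty)
function Get-EffectiveIops {
    param(
        [int]$Disks,
        [int]$IopsPerDisk,
        [int]$WritePenalty,
        [double]$ReadFraction = 0.5
    )
    $raw = $Disks * $IopsPerDisk
    [math]::Round($raw / ($ReadFraction + (1 - $ReadFraction) * $WritePenalty))
}

Get-EffectiveIops -Disks 8 -IopsPerDisk 80  -WritePenalty 4   # 8 x 7.2K spindles, RAID 5  -> ~256
Get-EffectiveIops -Disks 8 -IopsPerDisk 80  -WritePenalty 2   # 8 x 7.2K spindles, RAID 10 -> ~427
Get-EffectiveIops -Disks 8 -IopsPerDisk 190 -WritePenalty 2   # 8 x 15K spindles,  RAID 10 -> ~1013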

In the above case, the only way adding even more RAM can beat a system with faster storage is if you add enough RAM that all of the data, including the file contents, fits in RAM as well. This is why some database systems advertise that they operate "100% in memory": there are then no storage-system delays at all.

– BeowulfNode42
  • Thank you for the reply. That script is relevant for server 2012 R2 - correct? The pattern you see now is because I'm using a utility program we wrote to generate xml records (random internal data). Once that completes I'm going to "try" and use Jmeter to scan that tree, hold file paths in memory, and then fire off a configurable number of threads to systematically send those files to REST Endpoints on a completely different server. I can do this with 10K or 1M. I doubt I'll be able to with 50M. How the file system caches for this last phase will be key. – D-Klotz Aug 19 '16 at 02:34
  • As I read the comment - I'm mixing two different problems: the feasibility of processing and sending 50M records, and the performance of the file system during the initial scan of Jmeter. For the purpose of this question, let's stick with the last problem. – D-Klotz Aug 19 '16 at 02:36
  • Excellent advice. I'm going to ponder this. I can add significantly more RAM but if ram cache must contain the file contents as well - then I will be screwed. I had hoped that the system meta file memory cache worked with a "file handle" or "reference" to the actual location on disk and that would be enough to make the system usable. From reading your post, that isn't how it works. – D-Klotz Aug 19 '16 at 14:50
  • Adding more RAM would help some. A 16 GB MFT is getting up there. Note that directories and files less than 1K are created directly in the MFT. I wouldn't have high performance expectations of using Windows Explorer in any scenario though, particularly with spinning disks. – Greg Askew Aug 19 '16 at 14:55
  • Well I will forego using windows explorer. I'm concerned whether a program such as Jmeter (Java application) will operate with any modicum of speed given the pattern that @BeowulfNode42 so accurately identified. He's spot on. – D-Klotz Aug 19 '16 at 15:00
  • @D-Klotz: I would be interested in seeing the current IOPS and sequential/random read/write statistics for your storage system. It sounds like an underperformer. – Greg Askew Aug 20 '16 at 13:48
  • Which tool would be best? – D-Klotz Aug 20 '16 at 15:17
  • @D-Klotz I don't know about best, but you could try https://sourceforge.net/projects/iometer/ to simulate usage and measure results. The built in performance monitor tool in windows can also provide useful info on performance info during live usage. – BeowulfNode42 Aug 22 '16 at 00:45
  • Just build the XML dynamically in your JMETER script. The overhead of anchoring your virtual user code to a disk tree traversal of this complexity will slow down every process on your load generator (especially your virtual users). Disk I/O is ring 0. Ring 3 processes (application stuff like VUs) have to yield priority to ring 0. The small amount of ring 3 CPU and RAM needed to build the XML dynamically will have significantly less overhead than anchoring to disk. – James Pulley Sep 12 '16 at 14:48