
• Windows Server 2003 SP2
• LUN mounted from SAN
• Millions of small files across hundreds of thousands of directories (100GB total)
• NTFS with 4k cluster size

While doing the initial file crawl for backups or archiving, regular user access to files on this drive is severely slowed.

The SAN and network teams see no abnormal activity in their initial investigations, but deeper investigation is continuing. Some sort of server-level issue with NTFS or Windows is suspected.

Given that almost all files are <10k and so fit within 1-3 clusters, I do not suspect regular file fragmentation to be an issue, but perhaps MFT fragmentation could be. Since backups and cleanups disrupt users even off-hours, I hesitate to use the Windows defragmenter to analyze fragmentation across the whole volume when MFT fragmentation is all I really care about. Is there any way to figure out just that, more quickly than a full-disk analysis?

Well-behaved 3rd-party defrag programs are not out of the question either, if anyone has recommendations. Not disturbing the users further with our analysis is a big priority.

We are also considering setting the NtfsDisableLastAccessUpdate registry value. Has anyone found this to truly be a big improvement and not just a minor tweak?
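For reference, this is the value we'd be setting - a DWORD under the FileSystem key, set from a command prompt along these lines, with a reboot required for it to take effect:

    reg add "HKLM\SYSTEM\CurrentControlSet\Control\FileSystem" /v NtfsDisableLastAccessUpdate /t REG_DWORD /d 1 /f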

Are there any good tools to measure file locking/access contention on a busy drive? GUI tools from Sysinternals like Process Monitor don't scale at that level.

ss2k
  • I'm waiting for more information from the other teams monitoring. The disk partition is already aligned. – ss2k Feb 11 '10 at 14:30
  • Defragmenter analysis showed a 15GB MFT on a 500GB volume. The MFT was 99% in use with over 15 million records but only 2 fragments. This seems odd; I would have expected more fragments. I'm getting the idea there's internal MFT fragmentation within the 2 blocks that just isn't exposed in this summary. – ss2k Feb 12 '10 at 22:55

3 Answers


When you are backing up a volume like that you are going to seriously exercise the underlying storage. When you start reading in those millions of small files scattered around the filesystem, the limiting factor is going to be the random read IOPS that the underlying disks on your SAN can deliver. The SAN itself might not be stressed at all, but the volume you are reading from will take a hit, and any other process that tries to do anything at the same time will suffer unless you throttle back the backup activity.

The thing to look at is the queue depth on that volume. If it's peaking significantly higher than the number of disks that back the volume then you are hitting the IOPS limits. Perfmon will give you an idea, but the best data will come from the storage array's own analytics if it's possible to get those. I seriously doubt that your problem is anything to do with file locking. The storage folks need to look at the IOPS on the disks in the RAID pack that your volume is carved from; I suspect those disks are hitting ~150 IOPS each (higher if they are 15k spindles, lower if they are 7.2k). If you have a 6-disk RAID 10 group hosting that volume then it will max out at a rate not much better than 10MB/sec if it is genuinely backing up millions of 10k files and very little that is bigger (6 spindles × ~150 random read IOPS ≈ 900 IOPS, and 900 reads/sec × 10k per read ≈ 9MB/sec).
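If it's awkward to babysit a Perfmon session, the built-in typeperf can log the same counters to a CSV for later review. A minimal sketch - adjust the instances, interval and sample count to suit:

    typeperf "\PhysicalDisk(*)\Current Disk Queue Length" "\PhysicalDisk(*)\Disk Reads/sec" "\PhysicalDisk(*)\Avg. Disk sec/Read" -si 5 -sc 720 -o c:\temp\diskload.csv

720 samples at 5-second intervals covers an hour; run it across the backup window and compare the queue length you see against the spindle count behind the volume.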

NtfsDisableLastAccessUpdate will help in your case - it drops a set of IOPS from each file activity, in particular avoiding a couple of extra reads and writes associated with each file. Given that you have millions of files and that they are so small, there should be a significant win - it may be as much as 50%. That said, the most likely effect you will see is that your backup speeds up but still runs into an IOPS limit.

You should also consider aligning the disk partition if that hasn't already been done. In a case like this (lots of small reads) it isn't as big a win as it can be for other IO patterns - probably about 10%, assuming your RAID stripe size is 128k and your average read is around 10k - but it might be worth the effort. It will require backing up the whole volume, repartitioning and reformatting it, and then restoring the data, so it's not a trivial exercise.
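If you do go down that path, the alignment is set when the partition is created. Diskpart on Server 2003 SP1 and later accepts an align parameter; a sketch, assuming disk 1 and a 64KB offset (check what your particular array actually wants):

    diskpart
    select disk 1
    create partition primary align=64
    assign letter=E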

Helvick

Running the disk defragmenter in analysis mode is the only way I'm aware of to see how many MFT fragments you have.

If the volume is in use 24x7 then you're probably stuck "disturbing" the users. If not, schedule the "defrag C: -a -v" command to run in off-hours with its output redirected to a file. That will get you the command output without requiring you to be awake to run it at 4AM on a Sunday. >smile<
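One way to do that on a 2003-era box is the old "at" scheduler, which runs the job as SYSTEM. This assumes C:\temp already exists:

    at 04:00 "cmd /c defrag C: -a -v > C:\temp\defrag-report.txt"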

I can't give you statistics re: "NtfsDisableLastAccessUpdate", but I'd certainly set it unless you need the last access time to be saved. (I'd use the "fsutil behavior set disablelastaccess 1" command rather than setting it in the registry. You do have to reboot, as well, for the change to take effect.)

You might also consider disabling 8dot3 file name creation if you don't need it. Do this especially if you have a large number of files in a single directory. (Turning it off won't get rid of the 8dot3 names that are already there, though...)
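The switch for that is analogous to the last-access one, and as far as I recall it also wants a reboot:

    fsutil behavior set disable8dot3 1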

Comparing the output of "fsutil fsinfo statistics" on the volume in question before/after this slow "initial crawl" might tell you something about what's going on, albeit I think it's just going to show you that a hideous number of metadata reads are occurring.
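A sketch of that comparison, assuming the volume is E: - the counters accumulate while the volume is mounted, so the difference between the two captures is the activity during the crawl:

    fsutil fsinfo statistics E: > before.txt
    rem ... let the backup crawl run ...
    fsutil fsinfo statistics E: > after.txt
    fc before.txt after.txt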

Do you have enough space on the SAN to do a restore of the entire contents of the volume to a fresh LUN configured similarly (with respect to its SAN properties - RAID level, etc.) to the production LUN? It would be interesting to see how a freshly laid-down NTFS filesystem, with 8dot3 name creation disabled from the outset and perhaps a differently-sized MFT zone, behaves compared to the production LUN. (It wouldn't be too difficult, either, to orchestrate a migration by copying changed files from the production LUN to this staging LUN if the staging LUN proves to function better.)
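For the copy-changed-files part, robocopy (from the Windows Server 2003 Resource Kit) is the obvious tool. A sketch, assuming E: is production and F: is the staging LUN - note that /MIR will delete anything on the staging copy that's no longer on production:

    robocopy E:\ F:\ /MIR /COPYALL /R:1 /W:1 /NP /LOG:C:\temp\staging-sync.log

Repeated passes only move the deltas, so each sync gets faster.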

Evan Anderson

I found that the slowdown was occurring with a run of tens of thousands of files sharing the same 5-letter prefix. My working theory is that the B+ tree directory indexes NTFS stores in the MFT get created too deep and unbalanced in this case. Restoring the files to a fresh disk did not reproduce the issue, so I suspect the tree ends up balanced properly when the files are all created in a row on a fresh volume, but not when the files get created randomly (sometimes one of these prefixed files and sometimes not) on an already-fragmented disk.

We're planning to randomize the names and also investigate disabling 8dot3 names for the future.

ss2k