why don't file search utilities just load and parse the MFT?

I've noticed when I search for a file by name (in Windows or Linux) it's typically a disk-intensive process, especially in Windows. It seems that the utility (Windows Search, or "find" in Cygwin) scans the entire directory tree, considering each file one by one.

I'm wondering, why not load the Master File Table (or equivalent, if not NTFS) into memory and parse it purely in memory? I suppose that's similar to the indexes maintained by more modern search like Windows Search, Google Desktop Search, and Spotlight, but even those are indirect. I guess filesystems don't normally make their metadata available to external programs?

I can't prove that the search isn't already based on the MFT, but it seems unlikely based on how it runs.

Stephen

Posted 2013-07-22T02:32:09.057

Reputation: 635

Question was closed 2015-05-27T18:55:33.107

Filename searches themselves are usually quite quick; it's when you get into content searching that things take longer. Not quite sure what you are seeing in terms of inordinate I/O. – Frank Thomas – 2013-07-22T03:04:01.120

For example, in Cygwin if you navigate to /cygdrive/c/ and run find ./ -name Desktop.ini, it'll gradually output incremental results over many minutes. Most of the activity is "I/O Other", which I assume corresponds to metadata, including file names. My concern is that it appears to be recursively looping over the directory tree, such that each "list directory" command is a separate query to the HDD over the computer bus. Wouldn't it be theoretically much faster to load the MFT into memory in large chunks for processing, eliminating that latency from the bus? – Stephen – 2013-07-22T04:58:13.577
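The traversal pattern described in that comment can be sketched as follows. This is a minimal illustration, not Cygwin find's actual implementation: the point is that every directory requires its own scandir call, i.e. a separate round of metadata I/O per directory.

```python
import os

def find_by_name(root: str, target: str):
    """Minimal sketch of what `find <root> -name <target>` does:
    one directory-listing call per directory, recursing down the tree."""
    matches = []
    try:
        entries = list(os.scandir(root))  # one metadata query per directory
    except OSError:
        return matches                    # unreadable directory: skip it
    for entry in entries:
        if entry.name == target:
            matches.append(entry.path)
        if entry.is_dir(follow_symlinks=False):
            matches.extend(find_by_name(entry.path, target))
    return matches
```

On a cold cache, each of those per-directory listings can translate into separate seeks on the disk, which is the latency the question is asking about.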

Especially since the MFT is usually spread over very few fragments (<10 almost always) so you could mostly read sequentially. – Stephen – 2013-07-22T05:01:18.967

In Windows, do you have millions of tiny file items? Are your searches ending with display of icons, thumbnails, or anything other than file size, file attributes, date and location? Is any content searching occurring? I can search through terabytes with >200K files in ~10 seconds using a 3rd-party util. If I avoid the system and programs stuff and only do the user's 60K files, a few seconds, especially after it gets cached. No indexing. The disks are defrag-sorted: directories first (alpha) and files after (alpha). I had started that practice of re-order defragging back when things were slow. – Psycogeek – 2013-07-22T09:41:18.343

I just checked the old WinXP dog search on the "normal" computer; it takes less than 2 minutes to weed through 6 TB in ~250K files, some of it re-order sorted but not all. What are the specs of the machine, the type and speed of the hard drive(s), and how many file items are you searching through? Is there any local-net, mapped-drive stuff in your search? – Psycogeek – 2013-07-22T10:00:08.180

Answers

There are programs that search by reading the MFT on Windows NTFS volumes, e.g. these open-source projects:

http://sourceforge.net/projects/swiftsearch/

http://sourceforge.net/projects/ntfs-search/

They're VERY fast, but the problem is that once you go straight to the MFT you bypass functionality such as security ACLs and shell extensions. Therefore most of these programs need to run with elevated permissions, and they don't necessarily produce the same results as an API-based search.

snowdude

Posted 2013-07-22T02:32:09.057

Reputation: 2,560

Tools like Everything and UltraSearch also parse the MFT (and in some cases the USN Journal) directly. However, sidestepping ACLs is the #1 reason you don't see more use made of this.

– afrazier – 2013-08-22T13:07:05.273