Does Spotlight re-import a file that has been copied or moved?


This isn't a trick question about corner cases or anything like that. Simply put: if Spotlight has already imported and indexed a file, and that file is then moved or copied (say, a script used mv or cp on it), does Spotlight compare checksums (or something similar) to determine 'this is the same file, no need to look inside it again'? Or does it just call whichever importer is registered for that type again, which will trigger a re-parse?

If possible, please cite the source where you learned this. I really want to be sure which behavior is the specified one (it's too easy to be misled by experimentation with a system as fickle as Spotlight).

As background: I have a custom Spotlight importer that is quite time-consuming (it takes a long time to import each file), and I want to know whether it is safe to write a shell script that moves the files it indexes in and out of Spotlight-enabled folders without causing Spotlight to go crazy reindexing everything.

Adrian Petrescu

Posted 2010-01-21T04:20:20.953

Reputation: 2 728



To find out whether a reindex happens with mv/cp, you can keep an eye on it with fseventer. It observes filesystem changes using the same underlying API as Spotlight.
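When interpreting such observations, it may help to keep one filesystem-level difference between the two commands in mind: within a single volume, mv is a rename and the file keeps its inode, whereas cp creates a new file with a new inode. The Python sketch below only demonstrates that filesystem side; the scratch file names are made up for illustration, and it says nothing about whether Spotlight uses the distinction to skip the importer.

```python
import os
import shutil
import tempfile


def inode_after_move_and_copy():
    """Return (original, moved, copied) inode numbers for a scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical scratch file, created only for this demonstration.
        src = os.path.join(tmp, "original.txt")
        with open(src, "w") as f:
            f.write("sample contents\n")
        original = os.stat(src).st_ino

        # os.rename is what mv does when source and target share a volume.
        moved_path = os.path.join(tmp, "moved.txt")
        os.rename(src, moved_path)
        moved = os.stat(moved_path).st_ino

        # shutil.copy2 behaves roughly like `cp -p`: it creates a new file.
        copied_path = os.path.join(tmp, "copied.txt")
        shutil.copy2(moved_path, copied_path)
        copied = os.stat(copied_path).st_ino

        return original, moved, copied


original, moved, copied = inode_after_move_and_copy()
print("move kept the inode:", original == moved)
print("copy kept the inode:", moved == copied)
```

A move across volumes is a different case: mv then falls back to copy-and-delete, so the result is a new file there as well.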


Posted 2010-01-21T04:20:20.953

Reputation: 1 808


It basically doesn't need to reindex, since any file I/O that goes through the kernel, including UNIX mv and cp commands, causes an update in the Spotlight index. That way, the index is always up to date.

Spotlight has definitely come a long way since 10.4 Tiger, but when Tiger came out there were a few good articles floating around explaining how the technology works. One of them is the comprehensive Ars Technica Tiger review. Quote:

Pre-created indexes allow for very fast searches, but they also present a potential problem. It's easy for an index to get out of sync with the current state of the file system, and an index that's out of date is not very useful. In order to provide accurate results, an index must accurately reflect the state of all files "in real time."

Each metadata importer is responsible for scanning a file and returning all of the metadata it could extract—from the file system metadata structures, the file contents, or anything else it wants to consider. The metadata is returned as a set of key/value pairs, and is added to the Spotlight index entry for the file.

Metadata importer plug-ins are stored in Spotlight folders in any of the various Library folders. As usual, the more specific locations take precedence: ~/Library/Spotlight overrides /Library/Spotlight, and so on.

Any file i/o that goes through the Tiger kernel will trigger the appropriate metadata importer. This kernel-level integration ensures that the Spotlight indexes are always up to date.
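The key/value pairs described in the quote can be inspected per file with the mdls command-line tool that ships with Mac OS X. Below is a minimal Python sketch that wraps it. The helper name and the two attribute names requested by default are choices made for this illustration, and the function returns None on systems where mdls is not installed. Comparing the output before and after a mv or cp shows what the index holds for the file, but it does not by itself prove whether the importer ran again.

```python
import shutil
import subprocess


def spotlight_attributes(path, names=("kMDItemContentType", "kMDItemFSName")):
    """Return the raw mdls output for `path`, or None if mdls is unavailable or fails."""
    mdls = shutil.which("mdls")
    if mdls is None:
        # Not on macOS, or mdls is not on PATH.
        return None
    args = [mdls]
    for name in names:
        # `-name` restricts the output to a single attribute; it may be repeated.
        args += ["-name", name]
    args.append(path)
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


# Illustrative call; any existing file path will do.
print(spotlight_attributes("/etc/hosts"))
```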

Hope this helps. Others may be able to shed light on Spotlight architecture improvements since Tiger came out. I am but a humble user.


Posted 2010-01-21T04:20:20.953

Reputation: 13 618

That's very interesting, fideli -- I had no idea OS X was so conservative as to re-index on every kernel I/O operation.

I'm prepared to accept your answer, pending two things that I want to look up (unless someone else knows :) ). (1) Does the reindex that happens on operations like mv/cp re-index the whole file, or just the metadata that reasonably changes, like location, access times, etc.? (2) Is this behaviour still the same in Snow Leopard? I know that many Spotlight performance improvements were touted for Leopard, and this sounds like the kind of thing they would change.

Thanks! – Adrian Petrescu – 2010-01-26T22:13:19.043

Hopefully someone can add to my answer as I've exhausted my knowledge of Spotlight by now. – fideli – 2010-01-27T00:55:24.037

Thanks for the start :) I've added a bounty to the question to possibly attract some answers to the questions you raised, but if it doesn't work, I'll just accept yours. Cheers! – Adrian Petrescu – 2010-01-27T23:44:26.793