Why is it so hard to search for files using Spotlight?
Spotlight sometimes misses files, particularly those tucked away inside bundle structures or system directories. Because Spotlight is designed to optimize general searches, it falls short when you’re trying to do a specific search, such as finding a file by name.
Find File, Sherlock, and Spotlight
You might remember that back in System 7.5 (1994), Apple introduced a Find File utility that let you search a volume by file name or other attributes. It was very fast, thanks to a specific design characteristic of the Mac’s native file system: HFS, like its successor HFS+, stores its catalog data centrally instead of distributing it across directories. This approach makes the catalog data more susceptible to corruption, but it also makes file searches very fast. There is no need to index file names, because the file system is inherently indexed.
With Sherlock, introduced in Mac OS 8.5 (1998), Apple started moving toward the find-by-content problem space. Spotlight, introduced in Mac OS X v10.4 (2005), represents Apple’s latest solution to the problem of search. The underlying indexing technology used by Spotlight is an evolution of Apple’s SearchKit technology (in turn based on Apple Information Access Toolkit, a.k.a. VTwin), but the Spotlight user experience is a completely new approach, based on the idea of a single search field that applies to all content by default. It’s basically the same thing as what Google calls universal search, except that Spotlight applies the idea across the desktop operating system instead of the Internet.
The patent that covers Spotlight, 6847959 (“Universal interface for retrieval of information in a computer system”), was filed in 2000, revealing that Apple had been working on universal search for several years before Spotlight was introduced. Apple’s first implementation of universal search was actually in iTunes (2001).
Spotlight’s strength, however, is its greatest weakness. If you know exactly what you want to search for, it takes a bit more work on your part to set up a filename-only query (and you can’t do it from the Spotlight menu). Even then, many system directories and temporary files are omitted from the search as a performance optimization, which can pose a real problem if those are the files you’re looking for. It also means that the search results are not guaranteed to be complete. In practice, we’ve gone a step backwards from the simple, reliable Find File of System 7.5.
find and locate
There are also two standard Unix command-line tools which search the file system. The first is find. In addition to its awkward interface, find has to walk the entire directory hierarchy because it doesn’t know about the HFS+ catalog. This makes find slow, although it provides more complete search results than Spotlight.
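As a rough illustration of the kind of name search find performs, here is a minimal sketch that runs it against a small scratch tree instead of the whole disk. The directory layout and file names are made up for the example (only orangelight.tiff echoes a name used elsewhere in this post), and the script assumes a find that supports -iname, as the BSD and GNU versions do.

```shell
# Build a small scratch tree so the example does not touch real files.
# (The layout below is hypothetical, chosen only for illustration.)
scratch=$(mktemp -d "${TMPDIR:-/tmp}/findfile-demo.XXXXXX")
mkdir -p "$scratch/App.app/Contents/Resources" "$scratch/docs"
touch "$scratch/App.app/Contents/Resources/orangelight.tiff"
touch "$scratch/docs/Orange-notes.txt" "$scratch/docs/apple.txt"

# find visits every directory below the starting point and tests each name.
# -iname matches case-insensitively; the single quotes keep the shell from
# expanding the pattern before find sees it.
matches=$(find "$scratch" -iname '*orange*' | sort)
printf '%s\n' "$matches"

# Count the hits, one path per line.
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$count paths"

# Remove the scratch tree again.
rm -rf "$scratch"
```

Because find descends into every directory, it sees the file inside the bundle-style App.app folder as well, which is the completeness advantage described above. The cost is that the running time grows with the number of directories visited.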
The second command-line tool is locate. Unlike find, locate uses an index, so searching is fast. Unfortunately (and unlike with Spotlight), the index is only updated automatically when the /etc/weekly script runs, which isn’t guaranteed to happen, especially if your computer tends to be asleep.
FSCatalogSearch()
It’s not obvious, but the functionality of Find File does still exist on Mac OS X. At the heart of Find File is a Carbon File Manager function called FSCatalogSearch. You create an iterator and pass in a struct of search options, and FSCatalogSearch uses that iterator to search until the iterator is exhausted.
Here’s a simple command-line tool which uses FSCatalogSearch to search for files by name.
usage: findfile [-r <volume>] <name>
- Binary: findfile (46K, Mac OS X v10.4)
- Sources: findfile.zip (5K)
Comparison
The best technology for finding files by name, as measured by both completeness of results and performance, is FSCatalogSearch. Ironically, it does not have a visible interface in Mac OS X v10.4. The best approximation is to set up a Smart Folder which searches by Filename or even Raw Query.
mdfind is a command-line interface to the central metadata store that underlies Spotlight. Perversely, mdfind offers better results than the Finder interface to the same technology. The only difference in results between mdfind and FSCatalogSearch is that mdfind does not enter bundles, so that /Applications/iSync.app/Contents/Resources/orangelight.tiff, for example, won’t be found.
| Tool (command) | Hits | Time |
|---|---|---|
| FSCatalogSearch (“findfile orange”) | 29 paths | 4 sec |
| find (“find -x / -iname "*orange*"”) | 29 paths | 1 min |
| mdfind (“mdfind -onlyin / "(kMDItemFSName =='*orange*'c)"”) | 25 paths | 4 sec |
| locate (“locate orange \| grep -v /Volumes”) | 4 paths | 0.2 sec |
| Spotlight (“Filename contains orange” on startup disk) | 3 paths | 5 sec |
P.S. Here’s a tip. You can actually enclose your search terms in quotation marks:
“example”
Spotlight will limit your search by name. (This only works from the Spotlight menu.)
Has anybody come up with a GUI for FSCatalogSearch? I dearly hope this will happen soon! 10.4’s Spotlight is USELESS to me as I do a lot of data recovery work, and often have between 4 and 6 volumes mounted over a network which all require searching. Spotlight cannot index a failing hard disk (and allowing it to index the hard disk will probably kill the hard drive!) but 10.3 Find File (which utilises the existing HFS+ directory index) works a charm.
I wonder just how many people don’t realise Spotlight is not providing accurate results. As per your example, there is an ENORMOUS difference between finding 29 files and 3. In my line of work, ignoring the 26 files that Spotlight forgot about could mean losing 5 years’ worth of company records, losing photos of a child’s birth or a dead relative, or 10 years’ worth of irreplaceable scientific data. None of these options are acceptable, yet strangely the problem never existed before 10.4 arrived. I remember back in 1992 using my LC475 to search network volumes was always fast and accurate, so why this seemingly simple task has become impossible in 1997 is beyond me.
I would like that graphic interface for FSCatalogSearch as well. I’m using DevonThink’s EasyFind in the meantime but it’s too slow.
I am another Spotlight disliker (the indexing is always getting messed up and requires too much babysitting).
Tried your findfile, it works in Leopard. Well, is it possible to have more search options? e.g., “date modified” will be most welcome.
Thanks,
fab
@fab:
Have a look at hfsfind!
http://codesnippets.joyent.com/posts/show/2090
Hi, just ran into this by accident.
David Greenland asked for a GUI app using CatalogSearch. I made one a while ago: Find Any File.
See my weblink.
I made one, too: http://www.macupdate.com/info.php/id/30073/find-file
Ahaaa. That’s why your name looked so familiar :)
I was surprised that, after releasing my app, it took only a day or two until yours showed up, too. Seems like you had it cooking for a long time, just like me. It was about time; I couldn’t believe that all the other find tools never managed this and instead did slow recursive searches.
Cheers!
Thomas