nullstream weblog - Abandoning Google Desktop Search


Abandoning Google Desktop Search


July 5, 2006 11:07 PM PST

There is a funny, sick feeling you get when you find out that something isn't quite what you believed it to be. I had that experience recently with my desktop search. I had been lulled into thinking that once you installed a desktop search utility, your days of hunting through piles of files were over. There is just something about the name 'Google' that inspires confidence in search results. If Google says I don't have a local file containing a keyword, who am I to argue? Google must be right, unless of course you are staring at that exact keyword in one of your files.

It turns out that Google only does a partial search of your documents. While it may index all of your (supported) documents, it does not index the entire contents of each one. A quick dig through their help site reveals that they only index the first 10,000 words (or fewer) of each document. In my tests it was indexing far less than that, covering only the first 10 pages or so of the documents I was trying to search. The files I most want to search are far longer than 10,000 words anyway.
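To see why a per-document word cap produces exactly this kind of false "no results", here is a toy sketch in Python. It is not how Google Desktop actually works internally; it just assumes a simple inverted index that stops reading each file after a fixed number of words. `MAX_WORDS`, `build_index` and `full_scan` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Illustrative cap only, modeled on the "first 10,000 words" limit above.
MAX_WORDS = 10_000

def build_index(docs, max_words=MAX_WORDS):
    """Toy inverted index: map each word to the set of document names
    containing it, but look only at the first max_words words of each doc."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split()[:max_words]:
            index[word].add(name)
    return index

def full_scan(docs, keyword):
    """Slow but complete: check every word of every document."""
    keyword = keyword.lower()
    return {name for name, text in docs.items() if keyword in text.lower().split()}

# A 12,000-word document whose keyword appears only as its very last word.
long_doc = " ".join(["filler"] * 11_999 + ["xylophone"])
docs = {"report.txt": long_doc, "notes.txt": "short note about xylophone"}

index = build_index(docs)
print(sorted(index.get("xylophone", set())))  # ['notes.txt']
print(sorted(full_scan(docs, "xylophone")))   # ['notes.txt', 'report.txt']
```

The capped index never sees the keyword in the long file, so it confidently reports only the short one; the brute-force scan is slower but finds both. That trade-off is the same one you make when falling back to a slow, exhaustive file search.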

The real rub here is not that they are taking some shortcuts to save search time and index size. I'm sure that even with this limitation they are hitting 80% success with 80% of users. The real issue is that hardly anyone knows about this limitation. At least, it is not mentioned anywhere obvious. And as if that were not bad enough, it is not configurable. It is also absent from the major desktop search reviews and comparisons, at least the ones I read over the past year. I only found it mentioned when I went looking for this specific problem after running into it.

I have uninstalled Google Desktop Search and installed Windows Desktop Search (without the MSN toolbar). It turns out that Microsoft's offering also does only a partial search (sigh). The difference is that it indexes the first 1M of long files (again, not configurable as far as I can see). That is large enough to cover my current use case, but I'm not happy about it. My guess is that all the current desktop search solutions are limited in one way or another, and I'm sure none of them publicly draw attention to their limitations. The moral of the story is: don't blindly trust what your desktop search engine tells you. And don't throw out XP's built-in search crawler just yet. It may be slow, but sometimes slow is OK when you want the job done right.
