Google Desktop Search: Indexing Firefox’s cache
Google Desktop Search is great, but its first iteration has several important drawbacks.
1. It does not index PDF files.
2. It does not index network drives.
3. It does not index Mozilla Firefox’s cache.
I don’t know how to remedy #1 or #2, but thanks to Mike Fioritto (and originally to Jon Udell for popularizing the workaround), I now know how to address the third problem: Slogger. Slogger is a flexible Firefox extension written by Ken Schutte that will automagically create an archive of every page you visit. After the page loads, Slogger will save it to your disk as a complete HTML archive, HTML only, or text file. This makes Google Desktop Search happy, because it knows what to do with HTML files and text files.
I do have a couple of words of advice for using Slogger most effectively, however. The first thing to do is to read the help; this extension is not quite as intuitive as most you’ll encounter. Slogger is more flexible and customizable than many Firefox extensions, and unfortunately, out-of-the-box it’s not configured optimally. Here are my suggested settings, once you’ve got it going:
- Save pages as “Web Page, HTML only.” [Save Pages tab]
By default, it saves “Web page, complete,” which archives all the extra crap (images, JavaScript, etc.). I don’t think this is all that useful, especially if you plan to use Slogger in conjunction with GDS and not as an archive to live on in perpetuity. Complete Web archives take up a lot of disk space quickly, and they’re useful only for local archiving. You can always load the HTML-only page in a browser connected to the Internet, and it will fetch the images and whatnot from the server, provided they still exist.
- Change the variable-based “File name to use for saved page” [Save Pages tab].
By default, a time-based file name (including milliseconds!) is used. This is not useful for humans and basically worthless for GDS, too. The <title> of the page adds context for GDS and human comprehensibility. Try this:
$title [$year-$month-$day $hour-$minute].html
- Tidy up the “Entry into Log File for each page” [Log File tab]
The default setting uses too many lines per page, I think. Try this:
<p>
<b>$title</b> (accessed $year-$month-$day $hour:$minute | <a href="data/$htmlfile">local copy</a>)<br />
<a href="$url">$url</a>
</p>
- Don’t save Google searches (GDS or Google.com) [Filters tab]
Saving these searches, especially local GDS searches, simply adds to the clutter of the Slogger archive. Add “127.0.0.1” and “www.google.com” to the “Block URL if host (server) is in the following list” area.
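For the curious, here is a rough Python sketch of what the file name template and the host filter above amount to. It is not Slogger’s code; the function names expand_filename and is_blocked, the zero-padded date fields, and the sample URLs are my own assumptions, there only to show the effect of the two settings.

```python
from datetime import datetime
from string import Template
from urllib.parse import urlparse

# The file name pattern suggested above. Python's string.Template happens to
# use the same $name placeholder syntax, which makes it handy for illustration.
FILENAME_TEMPLATE = Template("$title [$year-$month-$day $hour-$minute].html")

# The two hosts suggested for the "Block URL if host (server) is in the
# following list" area.
BLOCKED_HOSTS = {"127.0.0.1", "www.google.com"}


def expand_filename(title, when):
    """Fill in the template for a page title and a visit time.

    Zero-padding the date fields is an assumption made for this sketch.
    """
    return FILENAME_TEMPLATE.substitute(
        title=title,
        year=f"{when.year:04d}",
        month=f"{when.month:02d}",
        day=f"{when.day:02d}",
        hour=f"{when.hour:02d}",
        minute=f"{when.minute:02d}",
    )


def is_blocked(url):
    """Return True if the URL's host is on the block list."""
    return urlparse(url).hostname in BLOCKED_HOSTS


if __name__ == "__main__":
    # Illustrative values only.
    print(expand_filename("Example Page", datetime(2004, 10, 24, 3, 36)))
    print(is_blocked("http://www.google.com/search?q=slogger"))
    print(is_blocked("http://example.com/"))
```

For a page titled “Example Page” visited on October 24, 2004 at 3:36 am, expand_filename produces “Example Page [2004-10-24 03-36].html”, which is readable at a glance and gives GDS the title to work with. The host check compares the host name only, so every page on a blocked server is skipped regardless of its path.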
However, lest I come off as negative, I think this is a great extension that compensates tidily for a significant failing in Google Desktop Search.
October 24th, 2004 at 3:36 am
Great tips. Thanks! I use Slogger too. I don’t use Slogger as a caching tool, but more as a tool to save pages I would like to read/review thoroughly once. So I am waiting for a next release where you can output RSS-feeds from your cache. That way, I can load it into my reader and have a “To read/review” section.