Searching through a batch of PDFs? Index them!

Subject: Searching through a batch of PDFs? Index them!
From: "Geoff Hart" <geoff-h -at- mtl -dot- feric -dot- ca>
To: TECHWR-L -at- lists -dot- raycomm -dot- com
Date: Tue, 12 Oct 1999 10:11:35 -0400

Darren Barefoot has a <<...growing number of technical
bulletins on our external Web site. [in PDF format] Currently,
they're in one big long page... and, in an ideal world,
I'd like to provide the user with a search field on a Web page
which would allow them to perform a full text search through
all of the PDF files.>>

Although I can't help you find a search tool that will meet
your need, I do have a far better suggestion for you (since I'm
one of those folks who finds that search engines are rarely
worth my time): index the PDFs. The problem with all full-
text search engines is that they're simply not context-sensitive
(e.g., they can't tell whether "Report Manager" refers to the
person who manages your reporting division or the software
that lets users create reports). And they won't be context-
sensitive anytime soon.

In marked contrast, an index is something created by a
human who has empathy for how users will try to locate
information. By picking a few carefully chosen keywords for
each PDF file, you can greatly facilitate the task of finding
the best PDF file for a user's particular need. And even if you
don't automate the process of managing the index, it's not a
particularly painful task to maintain the index manually. For a
few more thoughts on this topic and a better exploration of
my rationale, check out my article "Index the Web"
(Intercom, June 1999, pp. 26-28). One update to the article:
HTML Indexer, which I mentioned almost dismissively as
'promising', has improved substantially since I first saw it
almost a year ago. Because of the publication time lag, the
article undersells it: it is now a valuable production tool.
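
To make the keyword approach concrete, here is a minimal sketch in
Python of a hand-maintained index. The file names, the keywords, and
the helper functions build_index and render_index are all hypothetical,
invented for illustration; this is not how HTML Indexer or any other
product works. The human indexer supplies a few keywords per PDF
(including a qualifier that separates the two senses of "Report
Manager"), and the script only inverts and alphabetizes them.

```python
from collections import defaultdict

# Hypothetical bulletins: file name -> keywords chosen by a human indexer.
# The parenthetical qualifiers carry the context a full-text search lacks.
BULLETIN_KEYWORDS = {
    "tb-001.pdf": ["Report Manager (software), installing", "installation"],
    "tb-002.pdf": ["Report Manager (job title), responsibilities", "staffing"],
    "tb-003.pdf": ["Report Manager (software), printing reports", "printing"],
}


def build_index(keywords_by_file):
    """Invert file -> keywords into keyword -> sorted list of files."""
    index = defaultdict(set)
    for filename, keywords in keywords_by_file.items():
        for keyword in keywords:
            index[keyword].add(filename)
    return {keyword: sorted(files) for keyword, files in index.items()}


def render_index(index):
    """Return the index as plain-text lines, alphabetized ignoring case."""
    lines = []
    for keyword in sorted(index, key=str.lower):
        lines.append(f"{keyword}: {', '.join(index[keyword])}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_index(build_index(BULLETIN_KEYWORDS)))
```

Adding a bulletin means adding one line to the table and rerunning the
script. Choosing the keywords remains the human's job, and that choice
is what makes the result more useful than a raw text search.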

--Geoff Hart @8^{)} geoff-h -at- mtl -dot- feric -dot- ca (Pointe-Claire, Quebec)
"Perhaps there is something deep and profound behind all those sevens, something just calling out for us to discover it. But I
suspect that it is only a pernicious, Pythagorean coincidence." George Miller, "The Magical Number Seven" (1956)

