search engines for PDF&HTML

Subject: search engines for PDF&HTML
From: "Smith, Ellen" <SmithEl -at- RF -dot- SUNY -dot- EDU>
Date: Thu, 19 Mar 1998 11:33:30 -0500

Some people requested that I post the responses to my recent question about
search engines for PDF and HTML. My original post is below, and each response
follows, separated by asterisks. Thanks to everyone; we are currently
looking into the Lycos product.

Ellen Smith
SUNY Research Foundation
smithel -at- rf -dot- suny -dot- edu

*****

This summer my organization is going online-only with all our
documentation. Our documentation consists of PDF versions of our hard copy
manuals and HTML documents for two new manuals. Our users will be accessing
the documents through a corporate intranet.

Our current search engine for the documents, which was written
in Visual Basic (which we no longer have access to), is cumbersome to
update and is not working properly.

My question for the group is: what search engines or strategies
have people employed for these types of documents? Any and all help
would be appreciated.

*****

If you go all-HTML, you can use a commercial engine like AltaVista
or Excite. If you go all-PDF, Acrobat Reader has some (basic) search
capabilities. If you stay with a mixed format, I think Verity's
Search97 can handle PDFs. Refer also to Edmund X. DeJesus's article
"The Searchable Kingdom" in the June 1997 issue of BYTE.

*****

If you are running an NT-based Web server, you could look at Microsoft's
Index Server. It's a free download from their Web site.

I just installed it on our departmental Web site. It's moderately complex to
configure (harder than the Excite Web Search), but I was able to set up
query and result pages in about a day. It appears to be quite customizable,
but customizing it beyond the basics requires some digging into
files and steady nerves.

Adobe has a free filter that allows Index Server to search PDF files, which
is why I switched from Excite to Index Server.

*****

I've never used any of them, but for searching PDF files through an
intranet, a list of Adobe recommended tools can be found here:

*****

While I can't give you any information, the following site's project
does just what you are asking for: searchable PDF files. Hopefully, you
can get some information through them.
The site is Roger Black's company, Interactive Bureau.
In addition, this is a really beautiful, but very expensive, project.

*****

Lycos just released some software that allows users to search intranets for
docs in HTML, PDF and other file formats. It's free, so it's worth
checking out.

Excerpts from the press release follow:

Free "Lycos Site Spider" Software From Lycos, Inmagic
Webmasters now have Intuitive Software to Make Searching within a site

Spring Internet World, Los Angeles, CA, March 11, 1998, Lycos, Inc., "Your
Internet Guide," (NASDAQ:LCOS) today announced, together with Web
search/retrieval software developer Inmagic, Inc., the free distribution of
a revolutionary Web site search product, the Lycos Site Spider. The new
product, a fast, easy-to-deploy tool that combines the best of breed
technologies from the two companies, lets anyone easily spider their own
Web sites and embed a search mechanism to help visitors find information
quickly. Visitors can search the site for words or phrases, for example,
instead of just clicking hypertext links or navigating a site map. The
Lycos Site Spider is available for downloading from today.

The Lycos Site Spider provides intra-site search capability by gathering
and indexing information from Web site pages, following links in HTML code
and importing information into Inmagic's text database. The resulting
searchable catalog, accessible via a Web browser, contains document content
as well as information such as URL, dates and file size.
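
As a rough illustration of the spider-and-index approach described above
(not Lycos's or Inmagic's actual implementation), here is a minimal Python
sketch. It follows links in HTML, records a catalog entry per page, and
builds a word index that can be searched. The names spider, search, and the
fetch callable are hypothetical; fetch is assumed to return a page's HTML or
None, which lets the sketch run against an in-memory sample instead of a
live server. A real spider would also restrict itself to one site and record
dates.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin
import re


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def spider(start_url, fetch):
    """Follow links breadth-first from start_url; return (catalog, index).

    fetch(url) is a hypothetical callable that returns HTML text,
    or None if the page is unavailable.
    """
    catalog = {}                # url -> {"size": ..., "text": ...}
    index = defaultdict(set)    # word -> set of URLs containing it
    queue, seen = [start_url], {start_url}
    while queue:
        url = queue.pop(0)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        text = " ".join(parser.text_parts)
        catalog[url] = {"size": len(html), "text": text}
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return catalog, index


def search(index, query):
    """Return the set of URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up sample pages standing in for an intranet site.
pages = {
    "http://intranet.example/index.html":
        '<a href="manual.html">Manual</a> Welcome',
    "http://intranet.example/manual.html":
        "<p>Purchasing procedures manual</p>",
}
catalog, index = spider("http://intranet.example/index.html", pages.get)
print(search(index, "purchasing manual"))
# prints {'http://intranet.example/manual.html'}
```

Passing fetch in as a parameter keeps the crawling logic separate from how
pages are retrieved, so the same loop could be pointed at a Web server or a
file share.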

"Customers were asking for a quick and easy way to spider a Web site and
that's what the Lycos Site Spider does," says Phillip Green, president and
CEO of Inmagic, Inc. Having already successfully paired Lycos and Inmagic
technologies to spider corporate intranets, it made sense to do the same
for public Web sites.

Highlights of the Lycos Site Spider include:
• Unique Java version Word Wheel index browser that eliminates
trial-and-error searching and utilizes advanced index streaming technology
• Spider full text or abstracts
• Multiple views of search results
• Real-time indexing
• Highlighted search criteria

Technical support available
Users will have the option of Technical Support for the Lycos Site Spider,
unlike other free site spiders. Inmagic will provide support on a
per-incident basis, with a $100 fee per call.

Lycos-Inmagic on corporate intranets
Lycos and Inmagic have previously teamed to offer the DB/Text Intranet
Spider, designed for use on corporate intranets. The DB/Text Intranet
Spider crawls HTML and non-HTML documents, including PDF files and other
popular application formats such as Microsoft Office, Lotus 1-2-3,
WordPerfect and others. The spider follows HTML links and walks the file
directory on Windows NT intranet servers, as well as UNIX and other servers
connected to NT servers throughout an organization.

The DB/Text Intranet Spider features sophisticated customization options
for site administrators, including the ability to:
• Customize search screens and search results views for users in a WYSIWYG
drag-and-drop environment. Users can choose how to view results from
multiple formats.

• Enable users to search multiple fields simultaneously (e.g. users can
search documents by keyword, date range, and author simultaneously)

• Completely customize the spider catalog (deleting URLs and otherwise
editing spider results) via the fully editable text database back-end

• Add meta data (document concepts, department, project client, etc.) to
the document index to further classify documents on the intranet. The
ability to search on meta data increases the precision of search results.

• Re-spider only new or changed documents
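
The "re-spider only new or changed documents" idea can be sketched in a few
lines of Python. This is an assumption about one common way to do it, not a
description of how the DB/Text Intranet Spider works: walk the file
directory, record each file's modification time and size, and compare
against the snapshot from the previous run. The function name changed_files
and the sample file names are hypothetical.

```python
import os
import tempfile


def changed_files(root, previous):
    """Walk root and report files that are new or changed since last run.

    previous maps path -> (mtime_ns, size) from the last run (empty on the
    first run). Returns (new_snapshot, paths_to_reindex).
    """
    snapshot, to_index = {}, []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            signature = (info.st_mtime_ns, info.st_size)
            snapshot[path] = signature
            if previous.get(path) != signature:
                to_index.append(path)
    return snapshot, to_index


# Demonstration in a throwaway directory with made-up documents.
with tempfile.TemporaryDirectory() as root:
    for name, body in [("manual.pdf", "v1"), ("guide.html", "v1")]:
        with open(os.path.join(root, name), "w") as f:
            f.write(body)
    snapshot, first_run = changed_files(root, {})         # everything is new
    snapshot, second_run = changed_files(root, snapshot)  # nothing changed
    with open(os.path.join(root, "guide.html"), "w") as f:
        f.write("version 2")                              # edit one file
    snapshot, third_run = changed_files(root, snapshot)

print(len(first_run), len(second_run),
      [os.path.basename(p) for p in third_run])
# prints: 2 0 ['guide.html']
```

Only the paths in the returned list need to be re-indexed; files deleted
since the last run can be found by comparing the keys of the old and new
snapshots.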

Inmagic and DB/Text are registered trademarks of Inmagic, Inc. Lycos is a
registered trademark of Carnegie Mellon University. All other trademarks or
tradenames are the property of their respective holders.
