Document scanning and retrieval system

Subject: Document scanning and retrieval system
From: REYNOLDSI_J -at- DT3 -dot- DT -dot- UH -dot- EDU
Date: Sat, 5 Mar 1994 04:54:08 -0600

Hello Everyone:

We want to install a document management system that uses some relatively
new technology: scanning, optical character recognition (OCR), and "fuzzy"
text search-and-retrieval capabilities. We're calling our new system a
document scanning and retrieval (DSR) system. (We *must* have acronyms
involved in whatever we do ;>).

The purpose of this system is to place a large, existing archive
(approximately 800,000 pages) and a steadily incoming stream of paper
documents online. These documents are each about 200-500 pages in length;
they include many tables and line-illustration graphics along with text;
the documents are of no consistent format and range from poor to good in
photocopy quality; page orientations are both portrait and landscape
within a single document. (The originating electronic files for these
documents are generally not available, so we are forced, for now, to deal
with the paper format.)

The heart of this system, some yet unchosen software product, will help us
manage the scanning process, oversee the OCR process, and produce links
between the image files (resulting from scanning) and the text files
(resulting from OCR). This software product will also allow us to organize
the files in a coherent manner and let us attach descriptive labels that are
searchable by classic database methods. But wait, that's not all--this software
product will also let us do full-text searches of the text files, including
the text that was derived from OCR-processing of graphics and tables.
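
For illustration only, the linkage described above (page images tied to
their OCR text, plus descriptive labels that can be searched like database
fields) might be modeled roughly as follows. This is a minimal sketch, not
how any of the products mentioned in this post work; the class names, file
names, and label keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    """One scanned page: the image file and the OCR text derived from it."""
    image_path: str   # hypothetical file name for the scanned image
    text_path: str    # hypothetical file name for the OCR output


@dataclass
class Document:
    """A scanned document with descriptive labels and its linked pages."""
    doc_id: str
    labels: Dict[str, str]
    pages: List[Page] = field(default_factory=list)


def find_by_label(docs: List[Document], key: str, value: str) -> List[Document]:
    """Classic exact-match lookup on a descriptive label."""
    return [d for d in docs if d.labels.get(key) == value]


# Hypothetical sample data, invented for this sketch.
archive = [
    Document("DOC-0001", {"type": "safety report"},
             [Page("doc0001_p001.tif", "doc0001_p001.txt")]),
    Document("DOC-0002", {"type": "test plan"},
             [Page("doc0002_p001.tif", "doc0002_p001.txt")]),
]

matches = find_by_label(archive, "type", "safety report")
print([d.doc_id for d in matches])
```

Because each page keeps its image and text side by side, a hit in the text
file leads directly to the page image to display or print.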

The last part is where the "fuzzy" text search-and-retrieval capabilities
come in handy. The OCR processing does not provide 100% accuracy, nor do
we have the time to do clean-up editing of the text files. So we need the
"fuzzy" search capability to compensate for text errors and let us
find the information we seek (the "fuzzy" search finds all text strings that
approximate the original query). Once the text is found, the corresponding
image of the original document page can be brought to view; this page image
can also be printed if desired.
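
As a rough illustration of what a "fuzzy" search does, the sketch below
matches words that are within a small number of character edits of the
query, so typical OCR misreads can still be found. It uses plain edit
(Levenshtein) distance, which is only one way to do approximate matching
and is an assumption here; the commercial products named in this post may
use entirely different methods. The function names and the sample text are
hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, text: str, max_errors: int = 2) -> List[str]:
    """Return the words in `text` within `max_errors` edits of `query`."""
    query = query.lower()
    hits = []
    for word in text.split():
        token = word.strip('.,;:()"').lower()
        if token and edit_distance(query, token) <= max_errors:
            hits.append(token)
    return hits


# Hypothetical OCR output with two misread spellings of "payload".
ocr_text = "The pay1oad safety review covers payload hazards and paylaod limits."
print(fuzzy_find("payload", ocr_text))
```

A larger `max_errors` tolerates worse OCR quality but also returns more
false matches, so the threshold is a trade-off to tune against real scans.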

About 50 people, connected by network systems, will use this system. The
number of potential users could increase into the hundreds, some located
throughout the nation or internationally.

I have witnessed two products which, when demonstrated on small vendor-
site network systems, performed admirably; they accomplished all of the
objectives stated above (except the part, of course, about a large network
installation). Those products are...

..."PixTex/EFS" by Excalibur and
..."LaserFiche NLM Windows" by CompuLink.

Also, there are two more systems I can readily think of that we'd like to
see demonstrated:

..."Topic" or "Topic Realtime."


Have you seen, heard of, or used these--or similar--products in a large
network installation? If so, I would be very glad to hear your comments.
Some of my detailed questions to you would be...

o What products are you using (and on what hardware platforms)?
o Were there any combinations of products used to provide the full
document management capabilities you and your coworkers needed?
o How easy or practical is the system you use?
o Is it slow in operation or in printing?
o Can you always find the information you seek?
o Does the system let you use a little less paper than before?
o How difficult was it to install the system and make it operational?
o In general, are you happy or dissatisfied with the system?
(Are you or your coworkers eager or reluctant to use the system?)

I am fairly excited about the potential in using a DSR system, but I don't
want to see something installed that only provides more promise than actual

Thank you very much for any responses you provide, sent either to this list
or to me privately.

| John C. Reynolds, III | Calspan/Space Ind. Internat., Inc. |
| P.O. Box 130873 | Johnson Space Center |
| Houston, TX 77219-0873 | Payload Safety (NS2) |
| (713) 861-3334 | Technical Writer |
| Society for Technical Communication (STC) - Houston Chapter |
| Email: reynoldsi_j -at- uhdvx3 -dot- dt -dot- uh -dot- edu (preferred) |
| Email: 70053 -dot- 2375 -at- compuserve -dot- com |
| Email: nclsjr -at- jscprofs -dot- nasa -dot- gov |
| Compuserve: 70053,2375 |

