Scanning Latin: best OCR software & scanner hardware?

Subject: Scanning Latin: best OCR software & scanner hardware?
From: Carl Stieren <carls -at- CYBERUS -dot- CA>
Date: Fri, 12 Feb 1999 09:34:50 -0500

Hello Colleagues,

I need some extremely good OCR software and a scanner to match. And the
setup has to be able to scan late Renaissance Latin. Believe it or not, this
book may even have some bearing on technical writing, because of the layout,
formats, lengths and designs of sections used in this Renaissance textbook.
Any suggestions for software or scanner hardware that would work with a 350
MHz Pentium II?

The typeface is surprisingly quite similar to Times Roman, with a few
exceptions: every "s" except at the end of a word is written like an "f". The
"ae" character is also used, the upper-case "Q" has its magnificent
descender sprawling under the character to the right, and for some reason,
every "c" followed by a "t" has an ascender that reaches from the center of
the top of the "c" to the tip of the ascender of the "t". And there are
abbreviations: Arist. for Aristotle, and to my surprise, there was "ex.
gr.", which later became "e.g.", and at OmniMark at least, is now used as "eg".

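To make those typeface quirks concrete, here is a minimal sketch in Python of
the kind of cleanup pass OCR output from such a page might need. It is only an
illustration under assumptions: it presumes the OCR engine emits Unicode and
reports the long "s" as U+017F and the "ae" character as U+00E6. The names
GLYPH_MAP and normalize_glyphs are hypothetical and not part of any OCR product.

```python
# Hypothetical mapping from early-modern glyphs to modern spellings.
# Assumption: the OCR engine outputs these Unicode code points as-is.
GLYPH_MAP = {
    "\u017f": "s",   # long s, used everywhere except at the end of a word
    "\u00e6": "ae",  # lower-case "ae" character
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(GLYPH_MAP)


def normalize_glyphs(text: str) -> str:
    """Return text with the glyphs in GLYPH_MAP replaced by modern letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(normalize_glyphs("Qu\u00e6\u017ftionibus Phy\u017fic\u00e6"))
    # prints "Quaestionibus Physicae"
```

This does nothing for a long "s" that the OCR engine misreads as an "f", nor
for the joined "ct"; fixing those would probably need a Latin word list.
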
Now to the book itself: I have in my lap a book printed 328 years ago, and
written 30 to 40 years before then by my ancestor, Johann Stier. It is a
collection of short books he wrote, probably in association with the
University of Erfurt in Germany. These books were collected and reprinted by
the printer Roger Daniel in Cambridge, England. (Daniel's operation later
became Cambridge University Press.) The copy I purchased from an antique book
store is the 7th edition, published in 1671 in London by "J. Redmanyne pro
J. Williams", who seem to have taken over the copyright for England, if
there was such a right then, from Roger Daniel.

The language of the book ... aye, there's the rub ... is Latin, and has
Greek words sprinkled in liberally, since Renaissance philosophers, Johann
Stier included, were always referring back to Aristotle. The book was used
as a university textbook for many years at Cambridge. The title is
"Praecepta Doctrinae Logicae, Ethicae, Physicae, Metaphysicae, Sphaericaeq;
Brevibus Tabellis compacta, una Cum Quaestionibus Physicae Controversis"

As a technical writer, it's the "Brevibus Tabellis compacta" and the
"Quaestionibus Physicae Controversis" that excite me the most. My Latin is
almost nonexistent - I'm taking lessons now. The "Brief compact tables" are
similar to outline notes of subjects, compactly printed, with varying length
indented sections and brackets with markers in the margins. (Must have been
a nightmare to typeset!) The "Quaestiones Controversae" start out a bit
disappointingly: "An Physica sit scientia?", which I take to mean "Is physics
a science?" (The answer argued for, "Affirm.", is "Yes".) The questions
and answers go on for 94 pages - it's the largest section in the book.

I scanned text and used OCR software seven years ago as a technical writer
when we didn't have soft copy of a document. It was on an Apple II, and the
OCR software was made by a little company in Florida. The recognition rate
was poor, the throughput rate discouraging.

I will certainly post a page of this book as a graphic on my web site in the
coming months, along with a translation of the page. There may be more on my
site, but it all depends on what sort of a book I decide to write about this
book, its author, and his era, 1599 to 1648 (30 Years' War in Germany and
English Revolution in England).

Anyone have any ideas, either for the software/hardware or other issues?

- Carl Stieren

Carl Stieren carls -at- cyberus -dot- ca
Technical Writer and Designer......................deep in Silicon Tundra
Ottawa, Ontario, CANADA.........................1 hr 40 min from Montreal
Carl's "Text and Subtext" Web

