RE: Exporting text from recalcitrant PDFs

Subject: RE: Exporting text from recalcitrant PDFs
From: "Combs, Richard" <richard -dot- combs -at- Polycom -dot- com>
To: "Cardimon, Craig" <ccardimon -at- M-S-G -dot- com>, "Techwr-l" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Fri, 17 Aug 2007 12:07:08 -0600

Cardimon, Craig wrote:

> The PDF I dealt with was not locked, but it was still uncooperative.

You'll have to be a bit more specific. What was "uncooperative" --
Acrobat? Adobe Reader? Something else? What version? How was it

If you were using Acrobat, did you try all the text-based Save As
formats (doc, rtf, txt)? Did you try the Select tool and copying the

> I resorted to a feature called "OCR Text Recognition," which
> allowed me to proceed, but the going was kind of brutal.

Oh, I see. Your PDF didn't _contain_ text! It contained _images_ of
text. It was created by scanning hardcopy pages.

OCR (optical character recognition) is the only option for converting a
bitmap image into editable text. It isn't perfect.
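
Before reaching for OCR, it can help to confirm which pages actually lack a text layer. Here's a rough Python sketch of that check. It assumes you've already pulled one text string per page with whatever PDF library you have on hand; the function name and the 20-character threshold are made up for illustration, not taken from any tool.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text looks empty.

    page_texts: one string (or None) per page, as produced by some
    PDF text-extraction step. min_chars is an arbitrary cutoff
    (an assumption); tune it for your documents.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count only visible characters; scanned pages usually
        # yield an empty string or stray whitespace.
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        if visible < min_chars:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = ["Chapter 1: Installing the base unit and cables", "", "  \n"]
    print(pages_needing_ocr(sample))
```

Pages it flags are probably bitmaps and would have to go through OCR; pages it doesn't flag should respond to the ordinary Save As or copy-and-paste routes.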

I assume the people who want this "data extracted" aren't ripping off
someone else's docs, so they should have the source files from which the
scanned pages were created. If those have been lost, explain to them
that imperfect OCR is the best you can do, and they need to be more
careful with their intellectual property in the future. :-)


Richard G. Combs
Senior Technical Writer
Polycom, Inc.
richardDOTcombs AT polycomDOTcom
rgcombs AT gmailDOTcom



