Subject: RE: PDF saved to gibberish
From: <Brian -dot- Henderson -at- mitchell1 -dot- com>
To: <TECHWR-L -at- lists -dot- techwr-l -dot- com>
Date: Mon, 14 Jun 2010 08:18:50 -0700
I am constantly extracting text from PDFs, and I have never been able to
figure out why in some docs the text is ASCII and in some it's binary
code. I rarely have knowledge of how most of the documents were created
so I haven't spent much effort trying to figure it out.
Usually, I just OCR the doc and that solves the problem.
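For anyone who wants to automate the "extract, and OCR if it comes out as gibberish" decision, here is a rough Python sketch. It assumes the gibberish consists mostly of control characters (code points below 0x20), which some fonts and editors display as symbols and geometric shapes; that is a guess about the cause, not something confirmed for this document. The function names and the 0.2 threshold are made up for illustration.

```python
import unicodedata

# Whitespace controls that legitimately appear in extracted text.
ALLOWED_CONTROLS = {"\n", "\r", "\t", "\f"}

# Unicode categories treated as suspicious (an assumption):
# Cc = control, Co = private use, Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn"}


def gibberish_ratio(text: str) -> float:
    """Return the fraction of characters that look like extraction debris."""
    if not text:
        return 0.0
    suspicious = 0
    for ch in text:
        if ch in ALLOWED_CONTROLS:
            continue
        # U+FFFD is the replacement character emitted for undecodable bytes.
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
            suspicious += 1
    return suspicious / len(text)


def needs_ocr(text: str, threshold: float = 0.2) -> bool:
    """Heuristic: fall back to OCR when too much of the text is debris."""
    return gibberish_ratio(text) > threshold
```

Text that extracts cleanly should score near zero, so the threshold only needs to be high enough to tolerate the occasional stray character.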
-----Original Message-----
From: Nancy Allison
I have tried two ways to save the text of a PDF to .txt and both
attempts produced a weird, symbol-font type gibberish.
This is what it looks like once it's pasted into Plain Text: .
In the .txt file, it shows lots of male and female symbols, exclamation
points, musical notes, and geometric figures.
I used the Acrobat Save as Text command, and also selected all the text
and pasted it into a .txt file. Same result both times.
I selected the gibberish and assigned different fonts to it; the
gibberish showed up in the selected fonts. It seems as if the text has
been assigned to a different character set.
The PDF document properties show a Security Method of "No Security,"
but Document Assembly, Commenting, Signing, and Creation of Template
Pages are Not Allowed.
Everything else, including Content Copying, is allowed.
Any ideas as to what's going on, and how I can successfully extract the
text?