TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
> I don't understand...there are many products, including many free
> ones, that will extract text and images from a PDF file. For a simple
> two-page data sheet, why not simply extract the text, edit it, and
> create a new, corrected pdf instead of going through all this grief?
> I doubt that should take more than fifteen minutes or so in most
> cases--or am I missing something?
Perhaps you are correct. I haven't tried that route in several
years, and maybe the tools of which you speak have improved.
Of course, the data-sheet has a full-height background image
(or probably an assembly of graphic pieces) that form a heading,
left margin (-ish) and bottom/footer area.
I remember that the extracted image bits used to be the wrong
size, and got fuzzy when re-sized to recreate the appearance
of the document in Word or FrameMaker.
In a later post, Dan G. mentions a product or two. My outdated(?)
experience was that if the PDF's text was in properly-flowing chunks
as seen/selectable in Acrobat, there was no problem.
The problems arose when the text was in artfully-placed clumps
and bunches that the PDF-creation insisted on grouping with
You'd start your selection at the beginning of what was visually
a contiguous block of text - say three or four paragraphs. You'd
drag down the page and find that (say) the first three lines of
the first paragraph had been selected, then nothing, then six
words in the third paragraph, and all but the last four words
of the fourth paragraph.
So, you'd try selecting one of the skipped sections and suddenly
about twenty independent, widely-scattered bits of text and a
few graphic elements would be selected... but not the major
chunk that you had originally been able to select.
Doing something like "select all" would have equally bizarre
consequences, but in a somehow different direction.
It was enough to make a body cry.
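For what it's worth, here is a rough sketch of why the selection jumps around, and of the kind of repair an extraction tool has to attempt. It assumes the text pieces are stored in whatever order they were created, each with a page position, and that sorting them top-to-bottom and then left-to-right approximates reading order. The `Chunk` type, the `reading_order` function, the coordinates, and the tolerance value are all made up for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One piece of text with a page position (hypothetical structure)."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline, in points measured up from the bottom of the page


def reading_order(chunks, line_tolerance=2.0):
    """Return chunks sorted top-to-bottom, then left-to-right.

    Chunks whose baselines are within line_tolerance points of the first
    chunk on the current line are treated as belonging to that line.
    """
    # Larger y is higher on the page, so sort by descending y first.
    top_down = sorted(chunks, key=lambda c: -c.y)
    lines = []
    for chunk in top_down:
        if lines and abs(lines[-1][0].y - chunk.y) <= line_tolerance:
            lines[-1].append(chunk)
        else:
            lines.append([chunk])
    # Within each line, order the pieces by their left edge.
    return [c for line in lines for c in sorted(line, key=lambda c: c.x)]


# Made-up pieces, listed in "storage" order rather than visual order.
stored = [
    Chunk("Paragraph four ends.", 72, 600),
    Chunk("Paragraph one starts.", 72, 700),
    Chunk("continues here.", 200, 650),
    Chunk("Paragraph three", 72, 650),
]

for chunk in reading_order(stored):
    print(chunk.text)
```

This only holds up for a simple single-column page. With multiple columns, sidebars, or text laid over a background graphic, a plain position sort will interleave unrelated pieces, which is roughly the mess described above.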
I could just envision the original EPS/PDF creation as a
little man inside the computer, dressed as a generic witch-doctor/shaman,
complete with scary mask and fetish objects, dancing, whirling,
chanting, and leaping wildly about. The sequence of objects and
text pieces was then determined by wherever his left foot
landed as he cavorted, or wherever the "fairy dust" (ground
bones of criminal cadavers) settled. Yes.
OK, so maybe things have improved and I should revisit.
But, as somebody else pointed out (was that Gene?) I
should probably be expending my efforts toward convincing
people to not let these things happen.
I'll reply to Heidi's post as to why that's tougher than it
- Kevin