Followup: Moving Word documents into HTML?

Subject: Followup: Moving Word documents into HTML?
From: "Hart, Geoff" <Geoff-H -at- MTL -dot- FERIC -dot- CA>
To: "Techwr-L (E-mail)" <TECHWR-L -at- lists -dot- raycomm -dot- com>
Date: Thu, 4 May 2000 12:50:09 -0400

A followup on my earlier posting concerning moving Word documents into HTML.
This information is taken from the current issue of Woody's Office Watch, a "do
not live without it" resource for Office users. Subscription information is
available at the following address:
Join, Leave or change address: http://www.woodyswatch.com/
Email: send the message "subscribe" to wow -at- wopr -dot- com

--Geoff Hart, FERIC, Pointe-Claire, Quebec
geoff-h -at- mtl -dot- feric -dot- ca

3. WHEN THERE'S TOO MUCH HTML IN WORD
Word 2000 is promoted as having full HTML compatibility, but
there are times when full HTML fidelity is a pain in the
neck.

When you copy some text from Word to an HTML editor like
FrontPage, Word does its best to send with it exactly what
you had in the document. The same fonts, spacing - the
works.

Sometimes that's what you want - but often you just want
the basic formatting (bold, italics, etc.) plus the raw text.
What Word sends with the copied text is a bloated set of
class settings, span statements and XML tag placements.

Try copying a few paragraphs from Word to FrontPage, then
look in the HTML view - you'll see things like:

"<span style="mso-spacerun: yes">&nbsp; </span>" - this is
just for two or more spaces!

'class="MsoBodyText"' - this is the name of the original
Word style with 'Mso' in front of it.

"<o:p> </o:p>" -- apparently these are XML placeholders.
They are everywhere in Word copied text but serve no useful
purpose in most cases.

'<span style="font-size:9.0pt;mso-bidi-font-family:
'Courier New';color:black">' - span statements like this
bring across the exact character formatting in Word.

All this extra code can severely bloat the size of an HTML
page, making it slower to download and display. Worse, it
can confuse whoever edits the page if they don't realize
the tags are there.

Sadly, there's no direct solution in Word itself - Microsoft
focused so narrowly on HTML compatibility that it ignored the
more practical possibilities. Thankfully, you can identify
most of the surplus Word stuff because of the liberal use of
'Mso' in the tags.
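
To illustrate how the 'Mso' naming makes the surplus markup
easy to spot, here is a minimal Python sketch that counts the
kinds of Word-specific markers quoted above. It is not part of
the original newsletter item; the function name, the pattern
names and the sample string are all made up for illustration,
and the patterns cover only the examples shown here.

```python
import re

# Hypothetical patterns for the Word-specific markup quoted above.
WORD_MARKERS = {
    # class="MsoBodyText" and similar style names prefixed with Mso
    "mso_class": re.compile(r'class="?Mso\w+"?', re.IGNORECASE),
    # mso-spacerun:, mso-bidi-font-family: and other mso-* style properties
    "mso_style": re.compile(r'mso-[a-z-]+\s*:', re.IGNORECASE),
    # opening <o:p> placeholder tags
    "office_tag": re.compile(r'<o:p\b[^>]*>', re.IGNORECASE),
}


def count_word_markup(html):
    """Return how many times each kind of Word-specific marker appears."""
    return {name: len(pattern.findall(html))
            for name, pattern in WORD_MARKERS.items()}


# Illustrative sample, assembled from the fragments quoted in the text.
sample = (
    '<p class="MsoBodyText">Hello'
    '<span style="mso-spacerun: yes">&nbsp; </span>'
    'world<o:p> </o:p></p>'
)
print(count_word_markup(sample))
```

A page with high counts is a likely candidate for cleanup
before it goes anywhere near a web server.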

One solution is to use Paste Special and choose the 'Normal
Paragraphs' option. That will remove all formatting codes,
including fundamental ones like bold and italics. The
same happens if you select text in FrontPage and choose
Format | Remove Formatting; all the formatting is lost.

You can go through manually and remove the excess. Because
FrontPage has such a basic replace function (compared to
Word), that's a tedious process.
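
For anyone comfortable with a scripting language, the manual
search-and-replace could be approximated with a few regular
expressions. The following Python sketch is an assumption-laden
illustration, not something from the newsletter: the function
name and sample are hypothetical, it handles only the markup
patterns quoted above, and it removes every span tag, so any
span you actually want would be lost too. Regular expressions
are a blunt tool for HTML; check the result by eye.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL


def strip_word_markup(html):
    """Remove Word-specific markup but keep basic tags such as <b> and <i>."""
    # Drop the <o:p> ... </o:p> placeholders and whatever sits between them.
    html = re.sub(r'<o:p\b[^>]*>.*?</o:p>', '', html, flags=FLAGS)
    # Collapse each mso-spacerun span (Word's "two or more spaces") to one space.
    html = re.sub(r'<span\s+style="mso-spacerun:\s*yes">.*?</span>', ' ',
                  html, flags=FLAGS)
    # Remove the remaining span tags but keep the text they wrap.
    html = re.sub(r'</?span\b[^>]*>', '', html, flags=FLAGS)
    # Remove class attributes that name a Word style, e.g. class="MsoBodyText".
    html = re.sub(r'\s+class="?Mso\w+"?', '', html, flags=FLAGS)
    return html


# Illustrative sample built from the fragments quoted earlier.
sample = (
    '<p class="MsoBodyText"><b>Bold</b>'
    '<span style="mso-spacerun: yes">&nbsp; </span>'
    '<span style="font-size:9.0pt;color:black">text</span>'
    '<o:p> </o:p></p>'
)
print(strip_word_markup(sample))
```

Unlike Paste Special, this approach leaves the bold and italic
tags alone, which is the "basic formatting plus raw text"
result described earlier.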

The only direct solution we've found is expensive but
effective. Dreamweaver 3 from Macromedia
http://www.amazon.com/exec/obidos/ASIN/B000040P1K/woodsoffiwatcwoo
has a feature specifically designed to remove the HTML
crud.



