RE: Converting Word files into XML

Subject: RE: Converting Word files into XML
From: Stuart Burnfield <slb -at- westnet -dot- com -dot- au>
To: techwr-l -at- lists -dot- techwr-l -dot- com
Date: Sat, 31 May 2008 00:04:33 +0800

> ... take GIANT Word file (they could range from 50-1200 pages)
> and convert them into XML (not DITA, but some other DTD). I
> just got a tour of what they do, and they literally go through
> line-by-line in Dreamweaver, assigning tags as they go.

> Has no one yet developed an amazing little tool that could map
> Word styles into XML tags to provide clean output?

It's harder than it looks. The problem is that Word docs are
unstructured, so it's not just a case of working through the doc line by
line and mapping each para style and character style to a corresponding
XML tag. The XML tags need to be nested correctly, and that nesting
information is mostly absent from the source document. If you could
guarantee that the Word document was formatted in a very consistent,
rigorous way, it would be a lot easier, but how often do you see a Word
doc like that?
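
To make the nesting problem concrete, here is a minimal Python sketch. It assumes the Word paragraphs have already been extracted as a flat list of (style name, text) pairs. The style names, the target tags (doc, section, title, para, list, item), and the build_xml helper are all hypothetical, invented for illustration and not taken from any particular DTD. The point is that the list and section wrappers have to be inferred from the sequence of styles, because Word does not store them.

```python
from xml.etree import ElementTree as ET

# Hypothetical mapping from Word paragraph style names to target tags.
STYLE_TO_TAG = {"Heading 1": "title", "Normal": "para", "List Bullet": "item"}


def build_xml(paragraphs):
    """Nest a flat list of (style_name, text) pairs into an element tree."""
    root = ET.Element("doc")
    section = None        # section opened by the most recent heading
    current_list = None   # list wrapper for a run of consecutive bullets
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "para")  # unknown styles fall back to para
        if tag == "title":
            # A heading starts a new section and ends any open list.
            section = ET.SubElement(root, "section")
            current_list = None
            ET.SubElement(section, "title").text = text
            continue
        parent = section if section is not None else root
        if tag == "item":
            # The first bullet in a run opens the wrapper the source never had.
            if current_list is None:
                current_list = ET.SubElement(parent, "list")
            ET.SubElement(current_list, "item").text = text
        else:
            current_list = None
            ET.SubElement(parent, "para").text = text
    return root
```

Even this toy version has to guess: it cannot tell two adjacent lists apart, and it knows nothing about nested lists or heading levels. Those are the cases where inconsistent Word formatting causes trouble.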

> There's gotta be a faster way to do this, isn't there?

Yes. I've done a few conversions from Frame and Word to SGML (different
tags, same process). Frame's Save As XML preserves the name of the
Frame style--e.g. Bulleted1 para style becomes <Bulleted1> in the XML
file. You can then open the file in a text editor and use find/replace
to map these tag names to the valid XML tags. If you're handy with
regular expressions and macros you can automate a lot of the mapping.
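
The find/replace step could be automated along these lines. This is a rough Python sketch, not a finished tool: the TAG_MAP entries and the rename_tags helper are hypothetical, and the regular expression only looks at the start of each opening or closing tag, so it would also rewrite matching text inside comments or CDATA sections.

```python
import re

# Hypothetical mapping from exported style names to the target DTD's tags.
TAG_MAP = {"Bulleted1": "item", "Body": "para", "Heading1": "title"}

# Matches the start of an opening or closing tag and captures its name.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)")


def rename_tags(xml_text, tag_map=TAG_MAP):
    """Rename mapped tags; leave unmapped tags and all attributes untouched."""
    def swap(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + tag_map.get(name, name)
    return TAG_RE.sub(swap, xml_text)
```

For example, rename_tags("<Bulleted1>First</Bulleted1>") gives "<item>First</item>". Renaming alone does not add the missing wrapper elements, so the result still needs a pass to nest the tags and validate against the DTD.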

Write if you want to know more about this.

This company does Word to *ML conversions. I haven't used them but they
will do a sample conversion so you can try before you buy:

"The Legacy Data Conversion Center can convert your Word Legacy Data to
any standard output format quickly and inexpensively. Our standard
conversion prices include conversion to DocBook, DITA, MIL-STD-38784,
and S1000D.

The Legacy Data Conversion Center supports conversion of Word legacy
data to any custom document type. For a small fee, we can convert your
legacy data to any proprietary or custom output format..."

