TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
Subject: Convert Word files to XML?
From: Richard Hamilton <dick -at- rlhamilton -dot- net>
To: "techwr-l -at- lists -dot- techwr-l -dot- com TECHWR-L" <techwr-l -at- lists -dot- techwr-l -dot- com>
Date: Fri, 25 Jul 2014 11:21:08 -0700
It depends on several factors. The most important are which XML schema you are converting to, how clean your Word content is, and how much content you need to convert.
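You can partly measure "how clean your Word content is" before choosing a route. The sketch below is a minimal, hypothetical helper (`audit_docx` is not part of any tool mentioned in this thread). It assumes .docx input and uses only the Python standard library to count which paragraph styles are in use and how many text runs carry direct (hand-applied) formatting. As a rough rule of thumb, not a guarantee, content that consistently uses styles maps onto XML elements more predictably than content formatted by hand.

```python
import zipfile
from collections import Counter
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml inside a .docx package
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def audit_docx(path_or_file):
    """Return (paragraph style counts, number of runs with direct formatting).

    Paragraphs without an explicit w:pStyle are counted as "(none)", meaning
    they fall back to Word's default paragraph style.
    """
    with zipfile.ZipFile(path_or_file) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))

    styles = Counter()
    for para in root.iter(W + "p"):
        style = para.find(f"{W}pPr/{W}pStyle")
        styles[style.get(W + "val") if style is not None else "(none)"] += 1

    direct_runs = 0
    for run in root.iter(W + "r"):
        rpr = run.find(W + "rPr")
        # Anything in w:rPr other than a character style (w:rStyle) is
        # formatting applied by hand rather than through a style.
        if rpr is not None and any(child.tag != W + "rStyle" for child in rpr):
            direct_runs += 1
    return styles, direct_runs
```

Typical use would be `styles, direct = audit_docx("manual.docx")` followed by a look at `styles.most_common()`. A long tail of "(none)" paragraphs or a high count of directly formatted runs suggests more cleanup work ahead.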
The bottom line for me is that if you have a lot of content to convert, you should seriously consider contracting the job out to a conversion company, unless you have serious expertise with XSL and related tools.
Here are some details on tools to consider if you want to go it alone:
I convert Word to DocBook XML using OpenOffice, which can export DocBook directly. Sometimes, however, it's better to export HTML and then use a utility called Herold to convert that to DocBook. I've also taken the rather circuitous route of uploading Word files to a Confluence wiki and then exporting DocBook with a plug-in exporter from a company called K15t Software. Which method I use in a given case depends on what the input looks like.
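To give a feel for what the HTML-to-DocBook step involves, here is a minimal Python sketch of that kind of mapping. It is not how Herold works, and the names (`HtmlToDocBook`, `html_to_docbook`) are hypothetical. It assumes simple HTML (headings, paragraphs, lists, bold/italic/code), nests sections by heading level, and emits un-namespaced, DocBook 4.x-style markup; tables, images, and cross-references are ignored.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Illustrative HTML-to-DocBook mapping; a real converter handles far more cases.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
LISTS = {"ul": "itemizedlist", "ol": "orderedlist"}
INLINE = {"b": ("emphasis", "bold"), "strong": ("emphasis", "bold"),
          "i": ("emphasis", None), "em": ("emphasis", None), "code": ("literal", None)}
TEXT_HOLDERS = {"title", "para", "emphasis", "literal"}
SKIPPED = {"head", "style", "script"}  # exported CSS and metadata, not content


class HtmlToDocBook(HTMLParser):
    """Build un-namespaced DocBook from simple HTML: headings, paragraphs, lists."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("article")
        self.sections = [(0, self.root)]  # (heading level, section element)
        self.open = [(None, self.root)]   # (HTML tag that opened it, DocBook element)
        self.skip_depth = 0

    def _push(self, html_tag, db_tag, **attrs):
        elem = ET.SubElement(self.open[-1][1], db_tag, attrs)
        self.open.append((html_tag, elem))

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            level = int(tag[1])
            # A heading closes every open section at the same or a deeper level.
            while self.sections[-1][0] >= level:
                self.sections.pop()
            section = ET.SubElement(self.sections[-1][1], "section")
            self.sections.append((level, section))
            self.open = [(None, section)]
            self._push(tag, "title")
        elif tag == "p" and self.open[-1][1].tag != "para":
            self._push(tag, "para")
        elif tag in LISTS:
            self._push(tag, LISTS[tag])
        elif tag == "li":
            self._push(tag, "listitem")
            self._push(None, "para")  # DocBook list items need block content
        elif tag in INLINE:
            db_tag, role = INLINE[tag]
            self._push(tag, db_tag, **({"role": role} if role else {}))

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif any(html_tag == tag for html_tag, _ in self.open):
            # Close everything opened since the matching start tag.
            while self.open[-1][0] != tag:
                self.open.pop()
            self.open.pop()

    def handle_data(self, data):
        if self.skip_depth:
            return
        elem = self.open[-1][1]
        if elem.tag not in TEXT_HOLDERS:
            if not data.strip():
                return  # layout whitespace between block elements
            elem = ET.SubElement(elem, "para")  # wrap stray text in a para
        if len(elem):
            elem[-1].tail = (elem[-1].tail or "") + data
        else:
            elem.text = (elem.text or "") + data


def html_to_docbook(html):
    parser = HtmlToDocBook()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")
```

For example, `html_to_docbook("<h1>Install</h1><p>Run <b>setup</b> now.</p>")` returns `<article><section><title>Install</title><para>Run <emphasis role="bold">setup</emphasis> now.</para></section></article>`. Real Word-exported HTML is much messier than this, which is one reason the choice of route depends on the input.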
You can convert Word to DITA using DITA for Publishers (dita4publishers.sourceforge.net). I haven't used it myself, but I know the developer (Eliot Kimber), and he does quality work, so I'd definitely give it a try if you're headed towards DITA.
One caveat: I've found that a conversion is very rarely completely clean. Unless your input is really simple and well suited to the tool you use (which, with Word, I've never seen :-), plan on cleaning up the output of any of these tools with an XSL stylesheet, perl, manual editing, or some combination of the three.
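Alongside XSL or perl, a short script can handle some of the most common leftovers. This is a minimal Python sketch using ElementTree, assuming un-namespaced DocBook input; `clean_docbook` is a hypothetical name, not part of any tool mentioned above. It removes paragraphs and inline elements that the conversion left empty (keeping any text that follows them) and trims stray whitespace inside titles and paragraphs. Note that it drops empty elements even if they carry attributes such as an ID, so a real cleanup pass would need to be tuned to the problems in your own output.

```python
import xml.etree.ElementTree as ET

# Elements that carry no meaning once they are empty (typical Word leftovers).
DROP_IF_EMPTY = {"para", "emphasis", "literal", "phrase"}
TRIM = {"title", "para"}


def _remove_keeping_tail(parent, elem):
    """Remove elem from parent without losing the text that follows it."""
    if elem.tail:
        idx = list(parent).index(elem)
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + elem.tail
        else:
            parent.text = (parent.text or "") + elem.tail
    parent.remove(elem)


def _trim(elem):
    """Strip leading and trailing whitespace inside elem's content."""
    if elem.text:
        elem.text = elem.text.lstrip()
    if len(elem):
        if elem[-1].tail:
            elem[-1].tail = elem[-1].tail.rstrip()
    elif elem.text:
        elem.text = elem.text.rstrip()


def clean_docbook(xml_text):
    root = ET.fromstring(xml_text)
    parents = {child: parent for parent in root.iter() for child in parent}
    # Reversed document order visits children before their parents, so a para
    # that only held an empty emphasis is itself empty by the time we reach it.
    for elem in reversed(list(root.iter())):
        if (elem is not root and elem.tag in DROP_IF_EMPTY and len(elem) == 0
                and not (elem.text or "").strip()):
            _remove_keeping_tail(parents[elem], elem)
    for elem in root.iter():
        if elem.tag in TRIM:
            _trim(elem)
    return ET.tostring(root, encoding="unicode")
```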
XML for Technical Communicators http://xmlpress.net
hamilton -at- xmlpress -dot- net
On Jul 25, 2014, at 10:46 AM, Janoff, Steven wrote:
> For those with experience converting Word files to XML:
> What's the easiest or most effective way you've found to do this?
> Does it depend on the XML editor you're importing into?
> Arbortext is currently editor of choice, but I might also have the opportunity to install Oxygen at home.
> Thanks for your advice. I'll be researching on the web also, but that looks like a bit of a mish-mash.
> Read about how Georgia System Operation Corporation improved teamwork, communication, and efficiency using Doc-To-Help | http://bit.ly/1lRPd2l
> You are currently subscribed to TECHWR-L as dick -at- rlhamilton -dot- net -dot-
> To unsubscribe send a blank email to
> techwr-l-leave -at- lists -dot- techwr-l -dot- com
> Send administrative questions to admin -at- techwr-l -dot- com -dot- Visit
>http://www.techwhirl.com/email-discussion-groups/ for more resources and info.
> Looking for articles on Technical Communications? Head over to our online magazine at http://techwhirl.com
> Looking for the archived Techwr-l email discussions? Search our public email archives @ http://techwr-l.com/archives