Re: Cheicken and eggs scenario for structred writing

Subject: Re: Cheicken and eggs scenario for structred writing
From: Stuart Burnfield <slb -at- westnet -dot- com -dot- au>
To: Yves Barbion <yves -dot- barbion -at- gmail -dot- com>
Date: Tue, 30 Jul 2013 14:42:53 +0800 (WST)

Thanks very much for the follow-up, Yves. My experience with converting documents from Word and unstructured FrameMaker is a few years old and clearly there are much better tools available now.

Adding structure information: Aside from metadata, I'm pretty surprised that automated tools can do much to create typed topics from unstructured Word, unless the source documents are very carefully formatted. One of my projects is a set of manuals converted to DITA from SGML (IBMIDDoc), which I assume is a much cleaner starting point than Word. We found that reference and concept topics came across very easily as long as the original material was not too mixed. However, because the structure rules for task topics are more restrictive, these required a lot of manual rework.

Even cleanly converted content required manual effort. For example, topic introductions came across as perfectly valid <p> tags, but every one had to be checked and often rewritten to produce a suitable <shortdesc>.

So while it might not be too hard to get to the point where the new DITA topics don't throw validation errors, we found there was a lot more work making those topics well-structured.

Importing tables: Do you mean that these editors can Smart Paste copied Word tables directly into DITA? One of my colleagues wrote some scripts to generate complex tables as an Excel workbook. We use Paste Excel in Arbortext to convert these and tables taken from Word-formatted design documents to DITA tables. I don't think Arbortext lets us paste Word tables directly.


----- Original Message -----
From: "Yves Barbion" <yves -dot- barbion -at- gmail -dot- com>
To: "Stuart Burnfield" <slb -at- westnet -dot- com -dot- au>
Cc: "Techwr-l" <techwr-l -at- lists -dot- techwr-l -dot- com>
Sent: Friday, 26 July, 2013 3:14:52 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
Subject: Re: Cheicken and eggs scenario for structred writing

On Fri, Jul 26, 2013 at 4:14 AM, Stuart Burnfield < slb -at- westnet -dot- com -dot- au > wrote:


Start experimenting with a conversion process. It would be a mountain of work to manually paste your Word content into skeleton DITA topics. There won't be a simple single-step 'Save As DITA' but you should be able to at least semi-automate the process. For example:
- Save a Word document as XML. This creates a text file with Word formatting mapped to generic tags. Use scripts, macros or regular expressions to convert each generic XML tag <foo> to the actual DITA tag <bar> and to delete unwanted MS Word fluff.

[Yves] >>> Instead of saving the Word file as (Microsoft) XML, you could also try Eliot Kimber's Word-to-DITA transformation Framework:

It does require a bit of setup, but you can then actually "save" your Word files as DITA. You can download the framework from :

Another option is to:

1. Open the Word files in FrameMaker.

2. Use mif2go ( ) to set up your conversion (basically a style to element mapping), for example:

- heading* = title
- bulleted list = ul li
- instruction = cmd
- heading 4 = section

3. Save the FM file as DITA.
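
The style-to-element mapping in step 2 can be pictured as a simple lookup table. The following is only a rough Python sketch of that idea, not Mif2Go's actual configuration syntax; the names STYLE_MAP and dita_elements are made up for illustration, and the rule that an exact style name wins over a wildcard is an assumption:

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from Word/FrameMaker style names to DITA elements.
# Exact names come before wildcard patterns so that "heading 4" is matched
# before the more general "heading*" (an assumed precedence rule).
STYLE_MAP = [
    ("heading 4", ["section"]),
    ("heading*", ["title"]),
    ("bulleted list", ["ul", "li"]),
    ("instruction", ["cmd"]),
]


def dita_elements(style_name):
    """Return the DITA element path for a style name, or None if unmapped."""
    name = style_name.strip().lower()
    for pattern, elements in STYLE_MAP:
        if fnmatchcase(name, pattern):
            return elements
    return None


print(dita_elements("Heading 1"))      # ['title']
print(dita_elements("Heading 4"))      # ['section']
print(dita_elements("Bulleted List"))  # ['ul', 'li']
print(dita_elements("Body Text"))      # None
```

Any style that comes back as None is one you still have to map by hand before the conversion will be complete.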

- Use MS Excel as an intermediate step to convert large tables to DITA.

[Yves] >>> That's not really required. Some DITA editors, such as oXygen XML Author and FrameMaker 11, have a "Smart Paste" function, which handles tables very nicely, even more complex tables with merged cells.

As Chris says, you can't magically add structure information that isn't there in the original, but you can automate a lot of the grunt work.

[Yves] >>> Well, actually, you can, depending on the type of structure information which needs to be added. Metadata is very important if you use DITA. In a topic, you can add metadata in the <prolog> element, which can contain things like the name of the author, the creation and modification dates of the topic, the product name, the version of the topic, etc. MIF2Go can add this prolog to each topic during the conversion.
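
To give an idea of what such a prolog looks like, here is a minimal Python sketch that builds one with the standard library. It is not how Mif2Go does it; the function name build_prolog and all the sample values are made up, and the version is stored as a product version in <vrm>, which is an assumption about where you would want it:

```python
import xml.etree.ElementTree as ET


def build_prolog(author, created, revised, product, version):
    """Build a DITA <prolog> element holding basic topic metadata."""
    prolog = ET.Element("prolog")
    ET.SubElement(prolog, "author").text = author

    # Creation and modification dates go in <critdates>.
    critdates = ET.SubElement(prolog, "critdates")
    ET.SubElement(critdates, "created", date=created)
    ET.SubElement(critdates, "revised", modified=revised)

    # Product name and version go in <metadata>/<prodinfo>.
    metadata = ET.SubElement(prolog, "metadata")
    prodinfo = ET.SubElement(metadata, "prodinfo")
    ET.SubElement(prodinfo, "prodname").text = product
    vrmlist = ET.SubElement(prodinfo, "vrmlist")
    ET.SubElement(vrmlist, "vrm", version=version)
    return prolog


# Placeholder values, for illustration only.
prolog = build_prolog("A. Writer", "2013-07-01", "2013-07-26", "Example Product", "2")
print(ET.tostring(prolog, encoding="unicode"))
```

A conversion script could call something like this once per topic and insert the result after the topic's <shortdesc>, if there is one.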

Still, you will have to check your content after the conversion and restructure it (a bit, YMMV). For example, suppose you have this paragraph in the original text:

"Click on Preview. You should see that this output creates a PDF of the 3D view only."

After conversion, you will get this:

<step><cmd>Click on Preview. You should see that this output creates a PDF of the 3D view only.</cmd></step>

This is *valid* DITA, but not well-structured yet. You need to restructure and refactor this to get something like this:

<step><cmd>Click <uicontrol>Preview</uicontrol>.</cmd>
<stepresult>You should see that this output creates a PDF of the 3D view only.</stepresult>
</step>

Conversion can be automated; restructuring/refactoring cannot because you actually have to read the text and then decide that, in this case, the second sentence is the result of the instruction in the first sentence. In other cases, however, the second sentence may be an example (stepxmp), a tip (note type="tip"), or just some more information about the instruction (info).
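
What a script can do is point you to the steps that probably need this kind of attention. The sketch below, in Python, only flags any <cmd> that contains more than one sentence; it does not try to decide what the extra sentence is. The function name steps_needing_review, the naive sentence splitting, and the second sample step are assumptions for illustration:

```python
import re
import xml.etree.ElementTree as ET


def steps_needing_review(task_xml):
    """Return the text of every <cmd> that holds more than one sentence."""
    root = ET.fromstring(task_xml)
    flagged = []
    for cmd in root.iter("cmd"):
        # Flatten inline markup such as <uicontrol> and normalize whitespace.
        text = " ".join("".join(cmd.itertext()).split())
        # Naive split: sentence-ending punctuation followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
        if len(sentences) > 1:
            flagged.append(text)
    return flagged


sample = """<steps>
  <step><cmd>Click on Preview. You should see that this output creates a PDF of the 3D view only.</cmd></step>
  <step><cmd>Click <uicontrol>Save</uicontrol>.</cmd></step>
</steps>"""

for text in steps_needing_review(sample):
    print("Review:", text)
```

Only the first step is reported here. Whether its second sentence belongs in <stepresult>, <stepxmp>, <info> or a note is still a decision for whoever reads the text.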


Yves Barbion
