RE: Real value (was implementing single-source) (Long)

Subject: RE: Real value (was implementing single-source) (Long)
From: "Thomas Quine" <quinet -at- home -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Fri, 10 Nov 2000 08:02:38 -0800

Dan, I must thank you for taking the time to lay out your thoughts on this
important issue. I'm not snipping your original message, even though it's
rather long, because I think it's worth re-reading.
Here's a link to a good overview of XML document management systems:
- Thom

-----Original Message-----
From: bounce-techwr-l-20657 -at- lists -dot- raycomm -dot- com
[mailto:bounce-techwr-l-20657 -at- lists -dot- raycomm -dot- com] On Behalf Of Dan Emory
Sent: November 9, 2000 3:27 PM
Subject: Re: Real value (was implementing single-source) (Long)

Andrew's version of the "new economy" is apparently one that,
Luddite-like, ignores modern ways of improving productivity.
As the Scientific American article I cited in my earlier
post pointed out:

"...many researchers (i.e., knowledge workers) are like sailors thirsting
to death surrounded by an ocean: what they need is all around them, but
it's not in a form they can readily use."


"The R&D effort is at a primitive craft scale, like cottage weavers
(aside: the original Luddites), although standardization is one of
the first problems that got tackled in the Industrial Revolution,
with the invention of interchangeable parts."

What is needed is a way to create, interchange, manage, distribute,
retrieve, reuse, and repurpose information effectively for both human
and non-human (e.g., software applications) users.
Until that is accomplished, technical writing will remain a
cottage industry, and the people who use what is
produced by TWs will continue to be "sailors thirsting
to death surrounded by an ocean of information" in proprietary
formats. The opposition to the change to XML is a Luddite
reaction by people who fear or misunderstand the
XML paradigm.


1. At present, most document management systems control and
access information at the file level. The documents themselves
are in a proprietary format, and thus are not amenable to
database storage. Consequently, these document management
systems traditionally have external pointers to the files
being managed. If, for instance, you want to modify just a paragraph
in a file, you must check out the entire file, make the change, and
check it back in. While you have it checked out, no one else
has write access. When it's checked back in, the resulting
revision tracking information is only at the file level.

2. When information is managed at the file level in such systems,
those who are searching for specific information can only
search for and retrieve information at the file level. And, when
information must be re-purposed (e.g., for different delivery
requirements) a configuration management nightmare is
almost inevitable, because either the number of source files
multiplies, or the source file itself grows to obscene complexity
(e.g., conditional text and similar devices) in order to satisfy
all the various delivery requirements. Each time the source
file is updated, all of its derivative deliverables must be
re-generated (and often are subsequently massaged and
changed independently of the source). Thus the source and all of
its derivatives have a validity based upon the time they were last
updated, and there is a strong likelihood that the resulting
documents are in disagreement. This violates all the normal
criteria associated with effective configuration control, validation,
and verification.

3. These types of document management systems not only
confound the problem of configuration control and repurposing
of information, they also make it difficult to reuse that information.
Again, users can only access potential candidates for reuse
at the document file level. Suppose a knowledge worker is looking
for a particular paragraph, table, or illustration that might be
reusable in another document? In this case, all of the metadata
in the database is at the document file level, thus only candidate
files can be retrieved, and then pored through for the desired reusable
nugget. That is hardly an efficient process. Then, the knowledge worker
must extract the desired nugget and paste it into the new document.
Then, the pasted-in information may or may not be modified. Suppose
it is reused without modification, and there is a requirement that,
if the source is changed, the extracted nugget must also be
updated to reflect that change. Or, if the extracted reusable
nugget is modified, you want to correlate the modified version
with the original version so that you can determine whether
changes to the original version need to be reflected in the
modified version, or vice versa. You can't do these things if all document
management activities are at the file level. Once again, you've
created a configuration management nightmare.

So, to summarize:

a. Old-fashioned file-level document management systems
don't work well when repurposing (i.e., single-sourcing) and information
reuse are important factors.

b. Revision tracking is only possible at the document file level,
which is insufficient.

c. Check-out/check-in can only be accomplished at the document
file level, thus, when a document is checked out for editing by one
writer, it is unavailable for editing by other writers. Consequently,
true collaborative authoring is impossible unless each writer is
assigned total responsibility for one or more individual document files.

d. Users of the information cannot effectively search for and retrieve data
below the document file level.

e. True configuration management is made exceptionally difficult.

f. Since the documents being managed all use proprietary formats,
and formatting is usually based on non-Unicode fonts that (at best)
are only applicable to a few languages, the ability to intermix two
or more languages in the same document file is problematic.
Also, the use of special notation (e.g., mathematical, chemical,
music) in such documents is problematic at best, because each
proprietary format uses different, non-standard ways of
representing those special characters. This greatly complicates
translation of the source document to other languages. That
usually necessitates the maintenance and control of separate
documents for each language, further complicating the configuration
control problem, because there is no viable way to assure that
all language versions are kept in sync.

Is there a cost associated with these limitations and inefficiencies?


1. Unlike file-based systems, one based on XML stores the
information itself in the database. Because all document
components are individually tagged, each such component
can be stored separately in the database. Each component
has metadata in the form of descriptive element names and/or
attributes which can be used for all sorts of purposes. In
particular, attributes that describe the information content
of each component add value to the information.

Consequently, in response to user queries, searches can be
conducted down to the individual component level,
using element names and attribute values to focus the search.
Information retrieval is further facilitated by metadata enrichment
at both the document level and the higher levels of structure within
documents through the use of RDFs (Resource Description
Frameworks) that can be customized for various disciplines.

2. The database itself is the sole source for all deliverables.
Consequently, true single-sourcing and repurposing
are possible, and XSL middleware is used to prepare the
requested information (both structure-wise and format-wise)
to fit the needs of the human or non-human user. All or any part
of a document can be extracted from the database.
All proprietary formatting is eliminated. The extracted information
is guaranteed to be structurally valid and up-to-date as of the
moment it was delivered from the database.

3. Check-out/check-in can be performed at any level
of granularity, from an entire document down to an individual
component. Consequently, many authors can conduct
simultaneous, non-conflicting edits on the same document.
Equally important is the fact that revision-tracking can be
performed to any desired level of granularity, identifying
the author who originated or changed the information, the
date of check-in, and the reason for the change (e.g., an ECO).
All previous versions of the same component(s) can be
retained in the database.

4. Information reuse is greatly facilitated. Queries based on
RDFs, element names, and attribute values can retrieve information
at any level of granularity. If a component extracted
from the database is reused in another document, and the
source is subsequently modified, all documents that
use that component can be automatically updated.

5. XML uses Unicode to represent characters. Ideally
the proprietary software (e.g. FrameMaker+SGML)
used to author documents and export them to XML
would also use Unicode internally (currently, this is not
true for FM+SGML). A separate code point is provided
for each distinctive glyph in each language. A single
code point is used for glyphs that are common to two
or more languages (e.g., punctuation).

Since revision tracking down to the individual component
level is possible, all changes made in the original language
version can be flagged, assuring that the translated versions
can be updated to reflect all such changes. Any language
version can be retrieved from the database and
delivered, and the correct representation of each glyph
is assured. A number of such Unicode-compliant fonts,
some of which allow the same font to be used for up
to 40 different languages, are now available. Consequently,
the same font can be used in the authoring software for
both the original and translated versions.
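
To make points 1 and 4 above a little more concrete, here is a rough Python
sketch of component-level search and reuse by reference. It is only an
illustration under stated assumptions: the element names (manual, procedure,
warning), the attributes (id, product, audience), and the helper functions
are all hypothetical, and a plain in-memory dictionary stands in for the
database that a real XML-based document management system would provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element and attribute names are illustrative only.
SOURCE = """\
<manual id="pump-guide">
  <procedure id="p1" product="pump-a" audience="operator">
    <title>Priming the pump</title>
    <step>Open the inlet valve.</step>
  </procedure>
  <procedure id="p2" product="pump-b" audience="technician">
    <title>Replacing the seal</title>
    <step>Remove the housing cover.</step>
  </procedure>
  <warning id="w1" product="pump-a">
    <para>Disconnect power before servicing.</para>
  </warning>
</manual>
"""


def load_components(xml_text):
    """Index every element that has an id attribute (a stand-in for database storage)."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def find_components(store, tag, **attrs):
    """Return the ids of components matching an element name and attribute values."""
    return [
        cid
        for cid, el in store.items()
        if el.tag == tag and all(el.get(k) == v for k, v in attrs.items())
    ]


def render(store, title, component_ids):
    """Assemble a deliverable from component ids at delivery time (reuse by reference)."""
    doc = ET.Element("deliverable", {"title": title})
    for cid in component_ids:
        doc.append(store[cid])
    return ET.tostring(doc, encoding="unicode")


store = load_components(SOURCE)

# Search below the file level, using element names and attribute values.
operator_procs = find_components(store, "procedure", audience="operator")
pump_a_warnings = find_components(store, "warning", product="pump-a")

# The deliverable holds only references (ids), not pasted-in copies.
quick_ref_ids = pump_a_warnings + operator_procs
before = render(store, "Pump A quick reference", quick_ref_ids)

# Change the source component once; the next delivery reflects the change.
store["w1"].find("para").text = "Disconnect power and relieve pressure before servicing."
after = render(store, "Pump A quick reference", quick_ref_ids)

print(operator_procs)
print(pump_a_warnings)
print(after)
```

Because the quick reference stores ids rather than copies of the text, the
edited warning shows up the next time the deliverable is rendered, which is
the behavior that pasting a nugget into a file-level document cannot give you.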


The features of XML-based document management
systems described above can produce vast improvements in
productivity and information validity. Admittedly, the
investment required to create such a system is quite high,
but over and over again it has been shown that the ROI
is also quite high. A large part of the initial investment
is in the conversion of unstructured legacy documents in
proprietary formats to structured ones with extensive
metadata. The converted documents are then exported to
the non-proprietary XML format.

The cost of XML-based document management
systems is bound to drop as their use becomes more
pervasive. Also, the overhead cost of customizing such
off-the-shelf software products to meet specific needs is
likely to drop even more sharply as more companies get
on board. The near-certainty that XML is the future, combined
with the productivity and validity improvements described above,
should persuade most companies to cease creating and
archiving unstructured documents in proprietary formats, because
that will simply escalate the cost of the E-ticket ride into
the future. That means you should, as soon as possible,
begin originating all new documents in XML or SGML, because
the sooner you do, the lower the investment cost will be,
and the greater the ROI will be when you convert
to an XML-based document management system.

Andrew's Luddite tirade below against "infrastructure" and
the XML approach is the old economy way of looking at
things. The old way will survive only in small old-economy
companies who cannot (or won't) pay the cost of admission.

Andrew argues that technical writers are hired to write documents.
No they're not. They're hired to produce information, and
structured, tagged information in a non-proprietary format
containing descriptive metadata is the modern way
(or, as Andrew would have it, the New Economy way)
to produce information in the most efficient,
manageable, useful, and economic way.
