Re: [FrameSGML] Structured Document Design for XML or SGML (Long)

Subject: Re: [FrameSGML] Structured Document Design for XML or SGML (Long)
From: Dan Emory <danemory -at- primenet -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Mon, 22 May 2000 15:52:44 -0700

At 02:33 PM 5/22/00 -0400, you wrote:

-------------------------Snip--------------
Strictly-structured
-------------------
<command>
<name>foobar</name>
<syntax>...</syntax>
<description>...</description>
<parameters>...</parameters>
<example>...</example>
<success_resp>...</success_resp>
<error_resp>...</error_resp>
</command>

Less-strict structuring
-----------------------
<command>
<name>foobar</name>
<subsection><name>Syntax</name>...</subsection>
<subsection><name>Description</name>...</subsection>
<subsection><name>Parameters</name>...</subsection>
<subsection><name>Example Command</name>...</subsection>
<subsection><name>Successful Response</name>...</subsection>
<subsection><name>Error Response</name>...</subsection>
</command>

=======================================
I would always use the strictly-structured one, provided the content model
makes some of the components optional (e.g., error_resp, success_resp).
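
As a rough sketch of what "optional components" means in a content model, the Python below checks the children of a command element against a fixed sequence in which the two response elements may be omitted. Which components are optional is an assumption made here for illustration; a real DTD or EDD would state this in its own syntax.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for <command>, written as a regex over the
# comma-terminated child tag names. A trailing "?" marks an optional part.
# Roughly equivalent DTD declaration (an assumption, not from the post):
#   <!ELEMENT command (name, syntax, description, parameters, example,
#                      success_resp?, error_resp?)>
COMMAND_MODEL = re.compile(
    r"name,syntax,description,parameters,example,"
    r"(success_resp,)?(error_resp,)?"
)


def is_valid_command(xml_text: str) -> bool:
    """Return True if the children of <command> match COMMAND_MODEL."""
    root = ET.fromstring(xml_text)
    if root.tag != "command":
        return False
    child_tags = "".join(child.tag + "," for child in root)
    return COMMAND_MODEL.fullmatch(child_tags) is not None


full = (
    "<command><name>foobar</name><syntax/><description/><parameters/>"
    "<example/><success_resp/><error_resp/></command>"
)
no_responses = (
    "<command><name>foobar</name><syntax/><description/><parameters/>"
    "<example/></command>"
)
wrong_order = (
    "<command><syntax/><name>foobar</name><description/><parameters/>"
    "<example/></command>"
)
print(is_valid_command(full), is_valid_command(no_responses),
      is_valid_command(wrong_order))
```

Both the full and the response-free instances pass, while the out-of-order one is rejected, so the strict structure costs the author nothing when a command has no error or success response.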

Don't get me wrong. I'm wholeheartedly in favor of element names that
describe content as precisely as possible when it fits the situation.

What I would call the "molecular" structure you describe
above pertains to a particular kind of information type having "atomic" components
that can also be named to indicate their information sub-type.

Where I depart from the purists is that this is often not the case at the
molecular level. Instead, the "molecules" are simply wrappers for collections of "ordinary"
atoms (e.g., paragraphs, interspersed with lists, graphics, tables, etc.) that have no
classifiable information type. However, the document object type of each
such "atom" is always classifiable (e.g., Para, List, Item (within a list),
Graphic, figure caption, Table), and these "atoms" should
have names that describe their objectness, not their unclassifiable
information type.

So, some "molecules" (and perhaps their "atoms") should have names
that convey their information type, while other "molecules" and their
"atoms" should have names that describe their objectness.

It is at the level of superstructure above the molecular level where content
becomes important.



What Dan proposes in his "Information Design" paper, to me, seems
to suggest using attributes to produce a hybrid structure like so:

<reference type="command">
<name>foobar</name>
<subsection type="syntax">...</subsection>
<subsection type="description">...</subsection>
<subsection type="parameters">...</subsection>
<subsection type="example">...</subsection>
<subsection type="success_resp">...</subsection>
<subsection type="error_resp">...</subsection>
</reference>
====================================================
No, that's not what I propose. What I say is use content-oriented names
when it fits, and use object-oriented (or even format-oriented) names when
that fits.

When the SGML standard was developed, the idea of separating content
from format was simply intended to make SGML docs independent of any
proprietary DTP or WP software used to produce/display/print them.

Certainly the SGML standard does not dictate any kind of element naming
conventions (other than length and permitted characters), nor does it limit
in any way how attributes should be used. The purists have tried to
impose another layer on top of the standard that requires element names
to always convey content and content only, and forbids the inclusion of
formatting information in any form whether it be in element names or
attributes. They claim that structure must consist solely of a hierarchy
of content-named elements, and that element context is always sufficient
to describe formatting.

If formatting attributes are forbidden, then any style sheet (or EDD) for formatting
SGML document instances must rely solely on element context. That
means the original developer of the DTD has predetermined for all time
what formatting variations are possible, because there are no
author-specified "hooks" that can be used by the style sheet to reflect
the author's vision by transcending context when the author thinks
there is a need to do so.
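
To make the idea of an author-specified "hook" concrete, here is a minimal Python sketch of a formatter that normally chooses a format from element context but lets an attribute override that choice. The attribute name Format, the context table, and the format names are all hypothetical; this does not show how any particular style sheet or EDD works.

```python
import xml.etree.ElementTree as ET

# Hypothetical context rules: (parent tag, element tag) -> format name.
CONTEXT_FORMATS = {
    ("description", "para"): "body",
    ("example", "para"): "code",
}


def choose_format(parent, element):
    """Pick a format from context unless the author supplied a hook."""
    override = element.get("Format")  # hypothetical author-specified hook
    if override is not None:
        return override
    return CONTEXT_FORMATS.get((parent.tag, element.tag), "default")


description = ET.fromstring(
    "<description>"
    "<para>Ordinary explanatory text.</para>"
    "<para Format='warning'>Read this before running the command.</para>"
    "</description>"
)
formats = [choose_format(description, child) for child in description]
print(formats)  # ['body', 'warning']
```

Without the attribute, both paragraphs would be forced into the "body" format, because their context is identical.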

And, if element names are the only allowable way to indicate information
content, then information content must be solely determined by context.
But I argue that information content has many facets.

In your strictly-structured example above, the primary information content
facet is Command, and the name of each child element of Command
describes an information sub-type within a Command. This is all well and
good as far as it goes. But I suggest the following additional facets
of information content could (and perhaps should) be represented by means of
attributes:

* RequirementTrace - Traces the command back to the particular paragraph
in the software requirements specification where the requirement for the
command originated.

* FunctionName - The name of the function in which the code module (where
the command is executed) is located.

* CodeModule - Identifies the module of code where the command is executed.

* CodeVersion - Identifies the version of the code at the time the command was
documented.

* ECOs - Identifies any Engineering Change Orders that have affected the content
of the Command element and its children.

* Rationale - Explains the rationale for the command (this may be important during
document reviews and other activities to inform people who might otherwise
be in the dark, particularly when the explanation of the command does not
immediately precede the Command element).

* Keywords - Lists any keywords, such that, if a user executes a search for
a particular keyword, a hit will be produced on this element even if that
word does not actually appear in the text itself. By elevating
the listed keywords in this way, the typical problem with keyword searches
(i.e., too many hits, most of which are inconsequential) is ameliorated.

I could think of other facets, but I think my point is made. All of the above
attributes, in my opinion, are describing additional facets of the command
information type.
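
For illustration only, the sketch below builds a Command element carrying the facets listed above as attributes, then shows a keyword lookup that hits even though the word is absent from the element text. Every attribute value is an invented placeholder, and storing Keywords and ECOs as space-separated lists is an assumption.

```python
import xml.etree.ElementTree as ET

# All attribute values below are invented placeholders for illustration.
command = ET.Element(
    "command",
    {
        "RequirementTrace": "SRS-3.2.1",
        "FunctionName": "XYZ",
        "CodeModule": "xyz_io",
        "CodeVersion": "1.4",
        "ECOs": "ECO-101 ECO-117",
        "Rationale": "Lets operators reset the link without a reboot.",
        "Keywords": "reset restart link",
    },
)
ET.SubElement(command, "name").text = "foobar"
ET.SubElement(command, "syntax").text = "..."


def keyword_hit(element, word):
    """True if word is listed in the element's Keywords attribute."""
    return word.lower() in element.get("Keywords", "").lower().split()


# Serialize the element to see the attributes alongside the content.
print(ET.tostring(command, encoding="unicode"))
# "restart" never appears in the element text, yet the search still hits.
print(keyword_hit(command, "restart"))
```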

The whole purpose of information facets (element names and their context
being one facet, and amplifying attributes being another) is to facilitate
user searches. Most search engines, however, cannot search for an
element name within a particular context (i.e., some chain of antecedent
parents of the Command element). If it is only possible to search on
an element name, every Command element in the entire document will be
found. But with amplifying attributes, I can search, say, for all Command
elements within the function named XYZ, or for all
elements that reference a particular paragraph in the Software
Requirements Specification, and so on.
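
The difference between a name-only search and an attribute search can be sketched with Python's standard xml.etree.ElementTree module, which supports a small XPath subset including attribute predicates. The sample document, its element names, and its attribute values are made up for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample document; names and values are placeholders.
manual = ET.fromstring("""
<manual>
  <chapter>
    <command FunctionName="XYZ" RequirementTrace="SRS-3.2.1"><name>foobar</name></command>
    <command FunctionName="ABC" RequirementTrace="SRS-4.1"><name>bazqux</name></command>
  </chapter>
  <chapter>
    <command FunctionName="XYZ" RequirementTrace="SRS-3.2.2"><name>quux</name></command>
    <note RequirementTrace="SRS-4.1">See bazqux.</note>
  </chapter>
</manual>
""")

# Searching on the element name alone finds every command in the document.
all_commands = manual.findall(".//command")

# An amplifying attribute narrows the search to one function.
xyz_commands = manual.findall(".//command[@FunctionName='XYZ']")

# Any element, whatever its name, that traces to a given SRS paragraph.
srs_4_1 = manual.findall(".//*[@RequirementTrace='SRS-4.1']")

print(len(all_commands))
print([c.findtext("name") for c in xyz_commands])
print([e.tag for e in srs_4_1])
```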

You can see, therefore, that multiple information facets offer much more
powerful search capabilities, thereby facilitating information reuse and
repurposing. Additionally, when design changes occur, these attributes
can facilitate document revision activities by locating all elements
within a document that might be affected by an Engineering Change Order,
or by a new version of the associated function or code module.
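
A revision pass driven by an Engineering Change Order might look like the following sketch, which assumes (this is not from the post) that the ECOs attribute holds a space-separated list of identifiers. The document and the ECO numbers are invented.

```python
import xml.etree.ElementTree as ET


def affected_by(root, eco_id):
    """Return every element whose ECOs attribute lists eco_id exactly."""
    return [el for el in root.iter() if eco_id in el.get("ECOs", "").split()]


# Invented sample document; ECO identifiers are placeholders.
manual = ET.fromstring("""
<manual>
  <command ECOs="ECO-101 ECO-117"><name>foobar</name></command>
  <command ECOs="ECO-117"><name>bazqux</name></command>
  <command><name>quux</name></command>
</manual>
""")

for eco in ("ECO-101", "ECO-117"):
    names = [el.findtext("name") for el in affected_by(manual, eco)]
    print(eco, names)
```

Matching whole tokens rather than substrings keeps a query for "ECO-1" from also returning elements tagged "ECO-101".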






====================
| Nullius in Verba |
====================
Dan Emory, Dan Emory & Associates
FrameMaker/FrameMaker+SGML Document Design & Database Publishing
Voice/Fax: 949-722-8971 E-Mail: danemory -at- primenet -dot- com
10044 Adams Ave. #208, Huntington Beach, CA 92646
---Subscribe to the "Free Framers" list by sending a message to
majordomo -at- omsys -dot- com with "subscribe framers" (no quotes) in the body.





