Re: XML & Technical Writers...(struggling)

Subject: Re: XML & Technical Writers...(struggling)
From: Simon North <north -at- SYNOPSYS -dot- COM>
Date: Fri, 17 Apr 1998 10:02:32 +0001

Here you go,

> 1. Create a DTD containing some rules defined by XML.

This is a simple XML DTD (with all credit to David Megginson
of Microstar <dmeggins -at- microstar -dot- com>) for a poem, with my comments
added as explanation.

For those who already know SGML it will look familiar but, since XML
is a subset of SGML, the allowable syntax is much smaller than
SGML's.

-------------------- cut here -------------------------------

<?xml encoding="UTF-8"?>

<!ENTITY % inline "#PCDATA|emphasis">

<!-- This is a so-called parameter entity. Think of it as a macro: a
shortcut for a piece of markup is defined once and can then be used
in the DTD wherever it's required -->

<!ELEMENT poem (front, body)>

<!-- a poem must consist of a front element followed by a body
element -->

<!ELEMENT front (title, author, revision-history)>
<!ELEMENT title (%inline;)*>

<!-- the parameter entity is expanded so that the declaration reads
(#PCDATA|emphasis)*, which says that a title element can contain
character data and emphasis elements, mixed in any order and any
number of times. This "content model" isn't that easy, I'm afraid,
so I'll explain it a little further. Please bear with me.

SGML ignores 'white space' (spaces, tabs and so on) between markup,
and HTML does too. In some cases you might want to keep the extra
spaces and tabs, so XML allows you to say that you want them kept.
However, XML then has the problem of deciding what is 'significant'
white space and what isn't. PCDATA stands for 'parsed character
data': in plain terms, character strings (text) that the parser
scans for markup, so elements can appear among the text. Suppose I
now write an XML document, using this content model, that contains:

<title> <emphasis>bold steps in the unknown</emphasis> are not to
be taken lightly.</title>

Are the spaces between the start of the title element content and the
start of the emphasis element significant or not? Well, let me put
it this way. XML processing software (parsers, processors and so on)
is supposed to be simple to create (read: cheap and fast). By
implication it cannot be very 'clever', and so the XML markup
(unlike SGML) must be quite simple and very explicit: an XML parser
passes every character in this kind of content, white space
included, through to the application. This kind of content model is
called 'mixed content' (it can contain markup as well as characters)
and an element that will contain mixed content MUST be specified as
"(#PCDATA|element1|element2|elementn)*", with #PCDATA first and the
alternatives separated by | rather than by commas. The asterisk
means repeat the group enclosed in () any number of times you like.

Parentheses ( ) surround a sequence or a set of alternatives.

The , character precedes each element type, except the first, in
a sequence.

The | character precedes each element type, except
the first, in a list of alternatives.

The ? character follows an element or group of elements, and
indicates that it occurs zero or one time.

The * character follows an element or group of elements, and
indicates that it occurs zero or more times.

The + character follows an element or group of elements, and
indicates that it occurs one or more times -->

<!ELEMENT author (%inline;)*>

<!ELEMENT revision-history (item+)>

<!-- a revision-history element therefore contains at least one item
and can contain as many as you like-->

<!ELEMENT item (%inline;)*>
<!ELEMENT body (stanza|line)+>

<!-- a body element consists of at least one stanza or one line,
followed by any number of stanzas or lines, the + means repeat the
whole group one or more times -->

<!ELEMENT stanza (line)+>

<!ELEMENT line (%inline;)*>
<!ATTLIST line
n CDATA #IMPLIED>

<!-- a line element has an n attribute that consists of character
data. The attribute is optional: #IMPLIED means that no default is
supplied, so if it is left out the application software can generate
a value itself -->

<!ELEMENT emphasis (%inline;)*>

----------------- cut here ------------------
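
To make the parameter entity concrete, here is a sketch of how the
declarations that use %inline; read once the entity is written out
by hand. Nothing new is declared; these are the same five
declarations from the DTD above, with the entity reference replaced
by its text.

```xml
<!-- The (%inline;)* declarations with the entity written out by hand -->
<!ELEMENT title    (#PCDATA|emphasis)*>
<!ELEMENT author   (#PCDATA|emphasis)*>
<!ELEMENT item     (#PCDATA|emphasis)*>
<!ELEMENT line     (#PCDATA|emphasis)*>
<!ELEMENT emphasis (#PCDATA|emphasis)*>
```

Read this way, each of these elements may be empty, hold plain text,
or hold text interleaved with emphasis elements (which may themselves
be nested), in any order.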

So here's an XML document that is valid against this DTD:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE poem PUBLIC "-//Megginson//DTD Simple Poem//EN"
"poem.dtd">
<poem>
<front>
<title>Unknown</title>
<author>Anonymous, from the Eugenics Review, 1929</author>
<revision-history>
<item>1998-4-08: XML markup added</item>
</revision-history>
</front>
<body>
<stanza>
<line n="1">See the happy moron!</line>
<line n="2">He doesn't give a damn.</line>
<line n="3">I wish I were a moron,</line>
<line n="4">My God! perhaps I am!</line>
</stanza>
</body>
</poem>

------------------ cut here -------------------
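
A second, purely hypothetical document may help show the parts of the
DTD that the poem above does not exercise: a line placed directly
inside body (outside any stanza), a line that omits the optional n
attribute, more than one item in the revision history, and emphasis
used inside mixed content. The text is invented for illustration. It
should be valid against the same DTD, but it is worth running it
through a validating parser rather than taking that on trust.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE poem PUBLIC "-//Megginson//DTD Simple Poem//EN"
"poem.dtd">
<poem>
<front>
<title>A <emphasis>hypothetical</emphasis> example</title>
<author>Nobody in particular</author>
<revision-history>
<item>first draft</item>
<item>second draft, <emphasis>with</emphasis> emphasis</item>
</revision-history>
</front>
<body>
<line>A line directly inside body, with no n attribute.</line>
<stanza>
<line n="2">A line inside a stanza.</line>
</stanza>
</body>
</poem>
```

Removing the revision-history element, or emptying the body, would
make the document invalid, because front requires all three children
in order and body requires at least one stanza or line.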

I hope this starts to make sense ...

Simon.





