HTML and SGML

Subject: HTML and SGML
From: Chet Ensign <DOCCOE -at- IBIVM -dot- IBMMAIL -dot- COM>
Date: Mon, 2 May 1994 14:41:25 EDT

The following posting from Avalanche Development Company's SGML Newswire
compares HTML and SGML. Info on subscribing to the Newswire is included.

<TEXT OF POSTING>
Date: Mon, 2 May 1994 11:21:17 -0600
Message-Id: <199405021721 -dot- AA07350 -at- zippy>
To: sgmlinfo -at- avalanche -dot- com
From: sgmlinfo -at- avalanche -dot- com
Subject: SGMLNW: Definitions of HTML & SGML

084.1994-05-02

***************************************************************
****************** WELCOME TO SGML NEWSWIRE *******************
***************************************************************
* *
* To subscribe, send mail to sgmlinfo -at- avalanche -dot- com *
* *
* (Please pass along to interested colleagues) *
* *
***************************************************************

SGML & HTML: WHAT'S THE DIFFERENCE?
===================================

Following are two interesting definitions of the relationship
between HTML and SGML. They were posted to the comp.text.sgml
newsgroup in February 1994.

Paragraphs denoted with "*" in the left margin indicate
questions posted to comp.text.sgml. Paragraphs with empty
margins are the responses.


Response from: connolly -at- ulua -dot- hal -dot- com (Dan Connolly)


* I would like some clarification of the relationship between
* SGML, HTML, DTD, and the ISO standards. Here is my
* interpretation. Please correct me where necessary.

As I wrote the original draft of the HTML spec with the SGML
standard in my lap, I'll be glad to comment.

* SGML (Standard Generalized Markup Language) is an ISO
* standard (ISO 8879:1986). It is a system for defining
* documents and the markup languages that represent those
* document types.

You got it. The official way to cite SGML is:

ISO 8879:1986, Information Processing -- Text and Office
Systems -- Standard Generalized Markup Language (SGML)

* A DTD (Document Type Definition??) is a specific set of
* SGML semantics used to specify a document type and the
* markup language for representing that document.

* HTML is an example of a SGML DTD. Other examples are??

Here's the way I phrased it when I wrote the spec:

From http://info.cern.ch/hypertext/WWW/MarkUp/Intro.html
(which is part of:
ftp://ds.internic.net/internet-drafts/draft-ietf-iiir-html-01.txt):

The HyperText Markup Language is defined in terms of the ISO
Standard Generalized Markup Language [SGML]. SGML is a system
for defining structured document types and markup languages to
represent instances of those document types.

Every SGML document has three parts:

o An SGML declaration, which binds SGML processing quantities
and syntax token names to specific values. For example, the
SGML declaration in the HTML DTD specifies that the string
that opens an end tag is </ and the maximum length of a name is 40
characters.

o A prologue including one or more document type declarations,
which specify the element types, element relationships and
attributes, and references that can be represented by
markup. The HTML DTD specifies, for example, that the HEAD
element contains at most one TITLE element.

o An instance, which contains the data and markup of the
document.

We use the term HTML to mean both the document type and the
markup language for representing instances of that document
type.

All HTML documents share the same SGML declaration and prologue.
Hence implementations of the WorldWide Web generally only
transmit and store the instance part of an HTML document. To
construct an SGML document entity for processing by an SGML
parser, it is necessary to prefix the text from ``HTML DTD'' on
page 10 to the HTML instance.

Conversely, to implement an HTML parser, one need only implement
those parts of an SGML parser that are needed to parse an
instance after parsing the HTML DTD.
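
As a rough illustration of the prefixing step described above, the
following Python sketch assembles a document entity from its three
parts. The function name and the placeholder declaration and prologue
strings are assumptions made for illustration; they stand in for the
real HTML DTD text, which is much longer.

```python
def build_document_entity(sgml_declaration: str, prologue: str, instance: str) -> str:
    """Concatenate the three parts of an SGML document in order:
    SGML declaration, prologue (document type declaration), instance."""
    parts = [sgml_declaration, prologue, instance]
    # Trim surrounding whitespace from each part and join with newlines
    # so that every part starts on its own line.
    return "\n".join(part.strip() for part in parts) + "\n"


# Placeholder text only (hypothetical): every HTML document would share
# the same real declaration and prologue in place of these strings.
SHARED_DECLARATION = "(shared SGML declaration goes here)"
SHARED_PROLOGUE = "(shared HTML document type declaration goes here)"

if __name__ == "__main__":
    stored_instance = "<TITLE>Example</TITLE>\n<H1>Example</H1>"
    print(build_document_entity(SHARED_DECLARATION, SHARED_PROLOGUE, stored_instance))
```

Only the instance changes from one document to the next, which is why
a server can store and transmit the instance alone.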


* I have some other questions:

* Can I effectively say that HTML is ISO-compliant or
* ISO-compatible?

I'm not sure how those terms are defined. These terms _are_
defined by the SGML standard:

Conforming SGML Document: An HTML document (that is, when you
take an html file and prepend the HTML.DTD text, and change all
the unix newlines to SGML RE's) is a Conforming SGML Document
(as per section 15.1 of the SGML standard). (Well... they're
supposed to be anyway... there are a lot of html files out
there that wouldn't parse correctly relative to any published
DTD).

Minimal SGML Document: An HTML document (as above) is also a
Minimal SGML document (meaning it doesn't take a very powerful
SGML parser to parse it.) [There may be a few corners of the
HTML declaration that aren't quite minimal -- I'm not 100% sure
at the moment, since we tweaked LITLEN and NAMELEN a little]

Conforming SGML Application: WWW is _not_ a conforming SGML
application. For one, you have to document the fact that you are
one in order to be one, and nobody's done that. For two, most
WWW implementations allow all kinds of crap in HTML documents,
and section 15.2.2 says "A conforming SGML application shall
require its documents to be conforming SGML documents... ."

Conforming SGML System: WWW is _not_ a conforming SGML system --
you have to support arbitrary DTD's for this part.

* When creating an SGML document, is the DTD the filter or
* translator that defines the resulting output?

Nope. The only output specified by the SGML standard is "Yes. It
is a valid document instance" or "No. It is not a valid document
instance." There's also this sort-of-formal ESIS (Element
Structure Information Set or some such) that an SGML parser
magically exposes to an application. The question of how to
translate the ESIS to PostScript, for example, is not specified
by the DTD or any other SGML entity (you'd have to look into
DSSSSSSSSL or FOSI or something).
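
To give a feel for the kind of information a parser hands to an
application, here is a minimal Python sketch that lists start tags,
end tags, attributes, and data as a flat stream of events. It is an
approximation under stated assumptions: it uses Python's lenient
html.parser module, which never consults a DTD and does not validate,
and the one-line-per-event notation is only loosely modeled on the
output of SGML parsers such as sgmls. The class and function names are
made up for this example.

```python
from html.parser import HTMLParser


class EventLister(HTMLParser):
    """Collect a flat list of structure events from an HTML instance."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Attributes are reported before the start of their element.
        for name, value in attrs:
            self.events.append(f"A{name.upper()} {value or ''}")
        self.events.append(f"({tag.upper()}")

    def handle_endtag(self, tag):
        self.events.append(f"){tag.upper()}")

    def handle_data(self, data):
        # Skip whitespace-only data between tags.
        if data.strip():
            self.events.append(f"-{data.strip()}")


def list_events(instance):
    lister = EventLister()
    lister.feed(instance)
    lister.close()
    return lister.events


if __name__ == "__main__":
    for event in list_events('<H1>Hello</H1><A HREF="x.html">link</A>'):
        print(event)
```

Turning such an event stream into printed pages is the application's
job; nothing in the stream says what an H1 should look like.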

There are some tools for converting HTML to LaTeX for
printing I believe... though I am not familiar with their
completeness/quality.

* What commercial programs exist to create SGML documents?

Try the comp.text.sgml FAQ or some such... this isn't my
area of expertise.

* What is the status/relationship of HTML+ to this?

* What documentation exists about HTML other than what is on
* info.cern.ch?

Try this: http://www-external.hal.com/~connolly/html-design.html
It's my notebook on the design of a successor for HTML. It's got
pointers to all sorts of tutorials, discussion, and related
specs.

***********************************************************

Response from: (source unknown)


* I have a fuzzy inkling that there's a relationship between
* SGML and HTML, can someone clarify? If they're somehow
* linked, can I simply use Mosaic to view Sgml's?

HTML is only ONE special case of SGML. SGML is a much more
general mechanism. You should be able to configure any good SGML
viewer/editor to present you HTML documents in almost any way
you like. Mosaic is a highly specialized HTML viewer which
understands only one single SGML document type (i.e. HTML), not
SGML documents in general. In general, you can use SGML tools
to work with HTML documents, but not HTML-only tools (like the
various WWW browsers) for SGML processing in general. In SGML,
every type of document is defined by a document type definition
(DTD). If you are a computer scientist: a DTD is just a
grammar. The DTD lists the elements (e.g. <H1>, ...) which may
appear in a document of this type and in which order they may
appear. Because HTML is an SGML DTD, a formal DTD (one that
generic SGML tools can parse) is also available at

<http://info.cern.ch/hypertext/WWW/MarkUp/HTML.dtd.html>.
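
The "a DTD is just a grammar" remark can be made concrete with a toy
Python sketch. Each element gets a content model, written here as a
regular expression over the names of its children, and a child
sequence is allowed only if it matches. The two content models below
are simplified assumptions for illustration, not the rules of the real
HTML DTD, and the function name is hypothetical.

```python
import re

# Hypothetical content models: each element maps to a regular expression
# over its children's names, where every name is followed by one space.
CONTENT_MODELS = {
    "HEAD": r"(TITLE )?",     # at most one TITLE
    "BODY": r"((H1|P) )*",    # any mix of H1 and P, in any order
}


def children_allowed(parent, children):
    """Return True if the child sequence matches the parent's content model."""
    model = CONTENT_MODELS[parent]
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model, sequence) is not None


if __name__ == "__main__":
    print(children_allowed("HEAD", ["TITLE"]))
    print(children_allowed("HEAD", ["TITLE", "TITLE"]))
    print(children_allowed("BODY", ["H1", "P", "P"]))
```

A generic SGML tool reads rules like these from the DTD at run time,
while an HTML-only browser has one fixed set of rules built in.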

In the WWW project, currently another SGML document type is
being defined (called HTMLplus). You'll find it at

<http://info.cern.ch/hypertext/WWW/MarkUp/htmlplusdtd.txt>



**************************************************************
* SGML NEWSWIRE LIST MANAGER *
* *
* Linda Turner *
* Corporate Communications *
* Avalanche *
* 947 Walnut Street *
* Boulder, CO 80302 *
* sgmlinfo -at- avalanche -dot- com *
* linda -at- avalanche -dot- com *
* Vox: (303) 449-5032 *
* Fax: (303) 449-3246 *
**************************************************************

From: lewisg -at- woods -dot- uml -dot- edu
Newsgroups: comp.infosystems.www
Subject: What is WWW ? (just out of curiosity)
Date: 23 Mar 94 22:41:51 -0500

Hi !!


What is WWW ?? What do you use it for ???


I didn't think I was a 'new' user until I heard about this WWW stuff ?

Any help/info will be greatly appreciated.

Thank you.

Gavin Lewis.

lewisg -at- aspen -dot- uml -dot- edu

GO CHIEFS HOCKEY !!!!



*What is WWW ?? What do you use it for ???

The WWW is the thing that, when you first
heard about computers, and then Internet,
and now the Information Superhighway, &
you thought it would be so cool, but it
wasn't, it should have been.

Mostly you use it to look kewell, to
keep ahead of the Joneses, to kill time.

Hope This Helps (tm).

-Dee Seest
--
Dee Seest | President
Rte 1, Box 558 / Leander, TX / 78641 [ USPS ] | Roadkills-R-Us
d -dot- seest -at- rru -dot- com [ arpa ] | "preferred obituary here"
who!really!cares!anymore? [ uucp ] | [plot for rent]
http://hostname.pencom.com/rru.html [ WWW ] | [censored]


