Summaries of SGML Forum meetings
Chet Ensign <DOCCOE -at- IBIVM -dot- IBMMAIL -dot- COM>
Mon, 25 Apr 1994 11:24:45 EDT
Anatole Wilson and Debra Carnegie asked in previous messages if they
could get summaries of the next general meeting of the SGML Forum of New York.
I write summary minutes of both the keynote speaker and vendor demos at
the general meetings and, after getting them mailed to our paid membership,
post them to comp.text.sgml, the SGML newsgroup on the Internet.
If there is an interest in following the topic, I'd be happy to post them
on TECHWR-L as well. As a sample, here is the latest summary -- the minutes
of our March 8th general meeting.
<TEXT OF ARTICLE>
Minutes of the General Meeting, March 8, 1994.
This article summarizes the March general meeting of the SGML Forum of New
York. The meeting was held on March 8, 1994 at the McGraw-Hill Conference
Center, McGraw-Hill Inc, New York.
Held at: McGraw-Hill, Inc.
2nd Floor Conference Center
1221 Avenue of the Americas
The meeting was called to order by Cesare Del Vaglio, President of the Forum,
at 5:40 P.M.
Chet Ensign announced that Eric Levine of Microsoft, who was scheduled to
demonstrate the new SGML add-on for Word at the June meeting, has canceled. A
replacement vendor demonstration will be announced later. Eric's demonstration
will be scheduled for later this year.
There was no Treasurer's report.
Nadine Welsch, Matthew Bender & Co.
Matthew Bender is a publisher of legal information. The company publishes
legal forms and treatises in loose-leaf binders and, more recently, on CD-ROM.
For the past year and a half, they have been converting to an SGML-based
publishing system which features a central database for storage of text, a
document management front-end for the users that mediates access to the data,
and an SGML editor for their editorial staff. Oracle is the database product,
and Document Navigator and WriterStation, both from Datalogics, provide the
Matthew Bender & Co. recognized several years back that their old methods of
document preparation were unproductive. The files for most documents were
stored off-line. They were only brought onto the company's computer system
when it was time to prepare a revised edition. Consequently, updates or
additions to a document had to be kept someplace separate until it was time to
work on the book. The files were stored with typesetting codes in place. When
it was time to update a book, those codes were stripped out so that editors
could work on the document. The files were then recoded after the editors were
finished. This post-editing typesetting process added time to the schedule.
Extra coding also had to be added for the CD-ROM product.
MB turned to SGML to eliminate the duplicated work and shorten their
production schedules. In the short run, MB expects the new system to reduce
the redundant effort expended on page layout by automating composition. By
bringing all documents into the database and eliminating duplicate text, the
system will make the information more accessible to both their internal
professional editors and outside authors.
In the long term, MB expects the system to streamline the procedures for
producing their CD-ROM product. They also expect SGML to enable them to offer,
for the first time, customized documents to individual customers (an option
that would be prohibitively expensive under the old system) and online
How Did They Get Started?
Nadine said that they got a team of users together -- composition people,
editors, reviewers, etc. -- to analyze treatise materials. Datalogics
contributed to the effort as well. She said, "I recommend analyzing as much
data as possible. Take into account that you'll want to make later
modifications to the DTD -- for example, promoting or demoting elements in the
hierarchy -- as easy as possible." From the analysis, they wrote a DTD that
could be used for their documents. They then went to work on a migration plan.
"You need a migration plan," Nadine said, "because you are going to be
scheduling and processing a lot of material." In MB's case, migration was
complicated by the fact that they typically publish 800,000 pages a year, with
many books being updated 3 or 4 times. The big challenge was going to be
scheduling the conversion of their documents into SGML in the brief window of
opportunity between the end of one publication cycle and the beginning of the
next.
"One of the decisions you have to make when you go to SGML is how you are
going to handle the conversion," Nadine said. "You can do it yourself
in-house, send it out to a service bureau, or do both." At MB, she explained,
they did both. Simple documents were converted in-house. So were extremely
complex documents, where the conversion had to be supervised by someone with
knowledge of the contents, and documents that started another editing cycle
too quickly to allow time for outside conversion.
But for the larger volume of documents, MB contracted a service bureau to do
the conversion. They chose Data Conversion Laboratory. (Reader's note: DCL will
be presenting at our June meeting.) "Their proposal specifically addressed
some of our major concerns, especially the handling of figures and tables,
which occur quite often in our kind of documents."
Keeping It Under Control
Nadine said that it was important for them to appoint someone to establish and
oversee editorial guidelines. "We were taking documents that formerly had no
editing guidelines and putting them into a very structured kind of system. We
still had people working on typewriters, writing markup instructions on pieces
of paper. So we felt it was important to appoint one person up front to make
the calls on how things should be tagged." She also felt that a key to making
the conversions go smoothly was composing the marked up text so that the
editorial staff could approve something that looked familiar.
Nadine said that it is critical that you set up a very detailed tracking
system. In their situation, they were handling very large documents. A given
document would actually be handled in stages; at any point in the conversion
cycle, parts of it might be in preparation for delivery to the lab, at the
lab, or back on MB's system. It was very important that a tracking system
be in place that could indicate at every point in the cycle where each piece
of the document was located.
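As a rough illustration of the kind of tracking described above, the sketch
below records which stage each piece of a document is in. The stage names, the
Tracker class, and the document and piece identifiers are all hypothetical;
the minutes do not say how MB's tracking system was actually built.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, based on the three locations mentioned above.
    IN_PREPARATION = "in preparation for delivery to the lab"
    AT_LAB = "at the lab"
    BACK_ON_SYSTEM = "back on the in-house system"


class Tracker:
    """Records the current stage of every piece of every document."""

    def __init__(self):
        self._stages = {}  # (document, piece) -> Stage

    def move(self, document, piece, stage):
        """Record that a piece has moved to a new stage."""
        self._stages[(document, piece)] = stage

    def where(self, document, piece):
        """Return the stage a given piece is currently in."""
        return self._stages[(document, piece)]

    def pieces_at(self, stage):
        """List every (document, piece) pair currently in the given stage."""
        return sorted(key for key, s in self._stages.items() if s is stage)


tracker = Tracker()
tracker.move("treatise-A", "ch01", Stage.AT_LAB)
tracker.move("treatise-A", "ch02", Stage.IN_PREPARATION)
tracker.move("treatise-A", "ch01", Stage.BACK_ON_SYSTEM)  # ch01 comes back
```

Keying the record on the (document, piece) pair rather than on the document
alone is what lets one large document be in several stages at once.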
"Parse twice," was another key piece of advice. Nadine said that every
document is parsed two times, the second time being just before it is loaded
into the document database. Although the documents ought to be reliable after
being parsed the first time, she said that there have been enough cases where
something must have changed that she now believes that parsing twice is an
important QA check on your documents.
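The "parse twice" idea can be sketched as a gate in front of the database
load. This is only an illustration under stated assumptions: Python's standard
library has no validating SGML parser, so the check below uses an XML
well-formedness test as a stand-in, and the function names and the dictionary
standing in for the document database are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(text):
    """Stand-in check: XML well-formedness, not SGML validation against a DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def load_into_database(database, name, text):
    """Parse again immediately before loading; refuse the load on failure."""
    if not parses_cleanly(text):
        raise ValueError(f"{name} failed the pre-load parse")
    database[name] = text


database = {}
converted = "<chapter><title>Forms</title><para>Text</para></chapter>"
first_pass_ok = parses_cleanly(converted)  # first parse, right after conversion

# Something changes between conversion and loading (here, a lost end tag).
damaged = converted.replace("</para>", "")

load_into_database(database, "ch01", converted)
try:
    load_into_database(database, "ch02", damaged)
    rejected = False
except ValueError:
    rejected = True
```

The second parse costs little, and it catches any change made to a file after
it passed the first check.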
So Where Are They Now?
As of now, MB has converted almost 1 million pages to their SGML system, and
they have published tens of thousands of pages on their new editing system.
Because the system has proved successful, they are extending it by designing
additional DTDs especially for front matter and for primary source materials.
In answer to a question about what originally motivated MB to adopt SGML,
Nadine said that ever-tighter publication schedules and plainly redundant
work made it obvious that SGML was needed.
In answer to a question about the time frame of these activities, she said
that the original RFP went out 3 years ago. The pilot project started a year
and a half ago. By the end of this year, most documents the company publishes
should be in the new system.
A question was posed regarding how they handle loose-leaf pages. Nadine
answered that "every text fragment has a unique identifier. The database can
track data that has been modified by tracking the text fragment identifier."
Data is chunked at the chapter level in the database, because they could not
realistically manage it at a lower level than that. However, since the system
can track changes to every fragment in the database, they can automatically
generate page changes.
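A minimal sketch of how tracking by fragment identifier could lead to a list
of pages to reissue. The fragment IDs, the sample text, and the page_of
mapping from fragments to loose-leaf pages are invented for illustration; the
minutes say only that modified fragments are tracked by identifier and that
page changes can be generated from that.

```python
def changed_fragments(old, new):
    """Return the IDs of fragments that were added, removed, or modified."""
    return sorted(fid for fid in set(old) | set(new) if old.get(fid) != new.get(fid))


def pages_to_reissue(changed, page_of):
    """Map changed fragment IDs to the distinct pages they appear on."""
    return sorted({page_of[fid] for fid in changed if fid in page_of})


# Two versions of a chapter, keyed by fragment identifier (sample data).
old = {"f1": "Section one.", "f2": "Section two.", "f3": "Section three."}
new = {"f1": "Section one.", "f2": "Section two, revised.",
       "f3": "Section three.", "f4": "A new section."}

# Hypothetical mapping from fragment to the page it is composed on.
page_of = {"f1": 1, "f2": 1, "f3": 2, "f4": 3}

changed = changed_fragments(old, new)
pages = pages_to_reissue(changed, page_of)
```

Here f2 was modified and f4 was added, so only pages 1 and 3 need to be
reissued; page 2 is untouched.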
Another question was posed regarding how well this worked with outside
authors. Nadine said that they are working with WordPerfect because that is
the most popular package among their outside authors. DCL developed a routine
for them that keeps the SGML tags in the documents as comments. She said that
means that they can give their files to these authors, tell them to turn the
comments off, and let them do whatever they want. They don't rely on reparsing
the files they get back. Instead, they use file-comparison programs to identify
the differences between the file that went out and the file that came back, and
a member of their editorial staff makes the changes. They have only been doing
this for 3 or 4 months, but it seems to be working well so far.
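The comparison step might look something like the sketch below, which uses
Python's standard difflib module. The sample lines and the bracketed notation
for tags hidden as comments are invented; the minutes do not name the
file-comparison programs MB uses or describe the format of the DCL routine's
output.

```python
import difflib

# The file as sent to the author (tags hidden in a made-up comment notation).
sent = [
    "[comment: <para>]",
    "The tenant shall pay rent monthly.",
    "[comment: </para>]",
]

# The same file as returned by the author.
returned = [
    "[comment: <para>]",
    "The tenant shall pay rent on the first of each month.",
    "[comment: </para>]",
]

diff = list(difflib.unified_diff(sent, returned,
                                 fromfile="sent", tofile="returned", lineterm=""))

# Keep only the removed and added lines, dropping the file headers and context.
changes = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]
```

An editor would then apply each line in changes to the SGML source by hand,
which is why the returned file never needs to be reparsed.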
Nadine was asked to comment on the cost of in-house conversion compared to
outside conversion. She said that they are really about the same. But an
outside service vendor is more economical when there is a large volume of
stuff to convert or when the source is really well tagged so that the
conversions will flow smoothly.
Asked about the payback, Nadine answered that it is too early to tell. There
is a large support cost right now that they expect to see drop off over time.
As editors, composition people, etc. become more familiar with the system, the
costs of helping them will decline. But, in the meantime, she said, "the
quality of life is so much better now that nobody would ever want to change
Richard Pasewark, Datalogics, WriterStation
After a brief intermission, Rich Pasewark of Datalogics talked about their
products WriterStation and DL Composer, a FOSI-based batch composition system.
Due to a last-minute scheduling conflict, the screen projector we borrow was
unavailable and Rich was unable to demonstrate the products on screen.
Datalogics is now a Frame Technology company. It is the high-end integration
arm of Frame. Datalogics offers a number of SGML-based products, led by
WriterStation, an SGML editor. Rich said that WriterStation, which has long
been available on DOS and OS/2, is now being migrated to Windows and
Windows/NT.
The development of WriterStation began in 1986. When it was first released on
PCs running DOS, it was the first formatting SGML text editor for that
platform.
In the late 1980s, Datalogics helped develop TextWrite for IBM, an SGML
editor that used the WriterStation engine. IBM didn't have much success with
the product, and Datalogics took it back in 1991. Since then, they have been
further developing and refining the product, adding features designed to make
it function effectively as the front-end in a client/server approach to
Rich showed a list of the features of WriterStation and pointed to several in
particular that distinguish it from others: 16 files can be open
simultaneously, and open files can be shown in multiple views; WriterStation
can launch graphics programs to display images; the menus can be customized;
and hot-keys can be set up to facilitate the writers' work.
WriterStation also has features to support its use in a client/server
publishing environment. It has an Application Programming Interface (API) and
Software Development Kit (SDK) that enable WriterStation to drive -- or be
driven by -- other programs. WriterStation Tools, the application development
product for the WriterStation editor, enables the developer to associate
formatting characteristics with elements in the DTD. "You actually create a
screen-format for your application," Rich said.
WriterStation can use Dynamic Data Exchange (DDE) links to exchange commands
and data with other programs. Using DDE, for example, WriterStation can launch
a graphics program in order to edit a diagram that goes with the document.
WriterStation enables users to edit document fragments, instead of whole SGML
document instances, so that writers can work on components of documents while
still maintaining valid SGML structure.
WriterStation is available on the OS/2 and DOS operating systems. Datalogics
has announced that it will be available on Windows and Windows/NT later this
year. WriterStation Tools is also available on these PC platforms. The
WriterStation kernel is the same on all these platforms and the program is
"essentially the same product." Only the interface is different.
Rich then told the meeting about DL Composer, Datalogics' FOSI-driven batch
composition engine. DL Composer is unique, he said, because "it is the only
layout engine that combines parsing with composition." Here he echoed Nadine's
advice -- "you can never parse too often!"
DL Composer takes as input your DTD, a document instance, and your FOSI.
(FOSI, Rich explained, stands for "Formatting Output Specification Instance."
A FOSI is basically an SGML instance based on the CALS Output Specification
DTD. The FOSI associates typographic characteristics with elements in the
document.) Composer generates PostScript for output.
Rich noted that the FOSI-based approach is not suitable for every document.
"FOSIs are good for highly predictable data," he said, "but they are not yet
strong in financial documents where you have lots of footnoting and alignment
FOSI-limitations are largely a result of the technique's having been
originally developed for military technical manuals. Its capabilities are
limited when it comes to more sophisticated layouts. Rich said that Datalogics
does intend to support DSSSL "whenever there is something to support."
Datalogics is a member of the committee working on DSSSL.
Rich noted that one wrinkle Datalogics has added is the ability to support
content-tagging in tables. This means that the developer of an application is
not locked into using row/cell DTD table models.
DL Composer is very fast. It can be as fast as 1 page/second on a Sun, HP, DEC
Ultrix, or IBM RS/6000 machine. It can automatically resolve cross-references in
a document and generate tables of contents and indexes. DL Composer can also
support change-page applications, such as those needed in the legal publishing
Rich invited everyone to contact him for pricing and other information. He can
be reached at Datalogics at (312) 266-3202. His email address is
rap -at- dlogics -dot- com.
Next Forum Meeting
The next general meeting of the Forum is scheduled for April 5th.
Minutes submitted April 4th, 1994.
Information Builders, Inc. 212-736-6250 X4349
New York, NY 10001
internet: doccoe -at- ibivm -dot- ibmmail -dot- com
ibmmail: USUBUVMV -at- IBMMAIL