Re: Space between sentences--Eureka! the source of confusion

Subject: Re: Space between sentences--Eureka! the source of confusion
From: Mark Baker <mbaker -at- OMNIMARK -dot- COM>
Date: Wed, 4 Aug 1999 13:20:44 -0400

I'm just back from vacation, which I mention as a paper-thin defense for
taking yet another whack at this dead horse.

What we need to understand about the question of spacing after a sentence is
this: it is a question of markup. We are all familiar with the idea of
markup now, thanks to HTML and XML, but not all markup comes in the form of
pointy brackets. There are several non-XML, non-HTML conventions for
paragraph and sentence delimiters in ASCII text streams. (For some real
fun, look up the XML and SGML rules on white space handling.)

Examples of paragraph delimiting markup include first line indentation and two
line feeds in a row. Sentence delimiting markup conventions consist of two
principal sequences: ASCII 46, 32, 32 (period, space, space) and ASCII 46,
32 (period, space). Alternatives include ASCII 46, 13, 10 (period, carriage
return, line feed) and sequences involving other punctuation, such as ASCII
46, 41, 32 (period, close parenthesis, space) and ASCII 46, 41, 32, 32
(period, close parenthesis, space, space).
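To make these byte sequences concrete, here is a small sketch in Python (not OmniMark, purely for illustration) that counts which of the conventions a piece of text uses. The function name `classify_delimiters` and the set of closing characters are assumptions, and like any pattern-based approach it cannot tell a sentence end from an abbreviation such as "Mr. ".

```python
import re
from collections import Counter

# End-of-sentence punctuation (ASCII 33 "!", 46 ".", 63 "?"), then any
# closing parentheses or quotes (ASCII 41 ")", 34 '"', 39 "'"), then the
# white space that follows. Which characters count is an assumption.
SENTENCE_END = re.compile(r"""[.!?][)"']*( +|\r\n)""")


def classify_delimiters(text: str) -> Counter:
    """Count the sentence-delimiting conventions used in text."""
    counts = Counter()
    for match in SENTENCE_END.finditer(text):
        gap = match.group(1)
        if gap == "\r\n":
            counts["CR LF"] += 1       # ASCII 13, 10
        elif gap == " ":
            counts["one space"] += 1   # ASCII 32
        elif gap == "  ":
            counts["two spaces"] += 1  # ASCII 32, 32
        else:
            counts["other"] += 1       # three or more spaces
    return counts


print(classify_delimiters("One.  Two (really.) Three?\r\nFour! Five."))
```

A file dominated by one count is a good hint about which convention its author's tools expected.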

When you type at a computer keyboard you create a stream of ASCII characters
which is captured by the editor you are using and either recorded verbatim
or processed in some way to create an onscreen display. More sophisticated
software will tend to have separate rendering routines for screen and paper
display.

Whatever sentence delimiting markup you use, that markup instructs whichever
rendering engine you are using to do the appropriate layout of the sentences
involved. The problem is that available rendering engines vary from really
smart to really dumb. A really smart rendering engine should accept either
one space or two, recognize them as end of sentence markup, and do the
correct (and identical) rendering in both cases.
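A minimal Python sketch of that "really smart" behavior, assuming the engine only has to recognize one or more spaces after sentence-ending punctuation (optionally followed by a closing parenthesis or quote). The `render` function and its `sentence_gap` parameter are hypothetical; the point is that the engine, not the markup, decides how wide the gap between sentences is. A real engine would also have to handle abbreviations, which this sketch ignores.

```python
import re

# One or more spaces after sentence-ending punctuation (plus any closing
# parentheses or quotes) are all read as the same end-of-sentence markup.
SENTENCE_BREAK = re.compile(r"""([.!?][)"']*) +""")


def render(text: str, sentence_gap: str = " ") -> str:
    """Lay out sentences with one uniform gap, whatever the input spacing."""
    return SENTENCE_BREAK.sub(lambda m: m.group(1) + sentence_gap, text)


one = "It works. It really does."
two = "It works.  It really does."
print(render(one) == render(two))  # True
```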

Unfortunately most rendering engines are not that smart. (Many have not
fully shaken off their origins as typewriter emulators.) So, as with any
other kind of markup, you have to choose the markup that your rendering
engine can handle. This is no different from choosing the flavor of HTML
that your users' browsers support. Incidentally, browsers handle this well,
because they collapse any run of white space in ordinary HTML text to a
single space: you get the same amount of spacing whether your file includes
one space, two spaces, or fifty spaces.
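Browsers get this uniform result by collapsing white space rather than by recognizing sentences: under the default CSS `white-space: normal` handling, a run of spaces, tabs, and line breaks renders as one space. The Python function below is only a rough approximation of that rule (it ignores `<pre>`, `&nbsp;`, and trimming at line starts), offered to show why the number of spaces stops mattering.

```python
import re

# ASCII white space as HTML defines it: space, tab, LF, FF, CR.
_WHITE_SPACE_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_white_space(text: str) -> str:
    """Render every run of white space as a single space (simplified)."""
    return _WHITE_SPACE_RUN.sub(" ", text)


# One, two, or fifty spaces all come out the same.
for gap in (" ", "  ", " " * 50):
    print(repr(collapse_white_space("Sentence one." + gap + "Sentence two.")))
```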

If you have only one rendering engine to worry about, simply pick the markup
that that engine accepts. If you have more than one, either find out which
convention works best for all of them, or pick one arbitrarily and convert
back and forth between the two conventions as required. If you can't decide
which one to pick, choose one space. Most dumb rendering engines work better
with one space than two.

Please do not imagine that there is a universal right answer to this or any
other markup question. In markup it doesn't matter what the convention is,
only that the encoding and decoding agents agree.

For those who need to convert, here is a simple OmniMark program that will
toggle a file back and forth between the two formats. (As with any
conversion program, check your data to make sure it isn't changing something
it shouldn't.) OmniMark is available for free download at
http://www.omnimark.com.

macro sentence-end-character is ["!?."] macro-end
macro parenthesis-end-character is ["%"%')"] macro-end

process
   submit #main-input

; Rule 1: sentence end followed by two spaces
find (sentence-end-character
      parenthesis-end-character*) => chars
     space{2}
   output chars || " "     ; one space

; Rule 2: sentence end followed by one space
find (sentence-end-character
      parenthesis-end-character*) => chars
     space
   output chars || "  "    ; two spaces

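As it stands, this program toggles one space to two and two spaces to one. On
a file with mixed spacing it will simply toggle each case, leaving you with
mixed spacing again. To make a pure two-to-one converter, delete the second
find rule. To make a pure one-to-two converter, keep both rules but change
the output of the first rule to two spaces; if you simply delete the first
rule, the second rule will add a space to every existing double space and
leave you with three.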
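For readers without OmniMark, here is a rough Python counterpart of the program, written for this discussion and not part of the original. The regular expression stands in for the two macros, and the function names `toggle`, `two_to_one`, and `one_to_two` are hypothetical. The one-to-two version matches one or two spaces and always writes two, so existing double spaces are not turned into triples.

```python
import re

# Mirrors the two OmniMark macros: a sentence-end character followed by any
# number of closing quotes or parentheses.
SENTENCE_END = r"""([!?.][")']*)"""


def toggle(text: str) -> str:
    """Two spaces become one and one space becomes two, like both find rules."""
    def swap(m: re.Match) -> str:
        return m.group(1) + (" " if m.group(2) == "  " else "  ")
    # " {1,2}" is greedy, so two spaces are tried first, just as the
    # two-space find rule is tried before the one-space rule.
    return re.sub(SENTENCE_END + "( {1,2})", swap, text)


def two_to_one(text: str) -> str:
    """Only the two-space rule: two spaces after a sentence become one."""
    return re.sub(SENTENCE_END + "  ", r"\1 ", text)


def one_to_two(text: str) -> str:
    """One or two spaces after a sentence become two."""
    return re.sub(SENTENCE_END + " {1,2}", r"\1  ", text)


print(toggle("One.  Two. Three."))
print(one_to_two("A.  B. C.) D"))
```

As with the OmniMark version, abbreviations such as "e.g. " are converted too, so check the output against your data.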

Note that this program doesn't work on binary formats such as Word or Frame
files.

---
Mark Baker
Senior Technical Communicator
OmniMark Technologies Corporation
1400 Blair Place
Gloucester, Ontario
Canada, K1J 9B8
Phone: 613-745-4242
Fax: 613-745-5560
Email mbaker -at- omnimark -dot- com
Web: http://www.omnimark.com
