Re: I really need some help or advice.

From: Kris Olberg <kjolberg -at- IX -dot- NETCOM -dot- COM>
Date: Sun, 15 Nov 1998 18:47:40 -0600

-----Original Message-----
From: TIEJUN YANG <yang_tiejun -at- HOTMAIL -dot- COM>

> I'm a graduate student in Computer Science Department, Utah State
>University. I am working on a project -- automatic grading system, which
>is to find the format differences in two word files, like different font
>type, font size and so on.

Do you mean "word files" as in "Microsoft Word files"? If so, ask Microsoft.
The native storage format for Microsoft Word is a proprietary binary format
that varies depending on version. Word can also save documents as RTF (Rich
Text Format), a plain-text format whose specs you could obtain somewhere on
the web, possibly from Microsoft. I'm not familiar with the native storage
formats for Version 8 and higher.
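
If the files to be graded can be saved as RTF, the formatting shows up as
readable control words rather than unprintable bytes. The following is only a
rough sketch of how two RTF strings might be compared. The function names and
the short list of control words (\f, \fs, \b, \i) are my own assumptions for
illustration, and real Word output would need a proper RTF parser that
understands groups, font tables, and styles.

```python
import re
from collections import Counter

# Matches an RTF control word (backslash, letters, optional numeric
# parameter) or a control symbol (backslash plus one other character).
CONTROL = re.compile(r"\\(?:([a-zA-Z]+)(-?\d+)?|.)", re.DOTALL)

# Control words assumed to matter for grading (an illustrative subset):
# \f = font number, \fs = font size in half-points, \b = bold, \i = italic.
FORMAT_WORDS = {"f", "fs", "b", "i"}


def format_controls(rtf_text):
    """Count the formatting control words found in an RTF string."""
    counts = Counter()
    for match in CONTROL.finditer(rtf_text):
        word, param = match.group(1), match.group(2)
        # Control symbols such as \\ or \{ leave word as None and are skipped.
        if word in FORMAT_WORDS:
            counts[(word, param)] += 1
    return counts


def format_differences(rtf_a, rtf_b):
    """Return the control words used more often in one string than the other."""
    a, b = format_controls(rtf_a), format_controls(rtf_b)
    return {"only_in_a": a - b, "only_in_b": b - a}


# Two tiny hand-written samples, not real Word output.
sample_a = r"{\rtf1\ansi{\fonttbl{\f0 Courier;}}\f0\fs24 Dear \b Mr. Jones:\b0}"
sample_b = r"{\rtf1\ansi{\fonttbl{\f0 Courier;}}\f0\fs28 Dear Mr. Jones:}"
diff = format_differences(sample_a, sample_b)
```

Here diff["only_in_a"] holds the 12-point size (\fs24) and the bold on/off
pair, while diff["only_in_b"] holds the 14-point size (\fs28). Counting
control words ignores where in the text the formatting applies, so treat this
as a starting point only.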

> I think there are some "control characters" to control these
>formats.These characters are all unprintable characters and I need to
>know what are their exact meaning or what they control.

The notion of using "control characters" is fairly old and mainly associated
with older printer data streams, such as the IBM 1403 line printer data
stream still generated by many COBOL programs and other older mainframe
technologies. In this context, "control characters," whether expressed in
ASCII, EBCDIC, or some other mapping, are embedded in the printer data stream
to signal the printer to do things like form feed, line feed, or print bold.
One common mechanism is to use the Esc (escape) control character (ASCII 27)
to signal the printer that the characters immediately following it are
formatting instructions rather than text to print. For the purposes of
illustration, the following sample printer data stream would print
"Mr. Jones:" in bold (if the control characters for bold on and bold off were
ASCII 201 and 202):

Dear <ASCII 27><ASCII 201>Mr. Jones:<ASCII 27><ASCII 202><CR><LF>I am
writing to inform you ...

In the actual printer data stream, <ASCII 27> etc. would be replaced by the
actual character represented by decimal 27 in the ASCII mapping scheme.
Similarly for ASCII 201 and 202. Since these characters may be unprintable,
I can't show them here.
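
To make the sample stream above concrete, here is a small sketch that builds
it as raw bytes and then renders the unprintable ones in a readable form. The
values 201 and 202 are just the made-up bold on/off codes from the example,
not codes for any real printer, and the function names are invented for
illustration.

```python
ESC = 27        # the ASCII escape character
BOLD_ON = 201   # hypothetical "bold on" code from the example
BOLD_OFF = 202  # hypothetical "bold off" code from the example


def bold(text):
    """Wrap text in the hypothetical escape sequences for bold on and off."""
    return bytes([ESC, BOLD_ON]) + text.encode("ascii") + bytes([ESC, BOLD_OFF])


def describe(stream):
    """Show printable bytes as characters and all other bytes as <n>."""
    parts = []
    for byte in stream:
        if 32 <= byte < 127:
            parts.append(chr(byte))
        else:
            parts.append(f"<{byte}>")
    return "".join(parts)


# "Dear ", the bolded name, a carriage return and line feed, then the body.
stream = b"Dear " + bold("Mr. Jones:") + b"\r\n" + b"I am writing to inform you ..."
readable = describe(stream)
```

The readable form shows the escape, bold, carriage return, and line feed
bytes as <27>, <201>, <202>, <13>, and <10>, which is the same idea a grading
program would use to inspect a file byte by byte.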

The Word native storage formats, even though they use the same concepts
described above, are highly complex because of the advanced functionality
provided by printers commonly used today. In the old days, printers that
cost a fortune could tab, form feed, line feed, bold, italic, use Courier
font instead of Gothic, maybe print red or some other color (with the right
ribbon) but not much else. Today you can spend a pittance on a printer that
can print nearly any color at any point on the paper (sometimes called
all-points addressable, or APA printing), allowing for virtually unlimited
application of formatting.

kolberg -at- healtheon -dot- com
kris -at- olberg -dot- com
